From vlad at dev.mellanox.co.il Thu Mar 1 01:09:35 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 01 Mar 2007 11:09:35 +0200 Subject: [ofa-general] Re: ofed_1_2_scripts update patchs In-Reply-To: <45E635E7.5030401@cse.ohio-state.edu> References: <45E635E7.5030401@cse.ohio-state.edu> Message-ID: <1172740175.17950.11.camel@vladsk-laptop> On Wed, 2007-02-28 at 21:09 -0500, Shaun Rowland wrote: > I've uploaded a new MVAPICH2 SRPM: mvapich2-0.9.8-5.src.rpm. I will have > to upload a new version again before the beta release, but I wanted to > get these patches out and a new SRPM uploaded ASAP. I've attached the > following patches done against the latest update for the > ofed_1_2_scripts GIT repository: > > mvapich2.patch > -------------- > - fix for bug 386 > - adds mpi-selector support to MVAPICH2 > - changes one default setting for MVAPICH2 in the case the user has not > specified build options > > mpi-selector.patch > ------------------ > - fixes an ordering problem in install.sh around line 130 (see below) Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
From vlad at lists.openfabrics.org Thu Mar 1 02:35:49 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 1 Mar 2007 02:35:49 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070301-0200 daily build status Message-ID: <20070301103550.939ECE60826@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with 
linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on x86_64 with linux-2.6.5-7.244-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister' /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register' make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-22.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) 
/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-34.ELsmp Log: /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From vlad at 
mellanox.co.il Thu Mar 1 02:55:58 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 01 Mar 2007 12:55:58 +0200 Subject: [ofa-general] OFED 1.2 daily builds In-Reply-To: References: Message-ID: <1172746558.17950.38.camel@vladsk-laptop> OFED daily builds started on openfabrics server. http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-yyyymmdd-hhmm.tgz will be created every day at 06:00 PST. http://www.openfabrics.org/builds/ofed-1.2/latest.txt includes the name of the latest OFED package. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From jsquyres at cisco.com Thu Mar 1 03:03:34 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 1 Mar 2007 06:03:34 -0500 Subject: [ofa-general] Re: OFED 1.2 daily builds In-Reply-To: <1172746558.17950.38.camel@vladsk-laptop> References: <1172746558.17950.38.camel@vladsk-laptop> Message-ID: It looks like the tarball name changed today -- there's an extra "1.2" in there. Was that intentional? On Mar 1, 2007, at 5:55 AM, Vladimir Sokolovsky wrote: > > OFED daily builds started on openfabrics server. > http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-yyyymmdd-hhmm.tgz > will be created every day at 06:00 PST. > > http://www.openfabrics.org/builds/ofed-1.2/latest.txt includes the > name > of the latest OFED package. > > -- > Vladimir Sokolovsky > Mellanox Technologies Ltd. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Thu Mar 1 03:29:17 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 1 Mar 2007 13:29:17 +0200 Subject: [ofa-general] Re: ofa_1_2_kernel 20070301-0200 daily build status In-Reply-To: <20070301103550.939ECE60826@openfabrics.org> References: <20070301103550.939ECE60826@openfabrics.org> Message-ID: <20070301112917.GE14282@mellanox.co.il> Build failed on x86_64 with linux-2.6.5-7.244-smp > Log: > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister' > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register' > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' > make: *** [kernel] Error 2 > ---------------------------------------------------------------------------------- RDS fails on SLES9SP3. Vlad? > Build failed on x86_64 with linux-2.6.9-22.ELsmp > Log: > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’
undeclared (first use in this function) > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function) > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' > make: *** [kernel] Error 2 > ---------------------------------------------------------------------------------- > Build failed on x86_64 with linux-2.6.9-34.ELsmp > Log: > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’
undeclared (first use in this function) > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' > make: *** [kernel] Error 2 Chelsio fails on RHELU2 and RHELU3. Steve, do you intend to fix this or should it be disabled on these kernels? -- MST From vlad at mellanox.co.il Thu Mar 1 03:40:00 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 01 Mar 2007 13:40:00 +0200 Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 daily builds In-Reply-To: References: <1172746558.17950.38.camel@vladsk-laptop> Message-ID: <1172749200.17950.40.camel@vladsk-laptop> On Thu, 2007-03-01 at 06:03 -0500, Jeff Squyres wrote: > It looks like the tarball name changed today -- there's an extra > "1.2" in there. > > Was that intentional? > It was a typo. Fixed. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From mst at mellanox.co.il Thu Mar 1 06:02:36 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 16:02:36 +0200 Subject: [ofa-general] [PATCH for-2.6.21] IB/mthca: QP reset fixes Message-ID: <20070301140236.GI14282@mellanox.co.il> Fix 2 issues related to QP reset: 1. After moving QP to reset, make sure no event handlers are in progress 2. In QP destroy, reset QP before removing it from QP table: otherwise we get bogus QP/unknown QP warnings (and theoretically, crash, if the same slot is reused with the same QPN). This fixes openfabrics bugzilla 394. Signed-off-by: Michael S. Tsirkin --- Roland, please queue for 2.6.21. 
Link to bugzilla entry: https://bugs.openfabrics.org/show_bug.cgi?id=394 diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 71dc84b..345c84e 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -864,6 +864,12 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, if (qp->ibqp.send_cq != qp->ibqp.recv_cq) mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); + } else + synchronize_irq(dev->pdev->irq); + mthca_wq_reset(&qp->sq); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); @@ -1393,6 +1399,10 @@ void mthca_free_qp(struct mthca_dev *dev, send_cq = to_mcq(qp->ibqp.send_cq); recv_cq = to_mcq(qp->ibqp.recv_cq); + if (qp->state != IB_QPS_RESET) + mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, + NULL, 0, &status); + /* * Lock CQs here, so that CQ polling code can do QP lookup * without taking a lock. @@ -1409,10 +1419,6 @@ void mthca_free_qp(struct mthca_dev *dev, wait_event(qp->wait, !get_qp_refcount(dev, qp)); - if (qp->state != IB_QPS_RESET) - mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, - NULL, 0, &status); - /* * If this is a userspace QP, the buffers, MR, CQs and so on * will be cleaned up in userspace, so all we have to do is @@ -1425,6 +1431,12 @@ void mthca_free_qp(struct mthca_dev *dev, mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); + } else + synchronize_irq(dev->pdev->irq); + mthca_free_memfree(dev, qp); mthca_free_wqe_buf(dev, qp); } -- MST From mst at mellanox.co.il Thu Mar 1 06:41:39 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 16:41:39 +0200 Subject: [ofa-general] [PATCHv2 for-2.6.21] IB/mthca: QP reset fixes In-Reply-To: <20070301140236.GI14282@mellanox.co.il> References: <20070301140236.GI14282@mellanox.co.il> Message-ID: <20070301144139.GK14282@mellanox.co.il> Fix 2 issues related to QP reset: 1. After moving QP to reset, make sure no event handlers are in progress 2. In QP destroy, reset QP before removing it from QP table: otherwise we get bogus QP/unknown QP warnings (and theoretically, crash, if the same slot is reused with the same QPN). This fixes openfabrics bugzilla 394. Signed-off-by: Michael S. Tsirkin --- Changes wrt v1: since we use synchronize_irq, it is cleaner to include linux/hardirq.h in qp.c directly. Roland, please queue for 2.6.21. 
Link to bugzilla entry: https://bugs.openfabrics.org/show_bug.cgi?id=394 diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 71dc84b..387af53 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -37,6 +37,7 @@ #include #include +#include #include @@ -864,6 +865,12 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, if (qp->ibqp.send_cq != qp->ibqp.recv_cq) mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); + } else + synchronize_irq(dev->pdev->irq); + mthca_wq_reset(&qp->sq); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); @@ -1393,6 +1400,10 @@ void mthca_free_qp(struct mthca_dev *dev, send_cq = to_mcq(qp->ibqp.send_cq); recv_cq = to_mcq(qp->ibqp.recv_cq); + if (qp->state != IB_QPS_RESET) + mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, + NULL, 0, &status); + /* * Lock CQs here, so that CQ polling code can do QP lookup * without taking a lock. @@ -1409,10 +1420,6 @@ void mthca_free_qp(struct mthca_dev *dev, wait_event(qp->wait, !get_qp_refcount(dev, qp)); - if (qp->state != IB_QPS_RESET) - mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, - NULL, 0, &status); - /* * If this is a userspace QP, the buffers, MR, CQs and so on * will be cleaned up in userspace, so all we have to do is @@ -1425,6 +1432,12 @@ void mthca_free_qp(struct mthca_dev *dev, mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); + } else + synchronize_irq(dev->pdev->irq); + mthca_free_memfree(dev, qp); mthca_free_wqe_buf(dev, qp); } -- MST From monil at voltaire.com Thu Mar 1 06:48:00 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 01 Mar 2007 16:48:00 +0200 Subject: [ofa-general] [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering Message-ID: <45E6E7A0.7070902@voltaire.com> This issue was found during partitioning & SM fail over testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds concurrent with OFED restart. Changes from v1: * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike * fixed a bug in device extraction from the work struct * removed some warnings in case they are caused due to missing PKEY as this seems like a valid flow now. Changes from v2: * less/fixed debug prints - (MST remark) * flush_restart_qp stuff renamed to just restart_qp (MST remark) * the patch now depends on Roland's "IPoIB: Only handle async events for one port" SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. 
Signed-off-by: Moni Levy --- drivers/infiniband/ulp/ipoib/ipoib.h | 4 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 51 ++++++++++++++++++++----- drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- 5 files changed, 59 insertions(+), 19 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:11:43.698307017 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:43:04.624119588 +0200 @@ -205,6 +205,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); int ipoib_ib_dev_up(struct net_device *dev); int ipoib_ib_dev_down(struct net_device *dev, int flush); -int ipoib_ib_dev_stop(struct net_device *dev); +int ipoib_ib_dev_stop(struct net_device *dev, int flush); int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200 @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device ret = ipoib_init_qp(dev); if (ret) { - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); + if (ret != 
-ENOENT) + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); return -1; } ret = ipoib_ib_post_receives(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } ret = ipoib_cm_dev_open(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi return pending; } -int ipoib_ib_dev_stop(struct net_device *dev) +int ipoib_ib_dev_stop(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -581,7 +582,8 @@ timeout: /* Wait for all AHs to be reaped */ set_bit(IPOIB_STOP_REAPER, &priv->flags); cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); begin = jiffies; @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) { - struct ipoib_dev_priv *cpriv, *priv = - container_of(work, struct ipoib_dev_priv, flush_task); + struct ipoib_dev_priv *cpriv; struct net_device *dev = priv->dev; - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { + /* + * ipoib_ib_dev_stop() below may not find the PKey and leave the + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp + * flag on is Ok. 
+ */ + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); return; } @@ -642,6 +648,13 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_ib_dev_down(dev, 0); + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) + ipoib_ib_dev_stop(dev, 0); + ipoib_ib_dev_open(dev); + } + /* * The device could have been brought down between the start and when * we get here, don't bring it back up if it's not configured up @@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(cpriv, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 0); +} + +void ipoib_ib_dev_restart_qp(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, restart_qp_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:11:43.729301517 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:43:04.666112093 +0200 @@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev) return -EINVAL; if (ipoib_ib_dev_up(dev)) { - ipoib_ib_dev_stop(dev); + 
ipoib_ib_dev_stop(dev, 1); return -EINVAL; } @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device flush_workqueue(ipoib_workqueue); ipoib_ib_dev_down(dev, 1); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { struct ipoib_dev_priv *cpriv; @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200 @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mcmember.mgid); if (ret < 0) { - ipoib_warn(priv, "couldn't attach QP to multicast group " - IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); + if (ret != -ENXIO) /* No pkey found */ + ipoib_warn(priv, "couldn't attach QP to multicast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); return ret; @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s status = ipoib_mcast_join_finish(mcast, &multicast->rec); if (status) { - if (mcast->logcount++ < 20) + if (mcast->logcount++ < 20 && status != -ENXIO) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), 
status); - } else { + } else if (status != -ENXIO) { ipoib_warn(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), Index: b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 14:39:46.712444790 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 16:12:55.069541201 +0200 @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ret = -ENXIO; + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); goto out; } + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* set correct QKey for QP */ @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if ((record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE && + record->element.port_num == priv->port) { + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); + queue_work(ipoib_workqueue, &priv->restart_qp_task); } } From mst at mellanox.co.il Thu Mar 1 06:56:44 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 1 Mar 2007 16:56:44 +0200 Subject: [ofa-general] Re: [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <45E6E7A0.7070902@voltaire.com> References: <45E6E7A0.7070902@voltaire.com> Message-ID: <20070301145644.GL14282@mellanox.co.il> > SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence > does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. Please limit line length to 80 chars or so. Also - does the patch attempt to fix some more issues? > Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200 > @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device > > ret = ipoib_init_qp(dev); > if (ret) { > - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > + if (ret != -ENOENT) > + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > return -1; > } What's the reason for this change? 
> Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200 > @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > if (ret < 0) { > - ipoib_warn(priv, "couldn't attach QP to multicast group " > - IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > + if (ret != -ENXIO) /* No pkey found */ > + ipoib_warn(priv, "couldn't attach QP to multicast group " > + IPOIB_GID_FMT "\n", > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); > return ret; And this? Thanks, MST -- MST From mst at mellanox.co.il Thu Mar 1 07:48:11 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 17:48:11 +0200 Subject: [ofa-general] librdmacm - sysfs dependency Message-ID: <20070301154811.GA18128@mellanox.co.il> Sean, librdmacm currently depends on libsysfs. This is a deprecated library not installed by default on some distros, and not even available for others. Could this dependency be removed please? libsysfs really adds very little in the way of functionality. I would like this fix to go into OFED 1.2 so that OFED does not have this dependency. I opened bug 407 to track this. -- MST From tziporet at mellanox.co.il Thu Mar 1 08:05:05 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 01 Mar 2007 18:05:05 +0200 Subject: [ofa-general] Re: ofa_1_2_kernel 20070301-0200 daily build status In-Reply-To: <20070301112917.GE14282@mellanox.co.il> References: <20070301103550.939ECE60826@openfabrics.org> <20070301112917.GE14282@mellanox.co.il> Message-ID: <45E6F9B1.30501@mellanox.co.il> Michael S. 
Tsirkin wrote: > Build failed on x86_64 with linux-2.6.5-7.244-smp > > RDS fails on SLES9SP3. Vlad? > RDS is not supporting SLES9SP3 Tziporet From tziporet at mellanox.co.il Thu Mar 1 08:10:39 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 01 Mar 2007 18:10:39 +0200 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! In-Reply-To: References: <200702281350.03788.hnguyen@linux.vnet.ibm.com> Message-ID: <45E6FAFF.4010502@mellanox.co.il> Roland Dreier wrote: > I guess the solution is to merge IPoIB NAPI to avoid overloading the > system with interrupts. I'll fix up a few last things with my NAPI > patch and we can try to get it in shape to merge for 2.6.22. > But all this discussion is not related to the issue I saw Tziporet From swise at opengridcomputing.com Thu Mar 1 08:18:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 10:18:40 -0600 Subject: [ofa-general] Re: ofa_1_2_kernel 20070301-0200 daily build status In-Reply-To: <20070301112917.GE14282@mellanox.co.il> References: <20070301103550.939ECE60826@openfabrics.org> <20070301112917.GE14282@mellanox.co.il> Message-ID: <1172765920.25089.14.camel@stevo-desktop> I'll let you know by the end of the day. If I can get it done quickly I'll post a patch. Otherwise will disable cxgb3 on U2/U3. On Thu, 2007-03-01 at 13:29 +0200, Michael S. 
Tsirkin wrote: > Build failed on x86_64 with linux-2.6.5-7.244-smp > > Log: > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit': > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister' > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init': > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register' > > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1 > > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2 > > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2 > > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp' > > make: *** [kernel] Error 2 > > ---------------------------------------------------------------------------------- > > RDS fails on SLES9SP3. Vlad? > > > Build failed on x86_64 with linux-2.6.9-22.ELsmp > > Log: > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function) > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.) > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’
undeclared (first use in this function) > > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1 > > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 > > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2 > > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp' > > make: *** [kernel] Error 2 > > ---------------------------------------------------------------------------------- > > Build failed on x86_64 with linux-2.6.9-34.ELsmp > > Log: > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’: > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function) > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’: > > /home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function) > > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1 > > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2 > > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070301-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2 > > make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp' > > make: *** [kernel] Error 2 > > Chelsio fails on RHELU2 and RHELU3. Steve, do you intend to fix this > or should it be disabled on these kernels?
> From roland.list at gmail.com Thu Mar 1 08:27:01 2007 From: roland.list at gmail.com (Roland Dreier) Date: Thu, 1 Mar 2007 08:27:01 -0800 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! In-Reply-To: <45E6FAFF.4010502@mellanox.co.il> References: <200702281350.03788.hnguyen@linux.vnet.ibm.com> <45E6FAFF.4010502@mellanox.co.il> Message-ID: > > I guess the solution is to merge IPoIB NAPI to avoid overloading the > > system with interrupts. I'll fix up a few last things with my NAPI > > patch and we can try to get it in shape to merge for 2.6.22. > > > But all this discussion is not related to the issue I saw Oh you're right. In your original message it looks like you found a refcounting bug or something with IPoIB-CM. MST, any idea? - R. From monil at voltaire.com Thu Mar 1 08:34:29 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 1 Mar 2007 18:34:29 +0200 Subject: [ofa-general] Re: [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <20070301145644.GL14282@mellanox.co.il> References: <45E6E7A0.7070902@voltaire.com> <20070301145644.GL14282@mellanox.co.il> Message-ID: <6a122cc00703010834v43c67cbxaf27b1c30678875d@mail.gmail.com> On 3/1/07, Michael S. Tsirkin wrote: > > SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence > > does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. > > Please limit line length to 80 chars or so. Ok, sorry for that, will pay attention in the future. > > Also - does the patch attempt to fix some more issues? 
No

>
> > Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> > ===================================================================
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200
> > @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device
> >
> > ret = ipoib_init_qp(dev);
> > if (ret) {
> > - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
> > + if (ret != -ENOENT)
> > + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
> > return -1;
> > }
>
> What's the reason for this change?

-ENOENT means the pkey was not found. That is no longer a problem in
itself, and we now print a clear message about it in another place.

>
> > Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > ===================================================================
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200
> > @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc
> > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
> > &mcast->mcmember.mgid);
> > if (ret < 0) {
> > - ipoib_warn(priv, "couldn't attach QP to multicast group "
> > - IPOIB_GID_FMT "\n",
> > - IPOIB_GID_ARG(mcast->mcmember.mgid));
> > + if (ret != -ENXIO) /* No pkey found */
> > + ipoib_warn(priv, "couldn't attach QP to multicast group "
> > + IPOIB_GID_FMT "\n",
> > + IPOIB_GID_ARG(mcast->mcmember.mgid));
> >
> > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
> > return ret;
>
> And this?

The same as above. I tried to limit the prints to the "straight to the
point" minimum.
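A minimal sketch of the pattern discussed above, not the actual ipoib code: a lookup reports a missing pkey with -ENOENT, and the caller stays quiet for that expected case while still warning on anything else. The names find_pkey_index(), dev_open() and the warnings counter are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter standing in for messages a driver would log. */
static int warnings;

/* Hypothetical lookup: index of pkey in the table, or a negative errno. */
static int find_pkey_index(const uint16_t *table, int len, uint16_t pkey)
{
	int i;

	if (table == NULL || len <= 0)
		return -EINVAL;		/* unexpected: the caller should warn */
	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return i;
	return -ENOENT;			/* expected when the pkey is absent */
}

/* Hypothetical open path: warn only for errors other than -ENOENT. */
static int dev_open(const uint16_t *table, int len, uint16_t pkey, int *index)
{
	int ret;

	assert(index != NULL);
	ret = find_pkey_index(table, len, pkey);
	if (ret < 0) {
		if (ret != -ENOENT) {
			fprintf(stderr, "pkey lookup returned %d\n", ret);
			warnings++;
		}
		return -1;
	}
	*index = ret;
	return 0;
}
```

Calling dev_open() on the same pkey before and after the table is reordered gives a different index each time, which is why an index cached once at QP creation can go stale.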
-- Moni > > Thanks, > MST > -- > MST > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sean.hefty at intel.com Thu Mar 1 08:38:36 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 1 Mar 2007 08:38:36 -0800 Subject: [ofa-general] RE: librdmacm - sysfs dependency In-Reply-To: <20070301154811.GA18128@mellanox.co.il> Message-ID: <000001c75c20$0f7fefa0$49fd070a@amr.corp.intel.com> >librdmacm currently depends on libsys. Dependency on sysfs was removed a while ago. Where do you see that the dependency still exists? - Sean From swise at opengridcomputing.com Thu Mar 1 08:54:30 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 10:54:30 -0600 Subject: [ofa-general] [PATCH] Chelsio RHEL4 U2/U3 Support. Message-ID: <1172768070.25089.18.camel@stevo-desktop> Vlad, This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3. You can pull this change from: git://staging.openfabrics.org/~swise/ofed_1_2 rhel4u2-3 Thanks, Steve. --------------------- Chelsio RHEL4 U2/U3 Support. Get them to compile. 
Signed-off-by: Steve Wise
---

 .../backport/2.6.9_U2/include/linux/mii.h | 13 +++++++++++++
 .../backport/2.6.9_U2/include/linux/spinlock.h | 15 +++++++++++++++
 .../backport/2.6.9_U3/include/linux/spinlock.h | 15 +++++++++++++++
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/mii.h b/kernel_addons/backport/2.6.9_U2/include/linux/mii.h
new file mode 100644
index 0000000..daaf8e1
--- /dev/null
+++ b/kernel_addons/backport/2.6.9_U2/include/linux/mii.h
@@ -0,0 +1,13 @@
+#ifndef BACKPORT_LINUX_MII_TO_SLES9SP3
+#define BACKPORT_LINUX_MII_TO_SLES9SP3
+
+#include_next <linux/mii.h>
+
+#define BMCR_SPEED1000 0x0040 /* MSB of Speed (1000) */
+#define ADVERTISE_PAUSE_CAP 0x0400 /* Try for pause */
+#define ADVERTISE_PAUSE_ASYM 0x0800 /* Try for asymetric pause */
+#define MII_CTRL1000 0x09 /* 1000BASE-T control */
+#define ADVERTISE_1000FULL 0x0200 /* Advertise 1000BASE-T full duplex */
+#define ADVERTISE_1000HALF 0x0100 /* Advertise 1000BASE-T half duplex */
+
+#endif
diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/spinlock.h b/kernel_addons/backport/2.6.9_U2/include/linux/spinlock.h
index 0d24ba3..8f92e6e 100644
--- a/kernel_addons/backport/2.6.9_U2/include/linux/spinlock.h
+++ b/kernel_addons/backport/2.6.9_U2/include/linux/spinlock.h
@@ -3,9 +3,24 @@ #define BACKPORT_LINUX_SPINLOCK_H
 #include_next <linux/spinlock.h>
 #define DEFINE_SPINLOCK(x) spinlock_t x = SPIN_LOCK_UNLOCKED
+#define DEFINE_RWLOCK(x) rwlock_t x = RW_LOCK_UNLOCKED
+
+#define spin_trylock_irqsave(lock, flags) \
+({ \
+ local_irq_save(flags); \
+ spin_trylock(lock) ? \
+ 1 : ({local_irq_restore(flags); 0;}); \
+})
 #define spin_lock_nested(lock, subclass) spin_lock(lock)
+#define spin_trylock_irq(lock) \
+({ \
+ local_irq_disable(); \
+ spin_trylock(lock) ? \
+ 1 : ({ local_irq_enable(); 0; }); \
+})
+
 #define assert_spin_locked(lock) do { (void)(lock); } while(0)
 #endif
diff --git a/kernel_addons/backport/2.6.9_U3/include/linux/spinlock.h b/kernel_addons/backport/2.6.9_U3/include/linux/spinlock.h
index 0d24ba3..8f92e6e 100644
--- a/kernel_addons/backport/2.6.9_U3/include/linux/spinlock.h
+++ b/kernel_addons/backport/2.6.9_U3/include/linux/spinlock.h
@@ -3,9 +3,24 @@ #define BACKPORT_LINUX_SPINLOCK_H
 #include_next <linux/spinlock.h>
 #define DEFINE_SPINLOCK(x) spinlock_t x = SPIN_LOCK_UNLOCKED
+#define DEFINE_RWLOCK(x) rwlock_t x = RW_LOCK_UNLOCKED
+
+#define spin_trylock_irqsave(lock, flags) \
+({ \
+ local_irq_save(flags); \
+ spin_trylock(lock) ? \
+ 1 : ({local_irq_restore(flags); 0;}); \
+})
 #define spin_lock_nested(lock, subclass) spin_lock(lock)
+#define spin_trylock_irq(lock) \
+({ \
+ local_irq_disable(); \
+ spin_trylock(lock) ? \
+ 1 : ({ local_irq_enable(); 0; }); \
+})
+
 #define assert_spin_locked(lock) do { (void)(lock); } while(0)
 #endif

From vlad at mellanox.co.il Thu Mar 1 09:16:34 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 01 Mar 2007 19:16:34 +0200
Subject: [ofa-general] [PATCH] Chelsio RHEL4 U2/U3 Support.
In-Reply-To: <1172768070.25089.18.camel@stevo-desktop>
References: <1172768070.25089.18.camel@stevo-desktop>
Message-ID: <1172769394.17950.66.camel@vladsk-laptop>

On Thu, 2007-03-01 at 10:54 -0600, Steve Wise wrote:
> Vlad,
>
> This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3.
>
> You can pull this change from:
>
> git://staging.openfabrics.org/~swise/ofed_1_2 rhel4u2-3
>
> Thanks,
>
> Steve.
>
> ---------------------

Applied.

--
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From mst at mellanox.co.il Thu Mar 1 09:40:17 2007
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 1 Mar 2007 19:40:17 +0200 Subject: [ofa-general] Re: ofa_1_2_kernel 20070301-0200 daily build status In-Reply-To: <45E6F9B1.30501@mellanox.co.il> References: <20070301103550.939ECE60826@openfabrics.org> <20070301112917.GE14282@mellanox.co.il> <45E6F9B1.30501@mellanox.co.il> Message-ID: <20070301174017.GB19828@mellanox.co.il> > Quoting Tziporet Koren : > Subject: Re: [ofa-general] Re: ofa_1_2_kernel 20070301-0200 daily build status > > Michael S. Tsirkin wrote: > > Build failed on x86_64 with linux-2.6.5-7.244-smp > > > > RDS fails on SLES9SP3. Vlad? > > > > RDS is not supporting SLES9SP3 Let's not try building it there then? -- MST From mst at mellanox.co.il Thu Mar 1 09:43:32 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 19:43:32 +0200 Subject: [ofa-general] Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! In-Reply-To: References: <200702281350.03788.hnguyen@linux.vnet.ibm.com> <45E6FAFF.4010502@mellanox.co.il> Message-ID: <20070301174332.GC19828@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! > > >> I guess the solution is to merge IPoIB NAPI to avoid overloading the > >> system with interrupts. I'll fix up a few last things with my NAPI > >> patch and we can try to get it in shape to merge for 2.6.22. > >> > >But all this discussion is not related to the issue I saw > > Oh you're right. In your original message it looks like you found a > refcounting bug or > something with IPoIB-CM. MST, any idea? Not yet. -- MST From swise at opengridcomputing.com Thu Mar 1 11:01:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 13:01:43 -0600 Subject: [ofa-general] bug 399 - cannot install kernel mods on 2.6.20.1 Message-ID: <1172775703.5983.1.camel@stevo-desktop> Any progress on this? This would seem to be a blocker for entering beta... 
From 399:
----------------------------

I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the
kernel rpm fails to install on a 2.6.20.1 kernel:

vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U kernel-ib-1.2-2.6.20.1.x86_64.rpm
error: Failed dependencies:
ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(pci_request_region) = 10cc2981 is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(remap_pfn_range) = 155834a8 is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(unregister_netevent_notifier) = 1598dc9d is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(bad_dma_address) = 1675606f is needed by kernel-ib-1.2-2.6.20.1.x86_64
ksym(dev_get_by_name) = 16ab1a6b is needed by kernel-ib-1.2-2.6.20.1.x86_64
...

From vlad at mellanox.co.il Thu Mar 1 11:17:32 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 01 Mar 2007 21:17:32 +0200
Subject: [ofa-general] Re: bug 399 - cannot install kernel mods on 2.6.20.1
In-Reply-To: <1172775703.5983.1.camel@stevo-desktop>
References: <1172775703.5983.1.camel@stevo-desktop>
Message-ID: <1172776652.17950.72.camel@vladsk-laptop>

On Thu, 2007-03-01 at 13:01 -0600, Steve Wise wrote:
> Any progress on this?
>
> This would seem to be a blocker for entering beta...
>

This issue happens with kernels that were installed from sources (not as RPM).
The OFED installation script uses "rpm -i --force --nodeps kernel-ib-..." to
handle this issue.
Regards, Vladimir > > > >From 399: > ---------------------------- > > I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the > kernel rpm fails to install on a 2.6.20.1 kernel: > > vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U > kernel-ib-1.2-2.6.20.1.x86_64.rpm > error: Failed dependencies: > ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(pci_request_region) = 10cc2981 is needed by > kernel-ib-1.2-2.6.20.1.x86_64 > ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64 > ksym(remap_pfn_range) = 155834a8 is needed by > kernel-ib-1.2-2.6.20.1.x86_64 > ksym(unregister_netevent_notifier) = 1598dc9d is needed by > kernel-ib-1.2-2.6.20.1.x86_64 > ksym(bad_dma_address) = 1675606f is needed by > kernel-ib-1.2-2.6.20.1.x86_64 > ksym(dev_get_by_name) = 16ab1a6b is needed by > kernel-ib-1.2-2.6.20.1.x86_64 > > ... > From afriedle at indiana.edu Thu Mar 1 11:27:43 2007 From: afriedle at indiana.edu (Andrew Friedley) Date: Thu, 01 Mar 2007 14:27:43 -0500 Subject: [ofa-general] [PATCH] Chelsio RHEL4 U2/U3 Support. In-Reply-To: <1172768070.25089.18.camel@stevo-desktop> References: <1172768070.25089.18.camel@stevo-desktop> Message-ID: <45E7292F.1020700@indiana.edu> Steve Wise wrote: > Vlad, > > This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3. Looks like I have the problem for this fix on U4 as well, could this be applied there too? Andrew From mst at mellanox.co.il Thu Mar 1 11:30:36 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Mar 2007 21:30:36 +0200 Subject: [ofa-general] Re: [PATCH] Chelsio RHEL4 U2/U3 Support. 
In-Reply-To: <45E7292F.1020700@indiana.edu> References: <1172768070.25089.18.camel@stevo-desktop> <45E7292F.1020700@indiana.edu> Message-ID: <20070301193036.GB23870@mellanox.co.il> > Quoting r. Andrew Friedley : > Subject: Re: [PATCH] Chelsio RHEL4 U2/U3 Support. > > Steve Wise wrote: > >Vlad, > > > >This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3. > > Looks like I have the problem for this fix on U4 as well, could this be > applied there too? what's the problem? -- MST From afriedle at open-mpi.org Thu Mar 1 11:48:49 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Thu, 01 Mar 2007 14:48:49 -0500 Subject: [ofa-general] Re: [PATCH] Chelsio RHEL4 U2/U3 Support. In-Reply-To: <20070301193036.GB23870@mellanox.co.il> References: <1172768070.25089.18.camel@stevo-desktop> <45E7292F.1020700@indiana.edu> <20070301193036.GB23870@mellanox.co.il> Message-ID: <45E72E21.2000600@open-mpi.org> Michael S. Tsirkin wrote: >>> This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3. >> Looks like I have the problem for this fix on U4 as well, could this be >> applied there too? > > what's the problem? > My mistake -- I was told this was a U4 system, but the OFED install appears to be using the U3 backport patches. 
However the error I see is different from the nightly automated build, though related: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: error: syntax error before "adapter_list_lock" /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: warning: type defaults to `int' in declaration of `adapter_list_lock' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: error: incompatible types in initialization /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `is_offloading': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:884: warning: passing arg 1 of `_read_lock_bh' from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:888: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:893: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `add_adapter': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1061: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1063: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `remove_adapter': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1068: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1070: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type

From rdreier at cisco.com Thu Mar 1 12:49:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Mar 2007 12:49:02 -0800
Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish
In-Reply-To: (Shirley Ma's message of "Tue, 27 Feb 2007 16:59:23 -0700")
References:
Message-ID:

Hmm, I'm a little worried about this. Could setting
netif_carrier_on() before all multicast groups are joined cause
problems for IPv6 address autoconfiguration and duplicate address
detection? In other words a node might end up choosing a duplicate
address because it sends ND messages that don't reach the correct
destination.

(Also, no signed-off-by line with your patch)

- R.

From rdreier at cisco.com Thu Mar 1 13:04:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Mar 2007 13:04:05 -0800
Subject: [ofa-general] Re: [PATCH 2.6.21-rc2] ehca: fix mismatched sync between completion handler and destroy cq
In-Reply-To: <200702281801.02747.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Wed, 28 Feb 2007 18:01:02 +0100")
References: <200702281801.02747.hnguyen@linux.vnet.ibm.com>
Message-ID:

Looks OK to me, queued for 2.6.21, except:

> +#include <linux/completion.h>

This can just be <linux/wait.h>, because you're only using
wait_queue_head_t and not struct completion, right?

I fixed this up before merging.

From rdreier at cisco.com Thu Mar 1 13:12:07 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Mar 2007 13:12:07 -0800
Subject: [ofa-general] Re: [PATCHv2 for-2.6.21] IB/mthca: QP reset fixes
In-Reply-To: <20070301144139.GK14282@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Mar 2007 16:41:39 +0200")
References: <20070301140236.GI14282@mellanox.co.il> <20070301144139.GK14282@mellanox.co.il>
Message-ID:

Yes, this definitely looks like a needed fix.
However: > if (qp->ibqp.send_cq != qp->ibqp.recv_cq) > mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); > > + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); > + } else > + synchronize_irq(dev->pdev->irq); I'm not quite sure I understand why we have to synchronize against the completion EQ's interrupt here, but I assume it's to make sure that no more CQEs are written that come from this QP's work queues. Is that definitely necessary? The hardware that writes CQEs isn't synchronized with the 2RST firmware command? Anyway, assuming that we do need to synchronize with the completion interrupt, then the order looks suspicious above -- we clean out CQEs from CQs attached to the QP and then synchronize with the interrupt, which appears to leave a window where a new CQE could be written and not end up getting cleaned. (and the same thing seems to apply to the changes to mthca_free_cq() also). - R. From halr at voltaire.com Thu Mar 1 13:12:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Mar 2007 16:12:51 -0500 Subject: [ofa-general] Re: [PATCH] opensm: switch pre-routing preparation status check In-Reply-To: <20070228212103.GE30973@sashak.voltaire.com> References: <20070228212103.GE30973@sashak.voltaire.com> Message-ID: <1172783556.31770.179721.camel@hal.voltaire.com> On Wed, 2007-02-28 at 16:21, Sasha Khapyorsky wrote: > osm_switch_prepare_path_rebuild() will return status value now, it is > needed in order to track switch pre-routing preparation properly. Also > tiny p_sw->hops rework for potentially lockless p_sw->hops accessing. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). 
-- Hal From rdreier at cisco.com Thu Mar 1 13:25:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Mar 2007 13:25:02 -0800 Subject: [ofa-general] [PATCH] IB/mthca: Fix error path in mthca_alloc_memfree() Message-ID: The garbled logic in mthca_alloc_memfree() causes it to return 0, even if it fails to allocate all doorbell records. Fix it to return -ENOMEM when it fails. Signed-off-by: Roland Dreier --- I found this while looking for something else. Anyway, I'll queue it for 2.6.21 unless someone sees a mistake... drivers/infiniband/hw/mthca/mthca_qp.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 71dc84b..1c6b63a 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1088,21 +1088,21 @@ static void mthca_unmap_memfree(struct mthca_dev *dev, static int mthca_alloc_memfree(struct mthca_dev *dev, struct mthca_qp *qp) { - int ret = 0; - if (mthca_is_memfree(dev)) { qp->rq.db_index = mthca_alloc_db(dev, MTHCA_DB_TYPE_RQ, qp->qpn, &qp->rq.db); if (qp->rq.db_index < 0) - return ret; + return -ENOMEM; qp->sq.db_index = mthca_alloc_db(dev, MTHCA_DB_TYPE_SQ, qp->qpn, &qp->sq.db); - if (qp->sq.db_index < 0) + if (qp->sq.db_index < 0) { mthca_free_db(dev, MTHCA_DB_TYPE_RQ, qp->rq.db_index); + return -ENOMEM; + } } - return ret; + return 0; } static void mthca_free_memfree(struct mthca_dev *dev, -- 1.5.0.1 From swise at opengridcomputing.com Thu Mar 1 13:25:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 15:25:40 -0600 Subject: [ofa-general] Re: [PATCH] Chelsio RHEL4 U2/U3 Support. 
In-Reply-To: <45E72E21.2000600@open-mpi.org> References: <1172768070.25089.18.camel@stevo-desktop> <45E7292F.1020700@indiana.edu> <20070301193036.GB23870@mellanox.co.il> <45E72E21.2000600@open-mpi.org> Message-ID: <1172784340.5983.8.camel@stevo-desktop> It should work now with the latest ofed_1_2 tree... On Thu, 2007-03-01 at 14:48 -0500, Andrew Friedley wrote: > > Michael S. Tsirkin wrote: > > >>> This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3. > >> Looks like I have the problem for this fix on U4 as well, could this be > >> applied there too? > > > > what's the problem? > > > > My mistake -- I was told this was a U4 system, but the OFED install > appears to be using the U3 backport patches. However the error I see is > different from the nightly automated build, though related: > > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: > error: syntax error before "adapter_list_lock" > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: > warning: type defaults to `int' in declaration of `adapter_list_lock' > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: > error: incompatible types in initialization > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:56: > warning: data definition has no type or storage class > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: > In function `is_offloading': > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:884: > warning: passing arg 1 of `_read_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:888: > warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:893: > warning: passing arg 1 of 
`_read_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: > In function `add_adapter': > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1061: > warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1063: > warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: > In function `remove_adapter': > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1068: > warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1070: > warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type From rdreier at cisco.com Thu Mar 1 13:38:21 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Mar 2007 13:38:21 -0800 Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows Message-ID: Here's a patch that I came up with that I would like to add to the libibverbs 1.1 branch. This would give us the opportunity to add reregister MR and memory window operations to libibverbs 1.1 after the 1.1.0 release without breaking any ABI (beyond adding the actual entry points, which are easy for apps to deal with by using compile-time checks). If we mess up and forget something, it's not that big a deal, but we'll have to wait until the next ABI break (ie libibverbs 1.2) to add the extra functions, so I'd like to try and do a good job. So does anyone see anything obviously missing or wrong here? Thanks...
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index 49cd581..a9c5fde 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -1,7 +1,7 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. * Copyright (c) 2004 Intel Corporation. All rights reserved. - * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2005, 2006, 2007 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2005 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -288,6 +288,13 @@ struct ibv_pd { uint32_t handle; }; +enum ibv_rereg_mr_flags { + IBV_REREG_MR_CHANGE_TRANSLATION = (1 << 0), + IBV_REREG_MR_CHANGE_PD = (1 << 1), + IBV_REREG_MR_CHANGE_ACCESS = (1 << 2), + IBV_REREG_MR_KEEP_VALID = (1 << 3) +}; + struct ibv_mr { struct ibv_context *context; struct ibv_pd *pd; @@ -298,6 +305,17 @@ struct ibv_mr { uint32_t rkey; }; +enum ibv_mw_type { + IBV_MW_TYPE_1 = 1, + IBV_MW_TYPE_2 = 2 +}; + +struct ibv_mw { + struct ibv_context *context; + struct ibv_pd *pd; + uint32_t rkey; +}; + struct ibv_global_route { union ibv_gid dgid; uint32_t flow_label; @@ -517,6 +535,15 @@ struct ibv_recv_wr { int num_sge; }; +struct ibv_mw_bind { + struct ibv_mr *mr; + uint64_t wr_id; + uint64_t addr; + uint64_t length; + enum ibv_send_flags send_flags; + enum ibv_access_flags mw_access_flags; +}; + struct ibv_srq { struct ibv_context *context; void *srq_context; @@ -603,7 +630,16 @@ struct ibv_context_ops { int (*dealloc_pd)(struct ibv_pd *pd); struct ibv_mr * (*reg_mr)(struct ibv_pd *pd, void *addr, size_t length, enum ibv_access_flags access); + struct ibv_mr * (*rereg_mr)(struct ibv_mr *mr, + enum ibv_rereg_mr_flags flags, + struct ibv_pd *pd, void *addr, + size_t length, + enum ibv_access_flags access); int (*dereg_mr)(struct ibv_mr *mr); + struct ibv_mw * (*alloc_mw)(struct ibv_pd *pd, enum ibv_mw_type type); + int (*bind_mw)(struct ibv_qp *qp, struct ibv_mw 
*mw,
+                                        struct ibv_mw_bind *mw_bind);
+       int                     (*dealloc_mw)(struct ibv_mw *mw);
        struct ibv_cq *         (*create_cq)(struct ibv_context *context, int cqe,
                                             struct ibv_comp_channel *channel,
                                             int comp_vector);

From rdreier at cisco.com Thu Mar 1 13:55:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Mar 2007 13:55:28 -0800
Subject: [ofa-general] Re: [PATCH] IB/mthca: recv poll cq optimization
In-Reply-To: <20070228210235.GC8564@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 28 Feb 2007 23:02:35 +0200")
References: <20070228210235.GC8564@mellanox.co.il>
Message-ID:

> All good recv work requests generate HW completions in FIFO order, so we can use
> rq->tail rather than hardware data. In this way, we save a branch on data path
> for recv completions (branch is still there for send completions).
>
> Roland, what do you think? This increases the overall code size but I think the
> extra code is on the error CQE handling path. BTW, since most kernel QPs seem
> not to use selective signaling, it might be worth it to optimize send
> completions in a similar way in case selective signaling is disabled on QP.

Do you have any measurements that say this helps? Having a bigger I-cache footprint is really globally worse for the system, so I don't like this part:

> +       if (unlikely(is_error)) {
> +               if (!is_send && !(*cur_qp)->ibqp.srq) {
> +                       s32 wqe = be32_to_cpu(cqe->wqe);
> +                       wqe_index = wqe >> wq->wqe_shift;
> +                       /*
> +                        * WQE addr == base - 1 might be reported in receive completion
> +                        * with error instead of (rq size - 1) by Sinai FW 1.0.800 and
> +                        * Arbel FW 5.1.400. This bug should be fixed in later FW revs.
> +                        */
> +                       if (unlikely(wqe_index < 0))
> +                               wqe_index = wq->max - 1;
> +               }
>
> -       if (is_error) {

is there any way to move that into handle_error_cqe() so that it's definitely out of the way for the normal path? In fact, why do we need this code at all with your change -- aren't RQ error completions reported in FIFO order too?
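
A minimal sketch of the two ways of finding the receive WQE index discussed above, assuming a power-of-two ring. The `struct wq_model` type and both helper names are hypothetical stand-ins, not mthca code. The error-path helper tests for a negative offset before shifting, so it does not depend on how the compiler shifts negative values; the quoted hunk shifts first and then tests.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's work queue; only the fields the
 * quoted hunk touches are modelled. */
struct wq_model {
	int max;        /* number of WQEs in the ring, assumed a power of two */
	int wqe_shift;  /* log2 of the WQE size in bytes */
	unsigned tail;  /* running count of consumed WQEs */
};

/* Error path: derive the index from the byte offset reported in the CQE.
 * An offset of "base - 1" (negative) is mapped to the last ring slot,
 * as in the firmware workaround described in the quoted comment. */
static int wqe_index_from_offset(const struct wq_model *wq, int32_t wqe_offset)
{
	if (wqe_offset < 0)
		return wq->max - 1;
	return wqe_offset >> wq->wqe_shift;
}

/* Normal path: good receive completions arrive in FIFO order, so the
 * index is simply the tail counter reduced modulo the ring size. */
static int wqe_index_from_tail(const struct wq_model *wq)
{
	return (int)(wq->tail & (unsigned)(wq->max - 1));
}
```

With an 8-entry ring of 64-byte WQEs, a reported offset of -1 maps to index 7, and a tail count of 10 maps to index 2 without reading anything from the completion entry.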

I'm not sure that it's worth testing whether a SQ has selective signaling or not -- after all, that's just changing one conditional branch for another. And in fact, looking at the code, I think we could rewrite

        if (wq->last_comp < wqe_index)
                wq->tail += wqe_index - wq->last_comp;
        else
                wq->tail += wqe_index + wq->max - wq->last_comp;

as just

        wq->tail += (wqe_index + wq->max - wq->last_comp) & (wq->max - 1);

and avoid the conditional in a simpler way. (I've not checked that math but it looks right to me)

 - R.

From ardavis at ichips.intel.com Thu Mar 1 14:18:23 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Thu, 01 Mar 2007 14:18:23 -0800
Subject: [ofa-general] [PATCH] Add dapltest headers to Makefile.am
In-Reply-To: <1172679526.21382.114.camel@vladsk-laptop>
References: <1172679526.21382.114.camel@vladsk-laptop>
Message-ID: <45E7512F.2020602@ichips.intel.com>

Vladimir Sokolovsky wrote:

>Hi Arlin,
>The following patch fixes dapltest compilation after 'make dist':
>
>Add dapltest headers to EXTRA_DIST
>
>Signed-off-by: Vladimir Sokolovsky
>
>

Thanks, applied to both master and ofed_1_2 branches

From swise at opengridcomputing.com Thu Mar 1 14:34:56 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Mar 2007 16:34:56 -0600
Subject: [ofa-general] librdmacm build failure
Message-ID: <1172788496.5983.11.camel@stevo-desktop>

Sean, what's up with LIBRDMACM_VERSION_SCRIPT?

I'm using your latest librdmacm on a 2.6.21-rc2 system and I get this trying to build it:

 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o
 gcc -DHAVE_CONFIG_H -I. -I. -I.
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1 /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la -rpath /usr/local/lib -version-info 1 -export-dynamic @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs mkdir src/.libs gcc -shared .libs/src_librdmacm_la-cma.o -libverbs @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o src/.libs/librdmacm.so.1.0.0 gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or directory make[1]: *** [src/librdmacm.la] Error 1 make[1]: Leaving directory `/usr/local/src/git/librdmacm' make: *** [all] Error 2 vic14:/usr/local/src/git/librdmacm From mshefty at ichips.intel.com Thu Mar 1 14:41:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Mar 2007 14:41:39 -0800 Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows In-Reply-To: References: Message-ID: <45E756A3.1040905@ichips.intel.com> > So does anyone see anything obviously missing or wrong here? It looks fine to me with one minor comment. > +struct ibv_mw_bind { > + struct ibv_mr *mr; > + uint64_t wr_id; > + uint64_t addr; > + uint64_t length; The memory region uses size_t for length. Do we care about matching the data type? - Sean From xma at us.ibm.com Thu Mar 1 14:38:56 2007 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 1 Mar 2007 14:38:56 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Roland Dreier wrote on 03/01/2007 12:49:02 PM: > Hmm, I'm a little worried about this. Could setting netif_carrier_on() > before all multicast groups are joined cause problems for IPv6 address > autoconfiguration and duplicate address detection? In other words a > node might end up choosing a duplicate address because it sends ND > messages that don't reach the correct destination. 
>
> (Also, no signed-off-by line with your patch)
>
> - R.

Setting carrier on should be OK here. ND messages can't be sent before the interface is capable of sending packets. And if ND messages can't reach the correct destination because of a later multicast join failure, then the IPv6 address is useless.

I will add the signed-off-by line.

Thanks
Shirley Ma

From swise at opengridcomputing.com Thu Mar 1 14:50:36 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Mar 2007 16:50:36 -0600
Subject: [ofa-general] [PATCH 2.6.21 0/6] iw_cxgb3: Bug Fixes
Message-ID: <20070301225036.2373.71668.stgit@dell3.ogc.int>

Hey Roland,

Here is a set of bug fixes for iw_cxgb3 that I'd like to roll into 2.6.21.

Thanks,

Steve.

From swise at opengridcomputing.com Thu Mar 1 14:50:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Mar 2007 16:50:38 -0600
Subject: [ofa-general] [PATCH 2.6.21 1/6] iw_cxgb3: Fixes for "normal close" failures.
In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int>
References: <20070301225036.2373.71668.stgit@dell3.ogc.int>
Message-ID: <20070301225038.2373.13980.stgit@dell3.ogc.int>

Fixes for "normal close" failures.

- Start normal close timer when moving to CLOSING state.
- Handle ABORTING state in close_con_rpl().
- Stop timer correctly on abort during a normal close.
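
The three points above can be sketched as a small state model. This is only an illustration under stated assumptions: `struct ep_model`, the reduced state list, and the three event helpers are hypothetical, and the choice of DEAD as the state after an abort is an assumption of the model rather than something the description states.

```c
#include <stdbool.h>

/* Hypothetical, reduced endpoint model: only the states named in the
 * patch description, plus DEAD as a terminal state. */
enum ep_state { FPDU_MODE, CLOSING, MORIBUND, ABORTING, DEAD };

struct ep_model {
	enum ep_state state;
	bool timer_armed;	/* stands in for the endpoint close timer */
};

/* One half of a normal close was seen (local disconnect or peer close). */
static void ep_close_event(struct ep_model *ep)
{
	switch (ep->state) {
	case FPDU_MODE:
		ep->timer_armed = true;	/* arm on entry to CLOSING */
		ep->state = CLOSING;
		break;
	case CLOSING:
		ep->state = MORIBUND;	/* timer is already running */
		break;
	default:
		break;
	}
}

/* The peer aborted the connection. */
static void ep_abort_event(struct ep_model *ep)
{
	if (ep->state == CLOSING || ep->state == MORIBUND)
		ep->timer_armed = false;	/* stop the close timer */
	ep->state = DEAD;			/* assumption of this model */
}

/* The reply to our close request arrived. */
static void ep_close_reply(struct ep_model *ep)
{
	switch (ep->state) {
	case CLOSING:
		ep->state = MORIBUND;
		break;
	case MORIBUND:
		ep->timer_armed = false;
		ep->state = DEAD;
		break;
	case ABORTING:
		break;	/* tolerated: an abort is already in progress */
	default:
		break;
	}
}
```

The property the model is meant to show is that the timer is armed exactly once, on the transition into CLOSING, and is never left running after an abort.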
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 11 +++++++---- 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index b21fde8..1dcfedc 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1415,6 +1415,7 @@ static int peer_close(struct t3cdev *tde wake_up(&ep->com.waitq); break; case FPDU_MODE: + start_ep_timer(ep); __state_set(&ep->com, CLOSING); attrs.next_state = IWCH_QP_STATE_CLOSING; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, @@ -1425,7 +1426,6 @@ static int peer_close(struct t3cdev *tde disconnect = 0; break; case CLOSING: - start_ep_timer(ep); __state_set(&ep->com, MORIBUND); disconnect = 0; break; @@ -1507,9 +1507,10 @@ static int peer_abort(struct t3cdev *tde get_ep(&ep->com); break; case MORIBUND: + case CLOSING: stop_ep_timer(ep); + /*FALLTHROUGH*/ case FPDU_MODE: - case CLOSING: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_ERROR; ret = iwch_modify_qp(ep->com.qp->rhp, @@ -1570,7 +1571,6 @@ static int close_con_rpl(struct t3cdev * spin_lock_irqsave(&ep->com.lock, flags); switch (ep->com.state) { case CLOSING: - start_ep_timer(ep); __state_set(&ep->com, MORIBUND); break; case MORIBUND: @@ -1586,6 +1586,8 @@ static int close_con_rpl(struct t3cdev * __state_set(&ep->com, DEAD); release = 1; break; + case ABORTING: + break; case DEAD: default: BUG_ON(1); @@ -1659,6 +1661,7 @@ static void ep_timeout(unsigned long arg break; case MPA_REQ_WAIT: break; + case CLOSING: case MORIBUND: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_ERROR; @@ -1957,11 +1960,11 @@ int iwch_ep_disconnect(struct iwch_ep *e case MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: + start_ep_timer(ep); ep->com.state = CLOSING; close = 1; break; case CLOSING: - start_ep_timer(ep); ep->com.state = MORIBUND; close = 1; break; From swise at opengridcomputing.com Thu Mar 1 14:50:40 2007 From: 
swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 16:50:40 -0600 Subject: [ofa-general] [PATCH 2.6.21 2/6] iw_cxgb3: Move QP to error on destroy if the state is IDLE. In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int> References: <20070301225036.2373.71668.stgit@dell3.ogc.int> Message-ID: <20070301225040.2373.75346.stgit@dell3.ogc.int> Move QP to error on destroy if the state is IDLE. Change iwch_destroy_qp() to always move the QP to ERROR and let iwch_modify_qp() decide what to do. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 6 ++---- 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 9947a14..a56c902 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -736,10 +736,8 @@ static int iwch_destroy_qp(struct ib_qp qhp = to_iwch_qp(ib_qp); rhp = qhp->rhp; - if (qhp->attr.state == IWCH_QP_STATE_RTS) { - attrs.next_state = IWCH_QP_STATE_ERROR; - iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0); - } + attrs.next_state = IWCH_QP_STATE_ERROR; + iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0); wait_event(qhp->wait, !qhp->ep); remove_handle(rhp, &rhp->qpidr, qhp->wq.qpid); From swise at opengridcomputing.com Thu Mar 1 14:50:42 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 16:50:42 -0600 Subject: [ofa-general] [PATCH 2.6.21 3/6] iw_cxgb3: Stop the endpoint timer when the MPA exchange is aborted by the peer. In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int> References: <20070301225036.2373.71668.stgit@dell3.ogc.int> Message-ID: <20070301225042.2373.68022.stgit@dell3.ogc.int> Stop the endpoint timer when the MPA exchange is aborted by the peer. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 1dcfedc..8e6f6df 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1487,8 +1487,10 @@ static int peer_abort(struct t3cdev *tde case CONNECTING: break; case MPA_REQ_WAIT: + stop_ep_timer(ep); break; case MPA_REQ_SENT: + stop_ep_timer(ep); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REP_SENT: From swise at opengridcomputing.com Thu Mar 1 14:50:44 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 16:50:44 -0600 Subject: [ofa-general] [PATCH 2.6.21 4/6] iw_cxgb3: Squelch logging AE errors. In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int> References: <20070301225036.2373.71668.stgit@dell3.ogc.int> Message-ID: <20070301225044.2373.12275.stgit@dell3.ogc.int> Squelch logging AE errors. Only post one AE error for a given connection in the kernel log. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_ev.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c index 54362af..b406766 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_ev.c +++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c @@ -47,12 +47,6 @@ static void post_qp_event(struct iwch_de struct iwch_qp_attributes attrs; struct iwch_qp *qhp; - printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x " - "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__, - CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe), - CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe), - CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); - spin_lock(&rnicp->lock); qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe)); @@ -73,6 +67,12 @@ static void post_qp_event(struct iwch_de return; } + printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x " + "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__, + CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe), + CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe), + CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); + atomic_inc(&qhp->refcnt); spin_unlock(&rnicp->lock); From swise at opengridcomputing.com Thu Mar 1 14:50:46 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 16:50:46 -0600 Subject: [ofa-general] [PATCH 2.6.21 5/6] iw_cxgb3: Don't reuse skbuffs that are non-linear or cloned. In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int> References: <20070301225036.2373.71668.stgit@dell3.ogc.int> Message-ID: <20070301225046.2373.21370.stgit@dell3.ogc.int> Don't reuse skbuffs that are non-linear or cloned. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 8e6f6df..fd2f3ca 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -305,8 +305,7 @@ static int status2errno(int status) */ static struct sk_buff *get_skb(struct sk_buff *skb, int len, gfp_t gfp) { - if (skb) { - BUG_ON(skb_cloned(skb)); + if (skb && !skb_is_nonlinear(skb) && !skb_cloned(skb)) { skb_trim(skb, 0); skb_get(skb); } else { From swise at opengridcomputing.com Thu Mar 1 14:50:48 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 16:50:48 -0600 Subject: [ofa-general] [PATCH 2.6.21 6/6] iw_cxgb3: Fix MR permission problems. In-Reply-To: <20070301225036.2373.71668.stgit@dell3.ogc.int> References: <20070301225036.2373.71668.stgit@dell3.ogc.int> Message-ID: <20070301225048.2373.91667.stgit@dell3.ogc.int> Fix MR permission problems. - remove useless and redundant iwch_mem_perms enum. - create ib_to_tpt_access_rights() for mapping ib access rights to T3 TPT permissions. - create ib_to_mwbind_access_rights() for mapping ib access rights to T3 MWBIND WR permissions. - fix up the mem reg code to utilize the new functions. 
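
A minimal sketch of the two mappings described above. The function names are prefixed with `sketch_` to mark them as illustrative. The IB_ACCESS_* values mirror the kernel's `enum ib_access_flags`; the TPT_* and T3_MEM_ACCESS_* values are placeholders inferred from the bit-reversing shifts that this patch removes, not taken from the hardware headers, so treat them as assumptions.

```c
#include <stdint.h>

/* IB access flags (subset used by the mappings). */
enum {
	IB_ACCESS_LOCAL_WRITE  = 1 << 0,
	IB_ACCESS_REMOTE_WRITE = 1 << 1,
	IB_ACCESS_REMOTE_READ  = 1 << 2,
};

/* Placeholder MW bind WR permission bits (assumed layout). */
enum {
	T3_MEM_ACCESS_LOCAL_READ  = 1 << 0,
	T3_MEM_ACCESS_LOCAL_WRITE = 1 << 1,
	T3_MEM_ACCESS_REM_READ    = 1 << 2,
	T3_MEM_ACCESS_REM_WRITE   = 1 << 3,
};

/* Placeholder TPT permission bits: the same four rights in reverse
 * bit order (assumed layout). */
enum {
	TPT_REMOTE_WRITE = 1 << 0,
	TPT_REMOTE_READ  = 1 << 1,
	TPT_LOCAL_WRITE  = 1 << 2,
	TPT_LOCAL_READ   = 1 << 3,
};

/* Map IB access rights to TPT permissions; local read is always granted. */
static inline uint32_t sketch_ib_to_tpt_access(int acc)
{
	return (acc & IB_ACCESS_REMOTE_WRITE ? TPT_REMOTE_WRITE : 0) |
	       (acc & IB_ACCESS_REMOTE_READ ? TPT_REMOTE_READ : 0) |
	       (acc & IB_ACCESS_LOCAL_WRITE ? TPT_LOCAL_WRITE : 0) |
	       TPT_LOCAL_READ;
}

/* Map IB access rights to MW bind WR permissions; local read is always granted. */
static inline uint32_t sketch_ib_to_mwbind_access(int acc)
{
	return (acc & IB_ACCESS_REMOTE_WRITE ? T3_MEM_ACCESS_REM_WRITE : 0) |
	       (acc & IB_ACCESS_REMOTE_READ ? T3_MEM_ACCESS_REM_READ : 0) |
	       (acc & IB_ACCESS_LOCAL_WRITE ? T3_MEM_ACCESS_LOCAL_WRITE : 0) |
	       T3_MEM_ACCESS_LOCAL_READ;
}
```

Keeping one named function per target encoding means no caller has to remember that the two bit layouts differ, which is what the open-coded shifts required.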
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 24 +++----------------- drivers/infiniband/hw/cxgb3/iwch_provider.h | 33 +++++++++++---------------- drivers/infiniband/hw/cxgb3/iwch_qp.c | 2 +- 3 files changed, 18 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index a56c902..4af1c0f 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -463,9 +463,6 @@ static struct ib_mr *iwch_register_phys_ php = to_iwch_pd(pd); rhp = php->rhp; - acc = iwch_convert_access(acc); - - mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); if (!mhp) return ERR_PTR(-ENOMEM); @@ -491,12 +488,7 @@ static struct ib_mr *iwch_register_phys_ mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - /* NOTE: TPT perms are backwards from BIND WR perms! */ - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; - + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = *iova_start; mhp->attr.page_size = shift - 12; @@ -525,7 +517,6 @@ static int iwch_reregister_phys_mem(stru struct iwch_mr mh, *mhp; struct iwch_pd *php; struct iwch_dev *rhp; - int new_acc; __be64 *page_list = NULL; int shift = 0; u64 total_size; @@ -546,14 +537,12 @@ static int iwch_reregister_phys_mem(stru if (rhp != php->rhp) return -EINVAL; - new_acc = mhp->attr.perms; - memcpy(&mh, mhp, sizeof *mhp); if (mr_rereg_mask & IB_MR_REREG_PD) php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mh.attr.perms = iwch_convert_access(acc); + mh.attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, @@ -568,7 +557,7 @@ static int iwch_reregister_phys_mem(stru if (mr_rereg_mask & IB_MR_REREG_PD) mhp->attr.pdid = php->pdid; if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mhp->attr.perms = 
acc; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) { mhp->attr.zbva = 0; mhp->attr.va_fbo = *iova_start; @@ -613,8 +602,6 @@ static struct ib_mr *iwch_reg_user_mr(st goto err; } - acc = iwch_convert_access(acc); - i = n = 0; list_for_each_entry(chunk, ®ion->chunk_list, list) @@ -630,10 +617,7 @@ static struct ib_mr *iwch_reg_user_mr(st mhp->rhp = rhp; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = region->virt_base; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) region->length; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index de0fe1b..93bcc56 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -286,27 +286,20 @@ static inline int iwch_convert_state(enu } } -enum iwch_mem_perms { - IWCH_MEM_ACCESS_LOCAL_READ = 1 << 0, - IWCH_MEM_ACCESS_LOCAL_WRITE = 1 << 1, - IWCH_MEM_ACCESS_REMOTE_READ = 1 << 2, - IWCH_MEM_ACCESS_REMOTE_WRITE = 1 << 3, - IWCH_MEM_ACCESS_ATOMICS = 1 << 4, - IWCH_MEM_ACCESS_BINDING = 1 << 5, - IWCH_MEM_ACCESS_LOCAL = - (IWCH_MEM_ACCESS_LOCAL_READ | IWCH_MEM_ACCESS_LOCAL_WRITE), - IWCH_MEM_ACCESS_REMOTE = - (IWCH_MEM_ACCESS_REMOTE_WRITE | IWCH_MEM_ACCESS_REMOTE_READ) - /* cannot go beyond 1 << 31 */ -} __attribute__ ((packed)); - -static inline u32 iwch_convert_access(int acc) +static inline u32 iwch_ib_to_tpt_access(int acc) { - return (acc & IB_ACCESS_REMOTE_WRITE ? IWCH_MEM_ACCESS_REMOTE_WRITE : 0) - | (acc & IB_ACCESS_REMOTE_READ ? IWCH_MEM_ACCESS_REMOTE_READ : 0) | - (acc & IB_ACCESS_LOCAL_WRITE ? IWCH_MEM_ACCESS_LOCAL_WRITE : 0) | - (acc & IB_ACCESS_MW_BIND ? IWCH_MEM_ACCESS_BINDING : 0) | - IWCH_MEM_ACCESS_LOCAL_READ; + return (acc & IB_ACCESS_REMOTE_WRITE ? 
TPT_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? TPT_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? TPT_LOCAL_WRITE : 0) | + TPT_LOCAL_READ; +} + +static inline u32 iwch_ib_to_mwbind_access(int acc) +{ + return (acc & IB_ACCESS_REMOTE_WRITE ? T3_MEM_ACCESS_REM_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? T3_MEM_ACCESS_REM_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? T3_MEM_ACCESS_LOCAL_WRITE : 0) | + T3_MEM_ACCESS_LOCAL_READ; } enum iwch_mmid_state { diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 9ea00cc..0a472c9 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -439,7 +439,7 @@ int iwch_bind_mw(struct ib_qp *qp, wqe->bind.type = T3_VA_BASED_TO; /* TBD: check perms */ - wqe->bind.perms = iwch_convert_access(mw_bind->mw_access_flags); + wqe->bind.perms = iwch_ib_to_mwbind_access(mw_bind->mw_access_flags); wqe->bind.mr_stag = cpu_to_be32(mw_bind->mr->lkey); wqe->bind.mw_stag = cpu_to_be32(mw->rkey); wqe->bind.mw_len = cpu_to_be32(mw_bind->length); From mshefty at ichips.intel.com Thu Mar 1 14:56:25 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Mar 2007 14:56:25 -0800 Subject: [ofa-general] librdmacm build failure In-Reply-To: <1172788496.5983.11.camel@stevo-desktop> References: <1172788496.5983.11.camel@stevo-desktop> Message-ID: <45E75A19.40007@ichips.intel.com> > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT > src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c > src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o gcc -DHAVE_CONFIG_H > -I. -I. -I. 
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT
> src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c
> src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1 /bin/sh ./libtool
> --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la
> -rpath /usr/local/lib -version-info 1 -export-dynamic
> @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs mkdir src/.libs
> gcc -shared .libs/src_librdmacm_la-cma.o -libverbs
> @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o
> src/.libs/librdmacm.so.1.0.0 gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or
> directory make[1]: *** [src/librdmacm.la] Error 1 make[1]: Leaving directory
> `/usr/local/src/git/librdmacm' make: *** [all] Error 2
> vic14:/usr/local/src/git/librdmacm

I'm not seeing this on 2.6.18 or 2.6.21-rc1, even after doing a fresh clone of librdmacm. Did you rerun autogen and configure before make?

- Sean

From xma at us.ibm.com Thu Mar 1 14:55:55 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Thu, 1 Mar 2007 15:55:55 -0700
Subject: [ofa-general] [PATCH] resubmit: enable IPoIB only if broadcast join finish
In-Reply-To:
Message-ID:

Hello Roland,

According to IPoIB RFC 4391 section 5, once the IPoIB broadcast group has been joined, the interface should be ready for data transfer.

The patch below enables the IPoIB carrier only once the broadcast join has succeeded. A failure to join any other multicast group shouldn't prevent the IPoIB interface from being ready to transfer data.

This patch is built against the 2.6.21-rc1 kernel. Please review it.
Signed-off-by: Shirley Ma diff -urpN ipoib/ipoib_multicast.c ipoib-link/ipoib_multicast.c --- ipoib/ipoib_multicast.c 2007-02-27 07:21:50.000000000 -0800 +++ ipoib-link/ipoib_multicast.c 2007-02-27 07:52:10.000000000 -0800 @@ -407,6 +407,11 @@ static int ipoib_mcast_join_complete(int queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0); mutex_unlock(&mcast_mutex); + /* + * broadcast join finished, enable carrier + */ + if (unlikely(mcast == priv->broadcast)) + netif_carrier_on(dev); return 0; } @@ -596,7 +601,6 @@ void ipoib_mcast_join_task(struct work_s ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) (See attached file: ipoib-link.patch) Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ipoib-link.patch Type: application/octet-stream Size: 816 bytes Desc: not available URL: From swise at opengridcomputing.com Thu Mar 1 15:00:20 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 17:00:20 -0600 Subject: [ofa-general] librdmacm build failure In-Reply-To: <45E75A19.40007@ichips.intel.com> References: <1172788496.5983.11.camel@stevo-desktop> <45E75A19.40007@ichips.intel.com> Message-ID: <1172790020.5983.14.camel@stevo-desktop> On Thu, 2007-03-01 at 14:56 -0800, Sean Hefty wrote: > > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT > > src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c > > src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o gcc -DHAVE_CONFIG_H > > -I. -I. -I. 
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT > > src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c > > src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1 /bin/sh ./libtool > > --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la > > -rpath /usr/local/lib -version-info 1 -export-dynamic > > @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs mkdir src/.libs > > gcc -shared .libs/src_librdmacm_la-cma.o -libverbs > > @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o > > src/.libs/librdmacm.so.1.0.0 gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or > > directory make[1]: *** [src/librdmacm.la] Error 1 make[1]: Leaving directory > > `/usr/local/src/git/librdmacm' make: *** [all] Error 2 > > vic14:/usr/local/src/git/librdmacm > > I'm not seeing this on 2.6.18 or 2.6.21-rc1, ever after doing a fresh clone of > librdmacm. Did you rerun autogen and configure before make? > yep. It was fresh clone. distro=sles10 kernel=2.6.21-rc2 vic14:/usr/local/src/git # git clone git://staging.openfabrics.org/~shefty/librdmacm Generating pack... Done counting 428 objects. Deltifying 428 objects. 100% (428/428) done Total 428 (delta 213), reused 329 (delta 159) vic14:/usr/local/src/git # cd librdmacm/ vic14:/usr/local/src/git/librdmacm # ./autogen.sh && ./configure && make && make install + test -d ./config + mkdir ./config + aclocal -I config + libtoolize --force --copy Putting files in AC_CONFIG_AUX_DIR, `config'. + autoheader + automake --foreign --add-missing --copy configure.in: installing `config/install-sh' configure.in: installing `config/missing' Makefile.am: installing `config/compile' Makefile.am: installing `config/depcomp' + autoconf checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-suse-linux checking host system type... 
x86_64-suse-linux checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking dependency style of gcc... gcc3 checking for a sed that does not truncate output... /usr/bin/sed checking for egrep... grep -E checking for ld used by gcc... /usr/x86_64-suse-linux/bin/ld checking if the linker (/usr/x86_64-suse-linux/bin/ld) is GNU ld... yes checking for /usr/x86_64-suse-linux/bin/ld option to reload object files... -r checking for BSD-compatible nm... /usr/bin/nm -B checking whether ln -s works... yes checking how to recognise dependent libraries... pass_all checking how to run the C preprocessor... gcc -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking dlfcn.h usability... yes checking dlfcn.h presence... yes checking for dlfcn.h... yes checking for g++... g++ checking whether we are using the GNU C++ compiler... yes checking whether g++ accepts -g... yes checking dependency style of g++... gcc3 checking how to run the C++ preprocessor... g++ -E checking for g77... no checking for f77... no checking for xlf... no checking for frt... no checking for pgf77... no checking for fort77... no checking for fl32... no checking for af77... no checking for f90... no checking for xlf90... no checking for pgf90... no checking for epcf90... no checking for f95... no checking for fort... no checking for xlf95... 
no checking for ifc... no checking for efc... no checking for pgf95... no checking for lf95... no checking for gfortran... gfortran checking whether we are using the GNU Fortran 77 compiler... yes checking whether gfortran accepts -g... yes checking the maximum length of command line arguments... 32768 checking command to parse /usr/bin/nm -B output from gcc object... ok checking for objdir... .libs checking for ar... ar checking for ranlib... ranlib checking for strip... strip checking if gcc supports -fno-rtti -fno-exceptions... no checking for gcc option to produce PIC... -fPIC checking if gcc PIC flag -fPIC works... yes checking if gcc static flag -static works... yes checking if gcc supports -c -o file.o... yes checking whether the gcc linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking whether -lc should be explicitly linked in... no checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... yes configure: creating libtool appending configuration tag "CXX" to libtool checking for ld used by g++... /usr/x86_64-suse-linux/bin/ld -m elf_x86_64 checking if the linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) is GNU ld... yes checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking for g++ option to produce PIC... -fPIC checking if g++ PIC flag -fPIC works... yes checking if g++ static flag -static works... yes checking if g++ supports -c -o file.o... yes checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... 
immediate appending configuration tag "F77" to libtool checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... yes checking for gfortran option to produce PIC... -fPIC checking if gfortran PIC flag -fPIC works... yes checking if gfortran static flag -static works... yes checking if gfortran supports -c -o file.o... yes checking whether the gfortran linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking for gcc... (cached) gcc checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts -g... (cached) yes checking for gcc option to accept ANSI C... (cached) none needed checking dependency style of gcc... (cached) gcc3 checking for an ANSI C-conforming const... yes checking for long... yes checking size of long... 8 checking for ibv_get_device_list in -libverbs... yes checking for ANSI C header files... (cached) yes checking infiniband/verbs.h usability... yes checking infiniband/verbs.h presence... yes checking for infiniband/verbs.h... yes checking whether ld accepts --version-script... yes configure: creating ./config.status config.status: creating Makefile config.status: creating librdmacm.spec config.status: creating config.h config.status: executing depfiles commands make all-am make[1]: Entering directory `/usr/local/src/git/librdmacm' if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF ".deps/src_librdmacm_la-cma.Tpo" -c -o src_librdmacm_la-cma.lo `test -f 'src/cma.c' || echo './'`src/cma.c; \ then mv -f ".deps/src_librdmacm_la-cma.Tpo" ".deps/src_librdmacm_la-cma.Plo"; else rm -f ".deps/src_librdmacm_la-cma.Tpo"; exit 1; fi mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o
 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1
/bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la -rpath /usr/local/lib -version-info 1 -export-dynamic @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs
mkdir src/.libs
gcc -shared .libs/src_librdmacm_la-cma.o -libverbs @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o src/.libs/librdmacm.so.1.0.0
gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or directory
make[1]: *** [src/librdmacm.la] Error 1
make[1]: Leaving directory `/usr/local/src/git/librdmacm'
make: *** [all] Error 2
vic14:/usr/local/src/git/librdmacm
#

From rdreier at cisco.com Thu Mar 1 15:15:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Mar 2007 15:15:06 -0800
Subject: [ofa-general] Re: [PATCH] resubmit: enable IPoIB only if broadcast join finish
In-Reply-To: (Shirley Ma's message of "Thu, 1 Mar 2007 15:55:55 -0700")
References:
Message-ID:

> According to IPoIB RFC4391 section 5, once IPoIB broadcast group has been
> joined, the interface should be ready for data transfer.

I don't see any language in the RFC that says that. The closest thing I see is:

    Thus, the IPoIB link is formed by the IPoIB nodes joining the
    broadcast group.

but that is far from a definitive statement.

And I would like to make sure that this patch doesn't break anything. Can you convince me that this won't break IPv6 DAD for autoconfiguration?

 - R.
From sean.hefty at intel.com Thu Mar 1 15:29:57 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 1 Mar 2007 15:29:57 -0800 Subject: [ofa-general] rdma_cm issues in 2.6.21-rc1 Message-ID: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com> Just a note: I'm investigating two issues with the rdma_cm and 2.6.21-rc1. Running ucmatose twice results in a failure binding to an address the second time it's run. I'm also seeing a kernel crash if ucmatose is killed while waiting for a connection. - Sean From swise at opengridcomputing.com Thu Mar 1 15:33:41 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 01 Mar 2007 17:33:41 -0600 Subject: [ofa-general] librdmacm build failure In-Reply-To: <1172790020.5983.14.camel@stevo-desktop> References: <1172788496.5983.11.camel@stevo-desktop> <45E75A19.40007@ichips.intel.com> <1172790020.5983.14.camel@stevo-desktop> Message-ID: <1172792021.5983.17.camel@stevo-desktop> Does your configure.in need something like this (cut from libibverbs configure.in)? I'm not an automake or configure whiz at all, so I'm just guessing. But libibverbs builds fine and it uses a version script too... if test $ac_cv_version_script = yes; then LIBIBVERBS_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/libibverbs.map' else LIBIBVERBS_VERSION_SCRIPT= fi On Thu, 2007-03-01 at 17:00 -0600, Steve Wise wrote: > On Thu, 2007-03-01 at 14:56 -0800, Sean Hefty wrote: > > > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT > > > src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c > > > src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o gcc -DHAVE_CONFIG_H > > > -I. -I. -I. 
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT > > > src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c > > > src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1 /bin/sh ./libtool > > > --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la > > > -rpath /usr/local/lib -version-info 1 -export-dynamic > > > @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs mkdir src/.libs > > > gcc -shared .libs/src_librdmacm_la-cma.o -libverbs > > > @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o > > > src/.libs/librdmacm.so.1.0.0 gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or > > > directory make[1]: *** [src/librdmacm.la] Error 1 make[1]: Leaving directory > > > `/usr/local/src/git/librdmacm' make: *** [all] Error 2 > > > vic14:/usr/local/src/git/librdmacm > > > > I'm not seeing this on 2.6.18 or 2.6.21-rc1, even after doing a fresh clone of > > librdmacm. Did you rerun autogen and configure before make? > > > Yep. It was a fresh clone. > > distro=sles10 > kernel=2.6.21-rc2 > > vic14:/usr/local/src/git > # git clone git://staging.openfabrics.org/~shefty/librdmacm > Generating pack... > Done counting 428 objects. > Deltifying 428 objects. > 100% (428/428) done > Total 428 (delta 213), reused 329 (delta 159) > vic14:/usr/local/src/git > # cd librdmacm/ > vic14:/usr/local/src/git/librdmacm > # ./autogen.sh && ./configure && make && make install > + test -d ./config > + mkdir ./config > + aclocal -I config > + libtoolize --force --copy > Putting files in AC_CONFIG_AUX_DIR, `config'. > + autoheader > + automake --foreign --add-missing --copy > configure.in: installing `config/install-sh' > configure.in: installing `config/missing' > Makefile.am: installing `config/compile' > Makefile.am: installing `config/depcomp' > + autoconf > checking for a BSD-compatible install... /usr/bin/install -c > checking whether build environment is sane... yes > checking for gawk... gawk > checking whether make sets $(MAKE)... 
yes > checking build system type... x86_64-suse-linux > checking host system type... x86_64-suse-linux > checking for style of include used by make... GNU > checking for gcc... gcc > checking for C compiler default output file name... a.out > checking whether the C compiler works... yes > checking whether we are cross compiling... no > checking for suffix of executables... > checking for suffix of object files... o > checking whether we are using the GNU C compiler... yes > checking whether gcc accepts -g... yes > checking for gcc option to accept ANSI C... none needed > checking dependency style of gcc... gcc3 > checking for a sed that does not truncate output... /usr/bin/sed > checking for egrep... grep -E > checking for ld used by gcc... /usr/x86_64-suse-linux/bin/ld > checking if the linker (/usr/x86_64-suse-linux/bin/ld) is GNU ld... yes > checking for /usr/x86_64-suse-linux/bin/ld option to reload object files... -r > checking for BSD-compatible nm... /usr/bin/nm -B > checking whether ln -s works... yes > checking how to recognise dependent libraries... pass_all > checking how to run the C preprocessor... gcc -E > checking for ANSI C header files... yes > checking for sys/types.h... yes > checking for sys/stat.h... yes > checking for stdlib.h... yes > checking for string.h... yes > checking for memory.h... yes > checking for strings.h... yes > checking for inttypes.h... yes > checking for stdint.h... yes > checking for unistd.h... yes > checking dlfcn.h usability... yes > checking dlfcn.h presence... yes > checking for dlfcn.h... yes > checking for g++... g++ > checking whether we are using the GNU C++ compiler... yes > checking whether g++ accepts -g... yes > checking dependency style of g++... gcc3 > checking how to run the C++ preprocessor... g++ -E > checking for g77... no > checking for f77... no > checking for xlf... no > checking for frt... no > checking for pgf77... no > checking for fort77... no > checking for fl32... no > checking for af77... 
no > checking for f90... no > checking for xlf90... no > checking for pgf90... no > checking for epcf90... no > checking for f95... no > checking for fort... no > checking for xlf95... no > checking for ifc... no > checking for efc... no > checking for pgf95... no > checking for lf95... no > checking for gfortran... gfortran > checking whether we are using the GNU Fortran 77 compiler... yes > checking whether gfortran accepts -g... yes > checking the maximum length of command line arguments... 32768 > checking command to parse /usr/bin/nm -B output from gcc object... ok > checking for objdir... .libs > checking for ar... ar > checking for ranlib... ranlib > checking for strip... strip > checking if gcc supports -fno-rtti -fno-exceptions... no > checking for gcc option to produce PIC... -fPIC > checking if gcc PIC flag -fPIC works... yes > checking if gcc static flag -static works... yes > checking if gcc supports -c -o file.o... yes > checking whether the gcc linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes > checking whether -lc should be explicitly linked in... no > checking dynamic linker characteristics... GNU/Linux ld.so > checking how to hardcode library paths into programs... immediate > checking whether stripping libraries is possible... yes > checking if libtool supports shared libraries... yes > checking whether to build shared libraries... yes > checking whether to build static libraries... yes > configure: creating libtool > appending configuration tag "CXX" to libtool > checking for ld used by g++... /usr/x86_64-suse-linux/bin/ld -m elf_x86_64 > checking if the linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) is GNU ld... yes > checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes > checking for g++ option to produce PIC... -fPIC > checking if g++ PIC flag -fPIC works... yes > checking if g++ static flag -static works... 
yes > checking if g++ supports -c -o file.o... yes > checking whether the g++ linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes > checking dynamic linker characteristics... GNU/Linux ld.so > checking how to hardcode library paths into programs... immediate > appending configuration tag "F77" to libtool > checking if libtool supports shared libraries... yes > checking whether to build shared libraries... yes > checking whether to build static libraries... yes > checking for gfortran option to produce PIC... -fPIC > checking if gfortran PIC flag -fPIC works... yes > checking if gfortran static flag -static works... yes > checking if gfortran supports -c -o file.o... yes > checking whether the gfortran linker (/usr/x86_64-suse-linux/bin/ld -m elf_x86_64) supports shared libraries... yes > checking dynamic linker characteristics... GNU/Linux ld.so > checking how to hardcode library paths into programs... immediate > checking for gcc... (cached) gcc > checking whether we are using the GNU C compiler... (cached) yes > checking whether gcc accepts -g... (cached) yes > checking for gcc option to accept ANSI C... (cached) none needed > checking dependency style of gcc... (cached) gcc3 > checking for an ANSI C-conforming const... yes > checking for long... yes > checking size of long... 8 > checking for ibv_get_device_list in -libverbs... yes > checking for ANSI C header files... (cached) yes > checking infiniband/verbs.h usability... yes > checking infiniband/verbs.h presence... yes > checking for infiniband/verbs.h... yes > checking whether ld accepts --version-script... yes > configure: creating ./config.status > config.status: creating Makefile > config.status: creating librdmacm.spec > config.status: creating config.h > config.status: executing depfiles commands > make all-am > make[1]: Entering directory `/usr/local/src/git/librdmacm' > if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF ".deps/src_librdmacm_la-cma.Tpo" -c -o src_librdmacm_la-cma.lo `test -f 'src/cma.c' || echo './'`src/cma.c; \ > then mv -f ".deps/src_librdmacm_la-cma.Tpo" ".deps/src_librdmacm_la-cma.Plo"; else rm -f ".deps/src_librdmacm_la-cma.Tpo"; exit 1; fi > mkdir .libs > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/src_librdmacm_la-cma.o > gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -g -O2 -MT src_librdmacm_la-cma.lo -MD -MP -MF .deps/src_librdmacm_la-cma.Tpo -c src/cma.c -o src_librdmacm_la-cma.o >/dev/null 2>&1 > /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -o src/librdmacm.la -rpath /usr/local/lib -version-info 1 -export-dynamic @LIBRDMACM_VERSION_SCRIPT@ src_librdmacm_la-cma.lo -libverbs > mkdir src/.libs > gcc -shared .libs/src_librdmacm_la-cma.o -libverbs @LIBRDMACM_VERSION_SCRIPT@ -Wl,-soname -Wl,librdmacm.so.1 -o src/.libs/librdmacm.so.1.0.0 > gcc: @LIBRDMACM_VERSION_SCRIPT@: No such file or directory > make[1]: *** [src/librdmacm.la] Error 1 > make[1]: Leaving directory `/usr/local/src/git/librdmacm' > make: *** [all] Error 2 > vic14:/usr/local/src/git/librdmacm > # > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sean.hefty at intel.com Thu Mar 1 15:39:28 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 1 Mar 2007 15:39:28 -0800 Subject: [ofa-general] librdmacm build failure In-Reply-To: <1172792021.5983.17.camel@stevo-desktop> Message-ID: <000101c75c5a$da589940$ff0da8c0@amr.corp.intel.com> >Does your configure.in need something this (cut from libibverbs >configure.in)? 
I'm not a automake or configure whiz at all, so I'm just >guessing. But libibverbs builds fine and it uses a version script >too... It may... I tried to match the libibverbs build settings, so this could have been missed. What I don't understand is why it works fine for me, but not for you. You should be able to use the ofed_1_2 branch until I can look into this more. (I'm trying to track down the issues with 2.6.21 at the moment.) - Sean From rdreier at cisco.com Thu Mar 1 15:43:09 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 01 Mar 2007 15:43:09 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: (Shirley Ma's message of "Thu, 1 Mar 2007 14:38:56 -0800") References: Message-ID: > Setting carrier on should be OK here. ND can't send out before the > interface is capable for sending any packet. And if ND messages can't reach > the correct destination because of multicast join failure later then the > IPv6 address is useless. Your patch turns on the carrier before all multicast joins have completed. But if ND messages are sent too early, and a duplicate address is chosen, and then multicast joins complete later, a fabric might end up with duplicate IPv6 link-local addresses, right? Or is there some reason that can't happen? - R. From xma at us.ibm.com Thu Mar 1 15:43:31 2007 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 1 Mar 2007 15:43:31 -0800 Subject: [ofa-general] Re: [PATCH] resubmit: enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Roland Dreier wrote on 03/01/2007 03:15:06 PM: > > According to IPoIB RFC4391 section 5, once IPoIB broacast group has been > > joined, the interface should be ready for data transfer. > > I don't see any language in the RFC that says that. The closest thing > I see is: > > Thus, the IPoIB link is formed by the IPoIB nodes joining the > broadcast group. > > but that is far from a definitive statement. 
And I would like to make > sure that this patch doesn't break anything. Can you convince me that > this won't break IPv6 DAD for autoconfiguration? > > - R. Section 5 says: Thus, the IPoIB link is formed by the IPoIB nodes joining the broadcast group. There is no physical demarcation of the IPoIB link other than that determined by the broadcast group membership. I interpreted this as indicating when to call netif_carrier_on(). Is that correct? This won't break IPv6 DAD. I worked on IPv6 several years ago and was the author of the DHCPv6 SourceForge project. If the interface uses DHCPv6, DHCPv6 will take care of DAD. If the interface uses a static IPv6 address and the IB multicast join for the IPv6 solicited-node multicast address fails, then this IPv6 address is useless; if the join succeeds, a duplicate will be reported in /var/log/messages as "duplicate address detected!" Thanks Shirley Ma From sweitzen at cisco.com Thu Mar 1 15:52:59 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 1 Mar 2007 15:52:59 -0800 Subject: [ofa-general] RE: OFED 1.2 daily builds In-Reply-To: <1172746558.17950.38.camel@vladsk-laptop> References: <1172746558.17950.38.camel@vladsk-laptop> Message-ID: Excellent, thank you! Scott > -----Original Message----- > From: Vladimir Sokolovsky [mailto:vlad at mellanox.co.il] > Sent: Thursday, March 01, 2007 2:56 AM > To: Scott Weitzenkamp (sweitzen); openfabrics-ewg at openib.org > Cc: Tziporet Koren; Jeff Squyres (jsquyres); Openib-General at Openib.Org > Subject: OFED 1.2 daily builds > > > OFED daily builds started on openfabrics server. > http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-yyyymmdd-hhmm.tgz > will be created every day at 06:00 PST. > > http://www.openfabrics.org/builds/ofed-1.2/latest.txt > includes the name > of the latest OFED package. > > -- > Vladimir Sokolovsky > Mellanox Technologies Ltd. 
> From sean.hefty at intel.com Thu Mar 1 16:02:57 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 1 Mar 2007 16:02:57 -0800 Subject: [ofa-general] librdmacm build failure In-Reply-To: <1172792021.5983.17.camel@stevo-desktop> Message-ID: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> Can you try this patch and see if it works for you? --- diff --git a/Makefile.am b/Makefile.am index 57dc0b3..2eb95c6 100644 --- a/Makefile.am +++ b/Makefile.am @@ -6,7 +6,11 @@ AM_CFLAGS = -g -Wall -D_GNU_SOURCE src_librdmacm_la_CFLAGS = $(AM_CFLAGS) -librdmacm_version_script = @LIBRDMACM_VERSION_SCRIPT@ +if HAVE_LD_VERSION_SCRIPT + librdmacm_version_script = -Wl,--version-script=$(srcdir)/src/librdmacm.map +else + librdmacm_version_script = +endif src_librdmacm_la_SOURCES = src/cma.c src_librdmacm_la_LDFLAGS = -version-info 1 -export-dynamic \ From xma at us.ibm.com Thu Mar 1 17:04:43 2007 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 1 Mar 2007 17:04:43 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Roland Dreier wrote on 03/01/2007 03:43:09 PM: > > Setting carrier on should be OK here. ND can't send out before the > > interface is capable for sending any packet. And if ND messages can't reach > > the correct destination because of multicast join failure later then the > > IPv6 address is useless. > > Your patch turns on the carrier before all multicast joins have > completed. But if ND messages are sent too early, and a duplicate > address is chosen, and then multicast joins complete later, a fabric > might end up with duplicate IPv6 link-local addresses, right? Or is > there some reason that can't happen? > > - R. IPv6 ND doesn't prevent the duplicate IPv6 link-local address in the network. It only saves a warning in /var/log/messages to indicate that this address is duplicated in the network. ND can detect this when sending packets. 
Thanks Shirley Ma From hozer at hozed.org Thu Mar 1 18:52:48 2007 From: hozer at hozed.org (Troy Benjegerdes) Date: Thu, 1 Mar 2007 20:52:48 -0600 Subject: [ofa-general] ofed_1_2 git echa problems Message-ID: <20070302025248.GF27026@narn.hozed.org> I cloned vlad's ofed_1_2 git libibverbs and libehca trees yesterday. 32-bit builds with ehca segfault, and 64-bit builds give me this: p5l7:/usr/src/netpipe3-dev# ibv_rc_pingpong p5l9 ctx: 0x10016e60 ibv_create_cq( 0x10016f40, 501, NULL, 0x0, 0) PID185b EHCA_ERR:write_rwqe Invalid number of WQE SGE. num_sqe=1 max_nr_of_sg=0 PID185b ehca0 EHCA_ERR:ehcau_post_recv Could not write WQE qp_num=c Couldn't post receive (0) If I go back and reinstall libehca and libibverbs from OFED-1.1, it works just fine. p5l7:/usr/src/netpipe3-dev# ibv_rc_pingpong local address: LID 0x0025, QPN 0x00000d, PSN 0x87fb00 remote address: LID 0x0024, QPN 0x000014, PSN 0x099ac2 8192000 bytes in 0.02 seconds = 2787.46 Mbit/sec 1000 iters in 0.02 seconds = 23.51 usec/iter This is on debian with a 2.6.19.2 kernel from kernel.org. From changquing.tang at hp.com Thu Mar 1 19:16:47 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 2 Mar 2007 03:16:47 -0000 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: References: Message-ID: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> Roland: What is the default size of the async event queue? Suppose I create 1024 QPs from one process to another process, and the remote process somehow crashes. Can I get all 1024 QP error async events, and how do I make sure I don't lose an event? I am afraid that if I call ibv_get_async_event() too often, it can affect performance. Is ibv_get_async_event() a lightweight call or a heavyweight call? Also, if I want to detect a QP connection error, I can use either a completion error or ibv_get_async_event(); which one reports the error faster? 
Thanks. --CQ From mst at mellanox.co.il Thu Mar 1 20:42:14 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 2 Mar 2007 06:42:14 +0200 Subject: [ofa-general] Re: [PATCHv2 for-2.6.21] IB/mthca: QP reset fixes In-Reply-To: References: <20070301144139.GK14282@mellanox.co.il> Message-ID: <20070302044214.GC27542@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv2 for-2.6.21] IB/mthca: QP reset fixes > > Yes, this definitely looks like a needed fix. However: > > > if (qp->ibqp.send_cq != qp->ibqp.recv_cq) > > mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); > > > > + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { > > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); > > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector); > > + } else > > + synchronize_irq(dev->pdev->irq); > > I'm not quite sure I understand why we have to synchronize against the > completion EQ's interrupt here, but I assume it's to make sure that no > more CQEs are written that come from this QP's work queues. Is that > definitely necessary? The hardware that writes CQEs isn't > synchronized with the 2RST firmware command? > > Anyway, assuming that we do need to synchronize with the completion > interrupt, then the order looks suspicious above -- we clean out CQEs > from CQs attached to the QP and then synchronize with the interrupt, > which appears to leave a window where a new CQE could be written and > not end up getting cleaned. > > (and the same thing seems to apply to the changes to mthca_free_cq() also). The hardware is synchronized, so it won't generate new events/CQEs after 2RST. So I think in regular interrupt handler we do not need this synchronize_irq. However, for MSI, interrupt handler for event could be running on a CPU different from one servicing the command IFC, so I think we need the synchronize_irq there. Makes sense? 
-- MST From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:24:33 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:24:33 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 0/5] ehca: bug fixes for kernel space Message-ID: <200703020924.33999.hnguyen@linux.vnet.ibm.com> Hello Vladimir! This is a patch set for ehca with various bug fixes (being queued for 2.6.21) that I'd like to incorporate in ofed-1.2-beta. - reworked irq handler to avoid/reduce missed irq events. For this I had sent a patch that you pushed into your git tree ofed_1_2 as kernel_patches/fixes/ehca_2_rework_irq_handler.patch. However, I realized that it's not applicable since I created it without applying the previous patch. Therefore I'm sending it again. Please replace kernel_patches/fixes/ehca_2_rework_irq_handler.patch in your git tree with this new one. Sorry for the inconvenience! BTW: The build script does not recognize such a mistake and just compiles and installs without error, since nothing is broken without this patch. - fix race condition/locking issues in scaling code - allow en/disabling scaling code via module parameter - query_port() returns LINK_UP instead of UNKNOWN - fix mismatched sync between completion handler and destroy cq Thanks! 
Nam From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:28:06 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:28:06 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 1/5] ehca: reworked irq handler to avoid/reduce missed irq events Message-ID: <200703020928.06303.hnguyen@linux.vnet.ibm.com> reworked irq handler to avoid/reduce missed irq events Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 19 +++-- ehca_eq.c | 1 ehca_irq.c | 201 ++++++++++++++++++++++++++++++++++++--------------------- ehca_irq.h | 1 ehca_main.c | 24 +++++- ipz_pt_fn.h | 9 ++ 6 files changed, 172 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index cf95ee4..acf6705 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -42,9 +42,6 @@ #ifndef __EHCA_CLASSES_H__ #define __EHCA_CLASSES_H__ -#include "ehca_classes.h" -#include "ipz_pt_fn.h" - struct ehca_module; struct ehca_qp; struct ehca_cq; @@ -54,14 +51,22 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include +#include + #ifdef CONFIG_PPC64 #include "ehca_classes_pSeries.h" #endif +#include "ehca_classes.h" +#include "ipz_pt_fn.h" +#include "ehca_irq.h" -#include -#include +#define EHCA_EQE_CACHE_SIZE 20 -#include "ehca_irq.h" +struct ehca_eqe_cache_entry { + struct ehca_eqe *eqe; + struct ehca_cq *cq; +}; struct ehca_eq { u32 length; @@ -74,6 +79,8 @@ struct ehca_eq { spinlock_t spinlock; struct tasklet_struct interrupt_task; u32 ist; + spinlock_t irq_spinlock; + struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE]; }; struct ehca_sport { diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index 5281dec..33c822e 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc struct ib_device *ib_dev = &shca->ib_device; 
spin_lock_init(&eq->spinlock); + spin_lock_init(&eq->irq_spinlock); eq->is_initialized = 0; if (type != EHCA_EQ && type != EHCA_NEQ) { diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index c069be8..eff0936 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -401,87 +401,142 @@ irqreturn_t ehca_interrupt_eq(int irq, v return IRQ_HANDLED; } -void ehca_tasklet_eq(unsigned long data) +static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe) { - struct ehca_shca *shca = (struct ehca_shca*)data; - struct ehca_eqe *eqe; - int int_state; - int query_cnt = 0; - - do { - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); - - if ((shca->hw_level >= 2) && eqe) - int_state = 1; - else - int_state = 0; - - while ((int_state == 1) || eqe) { - while (eqe) { - u64 eqe_value = eqe->entry; - - ehca_dbg(&shca->ib_device, - "eqe_value=%lx", eqe_value); - - /* TODO: better structure */ - if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, - eqe_value)) { - unsigned long flags; - u32 token; - struct ehca_cq *cq; - - ehca_dbg(&shca->ib_device, - "... completion event"); - token = - EHCA_BMASK_GET(EQE_CQ_TOKEN, - eqe_value); - spin_lock_irqsave(&ehca_cq_idr_lock, - flags); - cq = idr_find(&ehca_cq_idr, token); - - if (cq == NULL) { - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - break; - } - - reset_eq_pending(cq); + u64 eqe_value; + u32 token; + unsigned long flags; + struct ehca_cq *cq; + eqe_value = eqe->entry; + ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + ehca_dbg(&shca->ib_device, "... 
completion event"); + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, token); + if (cq == NULL) { + spin_unlock(&ehca_cq_idr_lock); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq token=%x", + token); + return; + } + reset_eq_pending(cq); #ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); #else - spin_unlock_irqrestore(&ehca_cq_idr_lock, - flags); - comp_event_callback(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); #endif - } else { - ehca_dbg(&shca->ib_device, - "... non completion event"); - parse_identifier(shca, eqe_value); - } - eqe = - (struct ehca_eqe *)ehca_poll_eq(shca, - &shca->eq); - } + } else { + ehca_dbg(&shca->ib_device, + "... non completion event"); + parse_identifier(shca, eqe_value); + } +} - if (shca->hw_level >= 2) { - int_state = - hipz_h_query_int_state(shca->ipz_hca_handle, - shca->eq.ist); - query_cnt++; - iosync(); - if (query_cnt >= 100) { - query_cnt = 0; - int_state = 0; - } - } - eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); +void ehca_process_eq(struct ehca_shca *shca, int is_irq) +{ + struct ehca_eq *eq = &shca->eq; + struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache; + u64 eqe_value; + unsigned long flags; + unsigned long irq_flags; + int eqe_cnt, i; + int eq_empty = 0; + + spin_lock_irqsave(&eq->irq_spinlock, irq_flags); + if (is_irq) { + const int max_query_cnt = 100; + int query_cnt = 0; + int int_state = 1; + do { + int_state = hipz_h_query_int_state( + shca->ipz_hca_handle, eq->ist); + query_cnt++; + iosync(); + } while (int_state && query_cnt < max_query_cnt); + if (unlikely((query_cnt == max_query_cnt))) + ehca_err(&shca->ib_device, "int_state=%x query_cnt=%x", + int_state, query_cnt); + } + /* read out all eqes */ + eqe_cnt = 0; + do { + u32 
token; + eqe_cache[eqe_cnt].eqe = + (struct ehca_eqe *)ehca_poll_eq(shca, eq); + if (!eqe_cache[eqe_cnt].eqe) + break; + eqe_value = eqe_cache[eqe_cnt].eqe->entry; + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { + token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + eqe_cache[eqe_cnt].cq = idr_find(&ehca_cq_idr, token); + if (!eqe_cache[eqe_cnt].cq) { + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + ehca_err(&shca->ib_device, + "Invalid eqe for non-existing cq " + "token=%x", token); + continue; + } + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + } else + eqe_cache[eqe_cnt].cq = NULL; + eqe_cnt++; + } while (eqe_cnt < EHCA_EQE_CACHE_SIZE); + if (!eqe_cnt) { + if (is_irq) + ehca_dbg(&shca->ib_device, + "No eqe found for irq event"); + goto unlock_irq_spinlock; + } else if (!is_irq) + ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt); + if (eqe_cnt == EHCA_EQE_CACHE_SIZE) + ehca_dbg(&shca->ib_device, "too many eqes for one irq event"); + /* enable irq for new packets */ + for (i = 0; i < eqe_cnt; i++) { + if (eq->eqe_cache[i].cq) + reset_eq_pending(eq->eqe_cache[i].cq); + } + /* check eq */ + spin_lock_irqsave(&eq->spinlock, flags); + eq_empty = (!ipz_eqit_eq_peek_valid(&shca->eq.ipz_queue)); + spin_unlock_irqrestore(&eq->spinlock, flags); + /* call completion handler for cached eqes */ + for (i = 0; i < eqe_cnt; i++) + if (eq->eqe_cache[i].cq) +#ifdef CONFIG_INFINIBAND_EHCA_SCALING + queue_comp_task(eq->eqe_cache[i].cq); +#else + comp_event_callback(eq->eqe_cache[i].cq); +#endif + else { + ehca_dbg(&shca->ib_device, "got non completion event"); + parse_identifier(shca, eq->eqe_cache[i].eqe->entry); } - } while (int_state != 0); + /* poll eq if not empty */ + if (eq_empty) + goto unlock_irq_spinlock; + do { + struct ehca_eqe *eqe; + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); + if (!eqe) + break; + process_eqe(shca, eqe); + eqe_cnt++; + } while (1); + +unlock_irq_spinlock: + 
spin_unlock_irqrestore(&eq->irq_spinlock, irq_flags); +} - return; +void ehca_tasklet_eq(unsigned long data) +{ + ehca_process_eq((struct ehca_shca*)data, 1); } #ifdef CONFIG_INFINIBAND_EHCA_SCALING diff --git a/drivers/infiniband/hw/ehca/ehca_irq.h b/drivers/infiniband/hw/ehca/ehca_irq.h index be579cc..6ed06ee 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.h +++ b/drivers/infiniband/hw/ehca/ehca_irq.h @@ -56,6 +56,7 @@ void ehca_tasklet_neq(unsigned long data irqreturn_t ehca_interrupt_eq(int irq, void *dev_id); void ehca_tasklet_eq(unsigned long data); +void ehca_process_eq(struct ehca_shca *shca, int is_irq); struct ehca_cpu_comp_task { wait_queue_head_t wait_queue; diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index f2c328f..2af5225 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ #include "hcp_if.h" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0020"); +MODULE_VERSION("SVNEHCA_0021"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -778,8 +778,24 @@ void ehca_poll_eqs(unsigned long data) spin_lock(&shca_list_lock); list_for_each_entry(shca, &shca_list, shca_list) { - if (shca->eq.is_initialized) - ehca_tasklet_eq((unsigned long)(void*)shca); + if (shca->eq.is_initialized) { + /* call deadman proc only if eq ptr does not change */ + struct ehca_eq *eq = &shca->eq; + int max = 3; + volatile u64 q_ofs, q_ofs2; + u64 flags; + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs = eq->ipz_queue.current_q_offset; + spin_unlock_irqrestore(&eq->spinlock, flags); + do { + spin_lock_irqsave(&eq->spinlock, flags); + q_ofs2 = eq->ipz_queue.current_q_offset; + spin_unlock_irqrestore(&eq->spinlock, flags); + max--; + } while (q_ofs == q_ofs2 && max > 0); + if (q_ofs == q_ofs2) + ehca_process_eq(shca, 0); + } } mod_timer(&poll_eqs_timer, jiffies + HZ); 
spin_unlock(&shca_list_lock); @@ -790,7 +806,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0020)\n"); + "(Rel.: SVNEHCA_0021)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h index dc3bda2..4501f75 100644 --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h @@ -247,6 +247,15 @@ static inline void *ipz_eqit_eq_get_inc_ return ret; } +static inline void *ipz_eqit_eq_peek_valid(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + u32 qe = *(u8 *) ret; + if ((qe >> 7) != (queue->toggle_state & 1)) + return NULL; + return ret; +} + /* returns address (GX) of first queue entry */ static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt) { From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:29:30 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:29:30 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 2/5] ehca: fix race condition/locking issues in scaling code Message-ID: <200703020929.30578.hnguyen@linux.vnet.ibm.com> fix a race condition in find_next_cpu_online() and some other locking issues in scaling code Signed-off-by: Hoang-Nam Nguyen --- ehca_irq.c | 68 +++++++++++++++++++++++++++++-------------------------------- diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index eff0936..fa76b71 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -543,28 +543,30 @@ #ifdef CONFIG_INFINIBAND_EHCA_SCALING static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { - unsigned long flags_last_cpu; + int cpu; + unsigned long flags; + WARN_ON_ONCE(!in_interrupt()); if (ehca_debug_level) ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); - spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu); - 
pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); - if (pool->last_cpu == NR_CPUS) - pool->last_cpu = first_cpu(cpu_online_map); - spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu); + spin_lock_irqsave(&pool->last_cpu_lock, flags); + cpu = next_cpu(pool->last_cpu, cpu_online_map); + if (cpu == NR_CPUS) + cpu = first_cpu(cpu_online_map); + pool->last_cpu = cpu; + spin_unlock_irqrestore(&pool->last_cpu_lock, flags); - return pool->last_cpu; + return cpu; } static void __queue_comp_task(struct ehca_cq *__cq, struct ehca_cpu_comp_task *cct) { - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); - spin_lock_irqsave(&__cq->task_lock, flags_cq); + spin_lock_irqsave(&cct->task_lock, flags); + spin_lock(&__cq->task_lock); if (__cq->nr_callbacks == 0) { __cq->nr_callbacks++; @@ -575,8 +577,8 @@ static void __queue_comp_task(struct ehc else __cq->nr_callbacks++; - spin_unlock_irqrestore(&__cq->task_lock, flags_cq); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); + spin_unlock(&__cq->task_lock); + spin_unlock_irqrestore(&cct->task_lock, flags); } static void queue_comp_task(struct ehca_cq *__cq) @@ -587,69 +589,69 @@ static void queue_comp_task(struct ehca_ cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); - BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); if (cct->cq_jobs > 0) { cpu_id = find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + BUG_ON(!cct); } __queue_comp_task(__cq, cct); - - put_cpu(); - - return; } static void run_comp_task(struct ehca_cpu_comp_task* cct) { struct ehca_cq *cq; - unsigned long flags_cct; - unsigned long flags_cq; + unsigned long flags; - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); while (!list_empty(&cct->cq_list)) { cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); - spin_unlock_irqrestore(&cct->task_lock, flags_cct); 
+ spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&cct->task_lock, flags); - spin_lock_irqsave(&cq->task_lock, flags_cq); + spin_lock(&cq->task_lock); cq->nr_callbacks--; if (cq->nr_callbacks == 0) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } - spin_unlock_irqrestore(&cq->task_lock, flags_cq); - + spin_unlock(&cq->task_lock); } - spin_unlock_irqrestore(&cct->task_lock, flags_cct); - - return; + spin_unlock_irqrestore(&cct->task_lock, flags); } static int comp_task(void *__cct) { struct ehca_cpu_comp_task* cct = __cct; + int cql_empty; DECLARE_WAITQUEUE(wait, current); set_current_state(TASK_INTERRUPTIBLE); while(!kthread_should_stop()) { add_wait_queue(&cct->wait_queue, &wait); - if (list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (cql_empty) schedule(); else __set_current_state(TASK_RUNNING); remove_wait_queue(&cct->wait_queue, &wait); - if (!list_empty(&cct->cq_list)) + spin_lock_irq(&cct->task_lock); + cql_empty = list_empty(&cct->cq_list); + spin_unlock_irq(&cct->task_lock); + if (!cql_empty) run_comp_task(__cct); set_current_state(TASK_INTERRUPTIBLE); @@ -692,8 +694,6 @@ static void destroy_comp_task(struct ehc if (task) kthread_stop(task); - - return; } static void take_over_work(struct ehca_comp_pool *pool, @@ -812,6 +812,4 @@ #ifdef CONFIG_INFINIBAND_EHCA_SCALING destroy_comp_task(pool, i); } #endif - - return; } From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:30:33 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:30:33 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 3/5] ehca: allow en/disabling scaling code via module parameter Message-ID: <200703020930.33797.hnguyen@linux.vnet.ibm.com> allow users to en/disable scaling code when loading ib_ehca module Signed-off-by: Hoang-Nam Nguyen --- Kconfig | 8 -------- ehca_classes.h 
| 1 + ehca_irq.c | 49 +++++++++++++++++++++++-------------------------- ehca_main.c | 4 ++++ 4 files changed, 28 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/ehca/Kconfig b/drivers/infiniband/hw/ehca/Kconfig index 727b10d..1a85459 100644 --- a/drivers/infiniband/hw/ehca/Kconfig +++ b/drivers/infiniband/hw/ehca/Kconfig @@ -7,11 +7,3 @@ config INFINIBAND_EHCA To compile the driver as a module, choose M here. The module will be called ib_ehca. -config INFINIBAND_EHCA_SCALING - bool "Scaling support (EXPERIMENTAL)" - depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL - default y - ---help--- - eHCA scaling support schedules the CQ callbacks to different CPUs. - - To enable this feature choose Y here. diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index acf6705..10c32a1 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -276,6 +276,7 @@ extern struct idr ehca_cq_idr; extern int ehca_static_rate; extern int ehca_port_act_time; extern int ehca_use_hp_mr; +extern int ehca_scaling_code; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index fa76b71..9a1c32e 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -63,15 +63,11 @@ #define NEQE_PORT_AVAILABILITY EHCA_BMAS #define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63) #define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7) -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; static struct notifier_block comp_pool_callback_nb; -#endif - static inline void comp_event_callback(struct ehca_cq *cq) { if (!cq->ib_cq.comp_handler) @@ -422,13 +418,13 @@ static inline void process_eqe(struct eh return; } reset_eq_pending(cq); -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(cq); 
- spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); -#else - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - comp_event_callback(cq); -#endif + if (ehca_scaling_code) { + queue_comp_task(cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + } else { + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + comp_event_callback(cq); + } } else { ehca_dbg(&shca->ib_device, "... non completion event"); @@ -508,13 +504,14 @@ void ehca_process_eq(struct ehca_shca *s spin_unlock_irqrestore(&eq->spinlock, flags); /* call completion handler for cached eqes */ for (i = 0; i < eqe_cnt; i++) - if (eq->eqe_cache[i].cq) -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - queue_comp_task(eq->eqe_cache[i].cq); -#else - comp_event_callback(eq->eqe_cache[i].cq); -#endif - else { + if (eq->eqe_cache[i].cq) { + if (ehca_scaling_code) { + spin_lock(&ehca_cq_idr_lock); + queue_comp_task(eq->eqe_cache[i].cq); + spin_unlock(&ehca_cq_idr_lock); + } else + comp_event_callback(eq->eqe_cache[i].cq); + } else { ehca_dbg(&shca->ib_device, "got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); } @@ -539,8 +536,6 @@ void ehca_tasklet_eq(unsigned long data) ehca_process_eq((struct ehca_shca*)data, 1); } -#ifdef CONFIG_INFINIBAND_EHCA_SCALING - static inline int find_next_online_cpu(struct ehca_comp_pool* pool) { int cpu; @@ -763,14 +758,14 @@ static int comp_pool_callback(struct not return NOTIFY_OK; } -#endif - int ehca_create_comp_pool(void) { -#ifdef CONFIG_INFINIBAND_EHCA_SCALING int cpu; struct task_struct *task; + if (!ehca_scaling_code) + return 0; + pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL); if (pool == NULL) return -ENOMEM; @@ -795,21 +790,23 @@ #ifdef CONFIG_INFINIBAND_EHCA_SCALING comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); -#endif + + printk(KERN_INFO "eHCA scaling code enabled\n"); return 0; } void ehca_destroy_comp_pool(void) { -#ifdef 
CONFIG_INFINIBAND_EHCA_SCALING int i; + if (!ehca_scaling_code) + return; + unregister_cpu_notifier(&comp_pool_callback_nb); for (i = 0; i < NR_CPUS; i++) { if (cpu_online(i)) destroy_comp_task(pool, i); } -#endif } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 2af5225..40cace0 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -62,6 +62,7 @@ int ehca_use_hp_mr = 0; int ehca_port_act_time = 30; int ehca_poll_all_eqs = 1; int ehca_static_rate = -1; +int ehca_scaling_code = 1; module_param_named(open_aqp1, ehca_open_aqp1, int, 0); module_param_named(debug_level, ehca_debug_level, int, 0); @@ -71,6 +72,7 @@ module_param_named(use_hp_mr, ehca_u module_param_named(port_act_time, ehca_port_act_time, int, 0); module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, 0); module_param_named(static_rate, ehca_static_rate, int, 0); +module_param_named(scaling_code, ehca_scaling_code, int, 0); MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); @@ -91,6 +93,8 @@ MODULE_PARM_DESC(poll_all_eqs, " (0: no, 1: yes (default))"); MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); +MODULE_PARM_DESC(scaling_code, + "set scaling code (0: disabled, 1: enabled/default)"); spinlock_t ehca_qp_idr_lock; spinlock_t ehca_cq_idr_lock; From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:31:12 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:31:12 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 4/5] ehca: query_port() returns LINK_UP instead UNKNOWN Message-ID: <200703020931.12840.hnguyen@linux.vnet.ibm.com> set port phys state as a result of ehca_query_port() to LINK_UP. On pSeries ehca actually represents a logical HCA, whose phys/link state always is LINK_UP. 
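For readers of the archive: the literal 0x5 that this patch stores in props->phys_state is the LinkUp value of the InfiniBand PortPhysicalState encoding. A minimal standalone sketch of that mapping, assuming the standard encoding; the enum and the helper phys_state_name() are hypothetical names used only for illustration and are not part of the ehca driver or of any kernel header:

```c
/* Hypothetical illustration only: these names do not exist in the ehca
 * driver. The numeric values follow the InfiniBand PortPhysicalState
 * encoding. */
enum port_phys_state {
	PHYS_STATE_SLEEP                       = 1,
	PHYS_STATE_POLLING                     = 2,
	PHYS_STATE_DISABLED                    = 3,
	PHYS_STATE_PORT_CONFIGURATION_TRAINING = 4,
	PHYS_STATE_LINK_UP                     = 5,
	PHYS_STATE_LINK_ERROR_RECOVERY         = 6,
	PHYS_STATE_PHY_TEST                    = 7
};

/* Map a raw phys_state value to a readable name; any value outside the
 * table above is reported as "Unknown". */
static const char *phys_state_name(unsigned int state)
{
	switch (state) {
	case PHYS_STATE_SLEEP:
		return "Sleep";
	case PHYS_STATE_POLLING:
		return "Polling";
	case PHYS_STATE_DISABLED:
		return "Disabled";
	case PHYS_STATE_PORT_CONFIGURATION_TRAINING:
		return "PortConfigurationTraining";
	case PHYS_STATE_LINK_UP:
		return "LinkUp";
	case PHYS_STATE_LINK_ERROR_RECOVERY:
		return "LinkErrorRecovery";
	case PHYS_STATE_PHY_TEST:
		return "PhyTest";
	default:
		return "Unknown";
	}
}
```

With this helper, phys_state_name(0x5) yields "LinkUp", which is the state the patch reports for the logical HCA.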
Signed-off-by: Hoang-Nam Nguyen --- ehca_hca.c | 3 +++ 1 files changed, 3 insertions(+) diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c index b7be950..30eb45d 100644 --- a/drivers/infiniband/hw/ehca/ehca_hca.c +++ b/drivers/infiniband/hw/ehca/ehca_hca.c @@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib props->active_width = IB_WIDTH_12X; props->active_speed = 0x1; + /* at the moment (logical) link state is always LINK_UP */ + props->phys_state = 0x5; + query_port1: ehca_free_fw_ctrlblock(rblock); From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:33:15 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:33:15 +0100 Subject: [ofa-general] [PATCH ofed-1.2-beta 5/5] ehca: fix mismatched sync between completion handler and destroy cq Message-ID: <200703020933.15260.hnguyen@linux.vnet.ibm.com> This patch fixes two issues reported by Roland and Christoph H.: - Mismatched sync/locking between completion handler and destroy cq We introduced a counter nr_events per cq to track number of irq events seen. This counter is incremented when an event queue entry is seen and decremented after completion handler has been called regardless if scaling code is active or not. Note that nr_callbacks tracks number of events assigned to a cpu and both counters can potentially diverge. The sync between running completion handler and destroy cq is done by using the global spin lock ehca_cq_idr_lock. 
- Replace yield by wait_event on the counter above to become zero Signed-off-by: Hoang-Nam Nguyen --- ehca_classes.h | 4 ++- ehca_cq.c | 16 +++++++++++++-- ehca_irq.c | 59 +++++++++++++++++++++++++++++++++++++-------------------- ehca_main.c | 4 +-- 4 files changed, 58 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 10c32a1..84976b3 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -152,7 +154,9 @@ struct ehca_cq { spinlock_t cb_lock; struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; struct list_head entry; - u32 nr_callbacks; + u32 nr_callbacks; /* #events assigned to cpu by scaling code */ + u32 nr_events; /* #events seen */ + wait_queue_head_t wait_completion; spinlock_t task_lock; u32 ownpid; /* mmap counter for resources mapped into user space */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 9291a86..a0fdbda 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d spin_lock_init(&my_cq->spinlock); spin_lock_init(&my_cq->cb_lock); spin_lock_init(&my_cq->task_lock); + init_waitqueue_head(&my_cq->wait_completion); my_cq->ownpid = current->tgid; cq = &my_cq->ib_cq; @@ -303,6 +304,16 @@ create_cq_exit1: return cq; } +static int get_cq_nr_events(struct ehca_cq *my_cq) +{ + int ret; + unsigned long flags; + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + ret = my_cq->nr_events; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + return ret; +} + int ehca_destroy_cq(struct ib_cq *cq) { u64 h_ret; @@ -330,10 +341,11 @@ int ehca_destroy_cq(struct ib_cq *cq) } spin_lock_irqsave(&ehca_cq_idr_lock, flags); - while (my_cq->nr_callbacks) { + while (my_cq->nr_events) { spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - yield(); + wait_event(my_cq->wait_completion, !get_cq_nr_events(my_cq)); 
spin_lock_irqsave(&ehca_cq_idr_lock, flags); + /* recheck nr_events to assure no cqe has just arrived */ } idr_remove(&ehca_cq_idr, my_cq->token); diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 9a1c32e..1ce0e9b 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -403,10 +403,11 @@ static inline void process_eqe(struct eh u32 token; unsigned long flags; struct ehca_cq *cq; + eqe_value = eqe->entry; ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { - ehca_dbg(&shca->ib_device, "... completion event"); + ehca_dbg(&shca->ib_device, "Got completion event"); token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); spin_lock_irqsave(&ehca_cq_idr_lock, flags); cq = idr_find(&ehca_cq_idr, token); @@ -418,16 +419,20 @@ static inline void process_eqe(struct eh return; } reset_eq_pending(cq); - if (ehca_scaling_code) { + cq->nr_events++; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + if (ehca_scaling_code) queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - } else { - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + else { comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); } } else { - ehca_dbg(&shca->ib_device, - "... 
non completion event"); + ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eqe_value); } } @@ -479,6 +484,7 @@ void ehca_process_eq(struct ehca_shca *s "token=%x", token); continue; } + eqe_cache[eqe_cnt].cq->nr_events++; spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); } else eqe_cache[eqe_cnt].cq = NULL; @@ -505,12 +511,18 @@ void ehca_process_eq(struct ehca_shca *s /* call completion handler for cached eqes */ for (i = 0; i < eqe_cnt; i++) if (eq->eqe_cache[i].cq) { - if (ehca_scaling_code) { - spin_lock(&ehca_cq_idr_lock); + if (ehca_scaling_code) queue_comp_task(eq->eqe_cache[i].cq); - spin_unlock(&ehca_cq_idr_lock); - } else - comp_event_callback(eq->eqe_cache[i].cq); + else { + struct ehca_cq *cq = eq->eqe_cache[i].cq; + comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + } } else { ehca_dbg(&shca->ib_device, "got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); @@ -524,7 +536,6 @@ void ehca_process_eq(struct ehca_shca *s if (!eqe) break; process_eqe(shca, eqe); - eqe_cnt++; } while (1); unlock_irq_spinlock: @@ -568,8 +579,7 @@ static void __queue_comp_task(struct ehc list_add_tail(&__cq->entry, &cct->cq_list); cct->cq_jobs++; wake_up(&cct->wait_queue); - } - else + } else __cq->nr_callbacks++; spin_unlock(&__cq->task_lock); @@ -578,18 +588,21 @@ static void __queue_comp_task(struct ehc static void queue_comp_task(struct ehca_cq *__cq) { - int cpu; int cpu_id; struct ehca_cpu_comp_task *cct; + int cq_jobs; + unsigned long flags; - cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); - if (cct->cq_jobs > 0) { + spin_lock_irqsave(&cct->task_lock, flags); + cq_jobs = cct->cq_jobs; + spin_unlock_irqrestore(&cct->task_lock, flags); + if (cq_jobs > 0) { cpu_id = 
find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); @@ -609,11 +622,17 @@ static void run_comp_task(struct ehca_cp cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + spin_lock_irqsave(&cct->task_lock, flags); spin_lock(&cq->task_lock); cq->nr_callbacks--; - if (cq->nr_callbacks == 0) { + if (!cq->nr_callbacks) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 40cace0..6bac15d 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ #include "hcp_if.h" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0021"); +MODULE_VERSION("SVNEHCA_0022"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -810,7 +810,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0021)\n"); + "(Rel.: SVNEHCA_0022)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); From hnguyen at linux.vnet.ibm.com Fri Mar 2 00:37:46 2007 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 09:37:46 +0100 Subject: [ofa-general] Re: [PATCH 2.6.21-rc2] ehca: fix mismatched sync between completion handler and destroy cq In-Reply-To: References: <200702281801.02747.hnguyen@linux.vnet.ibm.com> Message-ID: <200703020937.47012.hnguyen@linux.vnet.ibm.com> > > +#include > This can just be , because you're only using > wait_queue_head_t and not struct completion, right? 
> I fixed this up before merging. Yes, right. Thanks for your help! Regards Nam From vlad at lists.openfabrics.org Fri Mar 2 02:14:46 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 2 Mar 2007 02:14:46 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070302-0200 daily build status Message-ID: <20070302101447.4F1AAE60898@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on powerpc with 
linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From HNGUYEN at de.ibm.com Fri Mar 2 02:38:13 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Fri, 2 Mar 2007 11:38:13 +0100 Subject: [ofa-general] ofed_1_2 git echa problems In-Reply-To: <20070302025248.GF27026@narn.hozed.org> Message-ID: Hi Troy! > I cloned vlad's ofed_1_2 git libibverbs and libehca trees yesterday, and > 32 bit builds with echa segfault, and 64 bit builds get me this: > p5l7:/usr/src/netpipe3-dev# ibv_rc_pingpong p5l9 > ctx: 0x10016e60 > ibv_create_cq( 0x10016f40, 501, NULL, 0x0, 0) > PID185b EHCA_ERR:write_rwqe Invalid number of WQE SGE. num_sqe=1 > max_nr_of_sg=0 > PID185b ehca0 EHCA_ERR:ehcau_post_recv Could not write WQE qp_num=c > Couldn't post receive (0) This appears to be a fundamental problem to me, because max_nr_of_sg=0. In ofed-1.2-alpha we removed do_mmap() from kernel space, which also affects user space in that both kernel and user space components have to be from the same code stream. Have you remove previous execs and libs from ofed-1.1.1 using ofed's uninstall.sh (located in /usr/local/ofed)? Did you download the daily build package from http://www.openfabrics.org/builds/ofed-1.2/ or from Vladimir's git? Did you build ofed-1.2 using the delivered install.sh script? I tested OFED-1.2-20070301-0600.tgz this morning on sles10 and did not encounter this problem. 
Regards
Nam

From HNGUYEN at de.ibm.com Fri Mar 2 02:43:59 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 2 Mar 2007 11:43:59 +0100
Subject: [ofa-general] ofed_1_2 git echa problems
In-Reply-To:
Message-ID:

> > I cloned vlad's ofed_1_2 git libibverbs and libehca trees yesterday, and
> > 32 bit builds with echa segfault, and 64 bit builds get me this:
> Did you download the daily build package from
> http://www.openfabrics.org/builds/ofed-1.2/ or from Vladimir's git?

Hmm, you already said that above. Please download the daily build package, e.g. from today:
http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070301-0600.tgz
Unpack it and run install.sh.

Regards
Nam

From dotanb at dev.mellanox.co.il Fri Mar 2 03:12:44 2007
From: dotanb at dev.mellanox.co.il (dotanb at dev.mellanox.co.il)
Date: Fri, 2 Mar 2007 13:12:44 +0200 (IST)
Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows
In-Reply-To:
References:
Message-ID: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il>

I have two questions about this email (and a request):

Roland Dreier wrote:
> +enum ibv_rereg_mr_flags {
> +	IBV_REREG_MR_CHANGE_TRANSLATION = (1 << 0),
> +	IBV_REREG_MR_CHANGE_PD = (1 << 1),
> +	IBV_REREG_MR_CHANGE_ACCESS = (1 << 2),
> +	IBV_REREG_MR_KEEP_VALID = (1 << 3)
> +};
>

1) Why do you need the last enumeration (IBV_REREG_MR_KEEP_VALID)?

> +struct ibv_mw_bind {
> +	struct ibv_mr *mr;
> +	uint64_t wr_id;
> +	uint64_t addr;
> +	uint64_t length;
> +	enum ibv_send_flags send_flags;
> +	enum ibv_access_flags mw_access_flags;
> +};
>

2) I suggest the following layout for ibv_mw_bind:

struct ibv_mw_bind {
	uint64_t wr_id;
	uint64_t addr;
	uint64_t length;
	struct ibv_mr *mr;
	enum ibv_send_flags send_flags;
	enum ibv_access_flags mw_access_flags;
};

so that it is optimal on 32-bit architectures as well.

3) In the past we had a mail thread about destroying a completion event channel: return BUSY if one tries to destroy a channel that is being used by CQs.
In order to do that, I will need to do this check at kernel level: can you add a function for this to uverbs (or do you want me to send you a patch for this issue)? If we are changing the ABI, I would like to add this as well.

thanks
Dotan

From mst at mellanox.co.il Fri Mar 2 03:16:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Mar 2007 13:16:58 +0200
Subject: [ofa-general] [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy
In-Reply-To:
References: <20070301144139.GK14282@mellanox.co.il>
Message-ID: <20070302111658.GE27542@mellanox.co.il>

In QP destroy, reset QP before removing it from QP table: otherwise we get
bogus QP/unknown QP warnings (and theoretically, crash, if the same slot is
reused with the same QPN). This fixes openfabrics bugzilla 394.

Signed-off-by: Michael S. Tsirkin

---

> I'm not quite sure I understand why we have to synchronize against the
> completion EQ's interrupt here.

Hmm, I'm not sure myself, now. I'm still thinking about this - the patch below
is clearly correct and seems sufficient to fix the issue pointed out by
bugzilla. So let's get it merged and I'll try to think about and address other
issues (if any) in a separate patch.

Please queue for 2.6.21.

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_qp.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1393,6 +1393,10 @@ void mthca_free_qp(struct mthca_dev *dev
 	send_cq = to_mcq(qp->ibqp.send_cq);
 	recv_cq = to_mcq(qp->ibqp.recv_cq);
 
+	if (qp->state != IB_QPS_RESET)
+		mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0,
+				NULL, 0, &status);
+
 	/*
 	 * Lock CQs here, so that CQ polling code can do QP lookup
 	 * without taking a lock.
@@ -1409,10 +1413,6 @@ void mthca_free_qp(struct mthca_dev *dev
 
 	wait_event(qp->wait, !get_qp_refcount(dev, qp));
 
-	if (qp->state != IB_QPS_RESET)
-		mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0,
-				NULL, 0, &status);
-
 	/*
 	 * If this is a userspace QP, the buffers, MR, CQs and so on
 	 * will be cleaned up in userspace, so all we have to do is

--
MST

From swise at opengridcomputing.com Fri Mar 2 06:17:25 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 08:17:25 -0600
Subject: [ofa-general] librdmacm build failure
In-Reply-To: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com>
References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com>
Message-ID: <1172845045.21241.0.camel@stevo-desktop>

On Thu, 2007-03-01 at 16:02 -0800, Sean Hefty wrote:
> Can you try this patch and see if it works for you?

works...thanks.

> ---
> diff --git a/Makefile.am b/Makefile.am
> index 57dc0b3..2eb95c6 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -6,7 +6,11 @@ AM_CFLAGS = -g -Wall -D_GNU_SOURCE
>
>  src_librdmacm_la_CFLAGS = $(AM_CFLAGS)
>
> -librdmacm_version_script = @LIBRDMACM_VERSION_SCRIPT@
> +if HAVE_LD_VERSION_SCRIPT
> +    librdmacm_version_script = -Wl,--version-script=$(srcdir)/src/librdmacm.map
> +else
> +    librdmacm_version_script =
> +endif
>
>  src_librdmacm_la_SOURCES = src/cma.c
>  src_librdmacm_la_LDFLAGS = -version-info 1 -export-dynamic \
>

From swise at opengridcomputing.com Fri Mar 2 06:23:58 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 08:23:58 -0600
Subject: [ofa-general] rdma_cm issues in 2.6.21-rc1
In-Reply-To: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com>
References: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com>
Message-ID: <1172845438.21241.5.camel@stevo-desktop>

On Thu, 2007-03-01 at 15:29 -0800, Sean Hefty wrote:
> As just a note, I'm investigating two issues with the rdma_cm and 2.6.21-rc1.
>
> Running ucmatose twice results in a failure binding to an address the second
> time that it's run.
>
> I'm also seeing a kernel crash if ucmatose is killed while waiting for a
> connection.

Sean,

dunno if this is the same issue, but I'm running 2.6.21-rc2 + master
branches of librdmacm, libibverbs, libmthca, and libcxgb3.

Trying to rping results in this on the client:

[57452.461045] rping: Corrupted page table at address 2aaaaaaad068
[57452.461122] PGD 76e85067 PUD 72e5b067 PMD 63e55067 PTE 6b6b6b6b6b6b6027
[57452.461382] Bad pagetable: 000d [2] SMP
[57452.461544] CPU 1
[57452.461655] Modules linked in: nfs lockd nfs_acl sunrpc rdma_krping rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad ib_ipoib ib_sa ib_mthca ib_mad iw_cxgb3 cxgb3 ib_core ipv6 af_packet button battery ac loop dm_mod e1000 parport_pc lp parport reiserfs edd fan thermal processor sg aic79xx scsi_transport_spi ata_piix libata piix sd_mod scsi_mod ide_disk ide_core
[57452.463974] Pid: 21376, comm: rping Not tainted 2.6.21-rc2 #1
[57452.464046] RIP: 0033:[<00002b3ea945abba>] [<00002b3ea945abba>]
[57452.464167] RSP: 002b:00007fff01dcaa50 EFLAGS: 00010246
[57452.464239] RAX: 00002aaaaaaad000 RBX: 00000000005093b0 RCX: 0000000000000000
[57452.464314] RDX: 0000000000000009 RSI: 00007fff01dcabb0 RDI: 00000000005094b0
[57452.464390] RBP: 00000000005094b0 R08: 0000000000000300 R09: 00002b3ea8df9c00
[57452.464465] R10: 0000000000000000 R11: 00002b3ea8f081f0 R12: 0000000000000009
[57452.464540] R13: 00007fff01dcabb0 R14: 0000000000505190 R15: 0000000000000000
[57452.464615] FS: 00002b3ea9456c50(0000) GS:ffff8100013943d8(0000) knlGS:0000000000000000
[57452.464705] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[57452.464778] CR2: 00002aaaaaaad068 CR3: 000000006344e000 CR4: 00000000000006e0
[57452.464855] Process rping (pid: 21376, threadinfo ffff81006c9d8000, task ffff8100721fc8a0)
[57452.464946]
[57452.465009] RIP [<00002b3ea945abba>]
[57452.465124] RSP <00007fff01dcaa50>
vic14:~

From swise at opengridcomputing.com Fri Mar 2 06:27:39 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 08:27:39 -0600
Subject: [ofa-general] rdma_cm issues in 2.6.21-rc1
In-Reply-To: <1172845438.21241.5.camel@stevo-desktop>
References: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com> <1172845438.21241.5.camel@stevo-desktop>
Message-ID: <1172845659.21241.7.camel@stevo-desktop>

BTW: rping works over mthca/IB. The crash below is over chelsio/IW.

I'm investigating...

On Fri, 2007-03-02 at 08:23 -0600, Steve Wise wrote:
> On Thu, 2007-03-01 at 15:29 -0800, Sean Hefty wrote:
> > As just a note, I'm investigating two issues with the rdma_cm and 2.6.21-rc1.
> >
> > Running ucmatose twice results in a failure binding to an address the second
> > time that it's run.
> >
> > I'm also seeing a kernel crash if ucmatose is killed while waiting for a
> > connection.
>
> Sean,
>
> dunno if this is the same issue, but I'm running 2.6.21-rc2 + master
> branches of librdmacm, libibverbs, libmthca, and libcxgb3.
>
>
> Trying to rping results in this on the client:
>
> [57452.461045] rping: Corrupted page table at address 2aaaaaaad068
> [57452.461122] PGD 76e85067 PUD 72e5b067 PMD 63e55067 PTE 6b6b6b6b6b6b6027
> [57452.461382] Bad pagetable: 000d [2] SMP
> [57452.461544] CPU 1
> [57452.461655] Modules linked in: nfs lockd nfs_acl sunrpc rdma_krping rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad ib_ipoib ib_sa ib_mthca ib_mad iw_cxgb3 cxgb3 ib_core ipv6 af_packet button battery ac loop dm_mod e1000 parport_pc lp parport reiserfs edd fan thermal processor sg aic79xx scsi_transport_spi ata_piix libata piix sd_mod scsi_mod ide_disk ide_core
> [57452.463974] Pid: 21376, comm: rping Not tainted 2.6.21-rc2 #1
> [57452.464046] RIP: 0033:[<00002b3ea945abba>] [<00002b3ea945abba>]
> [57452.464167] RSP: 002b:00007fff01dcaa50 EFLAGS: 00010246
> [57452.464239] RAX: 00002aaaaaaad000 RBX: 00000000005093b0 RCX: 0000000000000000
> [57452.464314] RDX: 0000000000000009 RSI: 00007fff01dcabb0 RDI: 00000000005094b0
> [57452.464390] RBP: 00000000005094b0 R08: 0000000000000300 R09: 00002b3ea8df9c00
> [57452.464465] R10: 0000000000000000 R11: 00002b3ea8f081f0 R12: 0000000000000009
> [57452.464540] R13: 00007fff01dcabb0 R14: 0000000000505190 R15: 0000000000000000
> [57452.464615] FS: 00002b3ea9456c50(0000) GS:ffff8100013943d8(0000) knlGS:0000000000000000
> [57452.464705] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [57452.464778] CR2: 00002aaaaaaad068 CR3: 000000006344e000 CR4: 00000000000006e0
> [57452.464855] Process rping (pid: 21376, threadinfo ffff81006c9d8000, task ffff8100721fc8a0)
> [57452.464946]
> [57452.465009] RIP [<00002b3ea945abba>]
> [57452.465124] RSP <00007fff01dcaa50>
> vic14:~
>
>
>
>
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From afriedle at open-mpi.org
Fri Mar 2 07:47:02 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Fri, 02 Mar 2007 10:47:02 -0500 Subject: [ofa-general] build failure on nightly tarball -- bonding Message-ID: <45E846F6.7070705@open-mpi.org> The chelsio build errors from yesterday appear to be gone, though now I'm seeing errors building the IB bonding code with the 3/2 alpha tarball -- error below. I'm wondering, is there a way to selectively avoid building things like this that seem to be optional, as a tarball user? Andrew In file included from /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:78: /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_set_slave_inactive_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: (Each undeclared identifier is reported only once /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: for each function it appears in.) 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:262: error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_set_slave_active_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:268: error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:268: error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_set_master_3ad_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:273: error: `IFF_MASTER_8023AD' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_unset_master_3ad_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:278: error: `IFF_MASTER_8023AD' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_set_master_alb_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:283: error: `IFF_MASTER_ALB' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_unset_master_alb_flags': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:288: error: `IFF_MASTER_ALB' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: At top level: /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: error: invalid lvalue in unary `&' /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: error: initializer element is not constant 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: error: (near initialization for `__param_arr_arp_ip_target.num') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:149: error: `BOND_XMIT_POLICY_LAYER2' undeclared here (not in a function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: error: (near initialization for `xmit_hashtype_tbl[0].mode') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: error: (near initialization for `xmit_hashtype_tbl[0]') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: error: `BOND_XMIT_POLICY_LAYER34' undeclared here (not in a function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: error: (near initialization for `xmit_hashtype_tbl[1].mode') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: error: (near initialization for `xmit_hashtype_tbl[1]') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:173: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:173: error: (near initialization for `xmit_hashtype_tbl[2]') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_compute_features': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1230: error: 
`NETIF_F_ALL_CSUM' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1230: error: `NETIF_F_UFO' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_enslave': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1428: warning: implicit declaration of function `dev_set_mac_address' /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1449: error: `IFF_BONDING' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_release': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1847: error: `IFF_MASTER_8023AD' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1847: error: `IFF_MASTER_ALB' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1848: error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1848: error: `IFF_BONDING' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1849: error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_release_all': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1939: error: `IFF_MASTER_8023AD' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1939: error: `IFF_MASTER_ALB' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1940: error: `IFF_SLAVE_INACTIVE' undeclared (first 
use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_glean_dev_ip': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2331: warning: implicit declaration of function `__in_dev_get_rcu' /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2331: warning: assignment makes pointer from integer without a cast /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_arp_rcv': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2548: error: `IFF_BONDING' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_slave_netdev_event': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3364: error: `NETDEV_FEAT_CHANGE' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_netdev_event': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3390: error: `IFF_BONDING' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_register_lacpdu': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3477: warning: assignment from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_register_arp': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3494: warning: assignment from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: At top level: /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: error: unknown field `get_ufo' specified in initializer 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: error: `ethtool_op_get_ufo' undeclared here (not in a function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: error: initializer element is not constant /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: error: (near initialization for `bond_ethtool_ops.set_tso') /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_init': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4374: warning: assignment discards qualifiers from pointer target type /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4386: error: `IFF_BONDING' undeclared (first use in this function) /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: In function `bond_create': /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4812: warning: implicit declaration of function `lockdep_set_class' /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4812: error: structure has no member named `_xmit_lock' /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: At top level: /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4774: error: storage size of `bonding_netdev_xmit_lock_key' isn't known make[1]: *** [/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.o] Error 1 From afriedle at indiana.edu Fri Mar 2 07:54:33 2007 From: afriedle at indiana.edu (Andrew Friedley) Date: Fri, 02 Mar 2007 10:54:33 -0500 Subject: [ofa-general] build failure on nightly tarball -- bonding In-Reply-To: <45E846F6.7070705@open-mpi.org> References: <45E846F6.7070705@open-mpi.org> Message-ID: <45E848B9.3010007@indiana.edu> Sorry, should have mentioned this is on RHEL4U3 i686. 
Andrew Andrew Friedley wrote: > The chelsio build errors from yesterday appear to be gone, though now > I'm seeing errors building the IB bonding code with the 3/2 alpha > tarball -- error below. I'm wondering, is there a way to selectively > avoid building things like this that seem to be optional, as a tarball > user? From changquing.tang at hp.com Fri Mar 2 08:32:58 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 2 Mar 2007 16:32:58 -0000 Subject: [ofa-general] Is ibv_get_async_event() a blocking call ? In-Reply-To: <1172845045.21241.0.camel@stevo-desktop> References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> Message-ID: <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> HI, I did not realize that ibv_get_async_event() is a blocking call, it forces me to call it in another thread. But if I don't want to use thread in my application, how do I use this function ? Thanks. --CQ > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Steve Wise > Sent: Friday, March 02, 2007 8:17 AM > To: Sean Hefty > Cc: General at lists.openfabrics.org > Subject: RE: [ofa-general] librdmacm build failure > > On Thu, 2007-03-01 at 16:02 -0800, Sean Hefty wrote: > > Can you try this patch and see if it works for you? > > works...thanks. 
> > > > > --- > > diff --git a/Makefile.am b/Makefile.am index 57dc0b3..2eb95c6 100644 > > --- a/Makefile.am > > +++ b/Makefile.am > > @@ -6,7 +6,11 @@ AM_CFLAGS = -g -Wall -D_GNU_SOURCE > > > > src_librdmacm_la_CFLAGS = $(AM_CFLAGS) > > > > -librdmacm_version_script = @LIBRDMACM_VERSION_SCRIPT@ > > +if HAVE_LD_VERSION_SCRIPT > > + librdmacm_version_script = > > +-Wl,--version-script=$(srcdir)/src/librdmacm.map > > +else > > + librdmacm_version_script = > > +endif > > > > src_librdmacm_la_SOURCES = src/cma.c > > src_librdmacm_la_LDFLAGS = -version-info 1 -export-dynamic \ > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Fri Mar 2 08:56:25 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 10:56:25 -0600 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> Message-ID: <1172854585.21241.14.camel@stevo-desktop> On Fri, 2007-03-02 at 16:32 +0000, Tang, Changqing wrote: > > HI, > I did not realize that ibv_get_async_event() is a blocking call, > it forces me to call it in another thread. But if I don't want to use > thread in my application, how do I use this function ? > > Thanks. > > --CQ You can select() or poll() on the async file descriptor for the QP. Then you only call ibv_get_async_event() when poll/select indicates there is something to read. 
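
The poll-before-read approach described above can be sketched in a few lines of C. This is only a minimal illustration, not code from libibverbs: the helper name `wait_readable` is hypothetical, and it works on any file descriptor, so it can be tried with a pipe when no RDMA device is at hand. The one libibverbs detail it relies on is that the async descriptor is the `async_fd` member of `struct ibv_context`, so it belongs to the device context rather than to an individual QP.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd can be read without blocking.
 * Returns 1 if a read will not block (data, hang-up or error pending),
 * 0 if the timeout expired, -1 if poll() failed or the fd is invalid.
 * timeout_ms < 0 waits forever; 0 only checks and returns at once.
 * After EINTR the wait is simply restarted with the full timeout.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc == -1 && errno == EINTR);

	if (rc <= 0)
		return rc;	/* 0 = timeout, -1 = poll() error */

	return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : -1;
}

/*
 * With libibverbs the helper would be used roughly like this
 * (not compiled here; needs <infiniband/verbs.h> and an open device):
 *
 *	struct ibv_async_event ev;
 *
 *	if (wait_readable(context->async_fd, 0) == 1 &&
 *	    ibv_get_async_event(context, &ev) == 0) {
 *		... handle ev.event_type ...
 *		ibv_ack_async_event(&ev);
 *	}
 */
```

Passing a timeout of 0 turns the check into a pure poll, which suits a single-threaded application that visits the async descriptor once per pass of its own progress loop.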

From swise at opengridcomputing.com Fri Mar 2 09:01:13 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 11:01:13 -0600
Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: <1172854585.21241.14.camel@stevo-desktop>
References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop>
Message-ID: <1172854873.21241.19.camel@stevo-desktop>

On Fri, 2007-03-02 at 10:56 -0600, Steve Wise wrote:
> On Fri, 2007-03-02 at 16:32 +0000, Tang, Changqing wrote:
> >
> > HI,
> > I did not realize that ibv_get_async_event() is a blocking call,
> > it forces me to call it in another thread. But if I don't want to use
> > thread in my application, how do I use this function ?
> >
> > Thanks.
> >
> > --CQ
>
> You can select() or poll() on the async file descriptor for the QP.
>
> Then you only call ibv_get_async_event() when poll/select indicates
> there is something to read.

Something like this:

struct my_cxt {
	...
	struct ibv_context *context;
	struct rdma_event_channel *rch;
	...
};

	fds[0].fd = ctx->rch->fd;
	fds[0].events = POLLIN|POLLERR;
	fds[1].fd = ctx->context->async_fd;
	fds[1].events = POLLHUP|POLLNVAL|POLLPRI|POLLOUT|POLLIN|POLLERR;
	if (poll(fds, 2, -1) == -1) {
		perror("poll");
		exit(1);
	}
	if (fds[0].revents) {
		struct rdma_cm_event *event;

		rdma_get_cm_event(ctx->rch, &event);
		printf("RDMA CM EVENT %d - %s!\n",
			event->event, rdma_str_event(event));
		rdma_ack_cm_event(event);
		...
	}
	if (fds[1].revents) {
		struct ibv_async_event event;

		ibv_get_async_event(ctx->context, &event);
		printf("ASYNC EVENT %d - %s!\n",
			event.event_type, ibv_str_async_event(&event));
		ibv_ack_async_event(&event);
		...
	}

From changquing.tang at hp.com Fri Mar 2 09:07:47 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Fri, 2 Mar 2007 17:07:47 -0000
Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: <1172854873.21241.19.camel@stevo-desktop> References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> Thank you very much. I wonder if libibverbs can do this way for application and make ibv_get_async_event() non-blocking. But I will try this way now. --CQ > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, March 02, 2007 11:01 AM > To: Tang, Changqing > Cc: General at lists.openfabrics.org > Subject: Re: [ofa-general] Re: Is ibv_get_async_event() a > blocking call ? > > On Fri, 2007-03-02 at 10:56 -0600, Steve Wise wrote: > > On Fri, 2007-03-02 at 16:32 +0000, Tang, Changqing wrote: > > > > > > HI, > > > I did not realize that ibv_get_async_event() is a > blocking call, it > > > forces me to call it in another thread. But if I don't > want to use > > > thread in my application, how do I use this function ? > > > > > > Thanks. > > > > > > --CQ > > > > You can select() or poll() on the async file descriptor for the QP. > > > > Then you only call ibv_get_async_event() when poll/select indicates > > there is something to read. > > Something like this: > > > struct my_cxt { > ... > struct ibv_context *context; > struct rdma_event_channel *rch; > ... > }; > > > > fds[0].fd = ctx->rch->fd; > fds[0].events = POLLIN|POLLERR; > fds[1].fd = ctx->context->async_fd; > fds[1].events = > POLLHUP|POLLNVAL|POLLPRI|POLLOUT|POLLIN|POLLERR; > if (poll(fds, 2, -1) == -1) { > perror("poll"); > exit(1); > } > if (fds[0].revents) { > struct rdma_cm_event *event; > > rdma_get_cm_event(ctx->rch, &event); > printf("RDMA CM EVENT %d - %s!\n", > event->event, rdma_str_event(event)); > rdma_ack_cm_event(event); > ... 
> } > if (fds[1].revents) { > struct ibv_async_event event; > > ibv_get_async_event(ctx->context, &event); > printf("ASYNC EVENT %d - %s!\n", > event.event_type, ibv_str_async_event(&event)); > ibv_ack_async_event(&event); > ... > } > > > From swise at opengridcomputing.com Fri Mar 2 09:22:34 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 11:22:34 -0600 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> Message-ID: <1172856154.21241.34.camel@stevo-desktop> On Fri, 2007-03-02 at 17:07 +0000, Tang, Changqing wrote: > Thank you very much. > > I wonder if libibverbs can do this way for application and make > ibv_get_async_event() non-blocking. But I will try this way now. > I wonder what happens if you set the async file descriptor to non-blocking? Roland? Would that return EWOULDBLOCK if there are no events? > --CQ > > > > > -----Original Message----- > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Friday, March 02, 2007 11:01 AM > > To: Tang, Changqing > > Cc: General at lists.openfabrics.org > > Subject: Re: [ofa-general] Re: Is ibv_get_async_event() a > > blocking call ? > > > > On Fri, 2007-03-02 at 10:56 -0600, Steve Wise wrote: > > > On Fri, 2007-03-02 at 16:32 +0000, Tang, Changqing wrote: > > > > > > > > HI, > > > > I did not realize that ibv_get_async_event() is a > > blocking call, it > > > > forces me to call it in another thread. But if I don't > > want to use > > > > thread in my application, how do I use this function ? > > > > > > > > Thanks. 
> > > > > > > > --CQ > > > > > > You can select() or poll() on the async file descriptor for the QP. > > > > > > Then you only call ibv_get_async_event() when poll/select indicates > > > there is something to read. > > > > Something like this: > > > > > > struct my_cxt { > > ... > > struct ibv_context *context; > > struct rdma_event_channel *rch; > > ... > > }; > > > > > > > > fds[0].fd = ctx->rch->fd; > > fds[0].events = POLLIN|POLLERR; > > fds[1].fd = ctx->context->async_fd; > > fds[1].events = > > POLLHUP|POLLNVAL|POLLPRI|POLLOUT|POLLIN|POLLERR; > > if (poll(fds, 2, -1) == -1) { > > perror("poll"); > > exit(1); > > } > > if (fds[0].revents) { > > struct rdma_cm_event *event; > > > > rdma_get_cm_event(ctx->rch, &event); > > printf("RDMA CM EVENT %d - %s!\n", > > event->event, rdma_str_event(event)); > > rdma_ack_cm_event(event); > > ... > > } > > if (fds[1].revents) { > > struct ibv_async_event event; > > > > ibv_get_async_event(ctx->context, &event); > > printf("ASYNC EVENT %d - %s!\n", > > event.event_type, ibv_str_async_event(&event)); > > ibv_ack_async_event(&event); > > ... > > } > > > > > > From Sujal at Mellanox.com Fri Mar 2 09:26:04 2007 From: Sujal at Mellanox.com (Sujal Das) Date: Fri, 2 Mar 2007 09:26:04 -0800 Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released! Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com> Hello all, Mellanox is pleased to release the OFED 1.x (Gen 2) - based SRP Target source code to the OpenFabrics community, OEMs and end users. This release is an upgrade to the previously released SRP Target source code that was based on the Mellanox IBGold driver and Gen 1 software interface. The code has been tested to work with Mellanox InfiniBand adapters and is available under Open Fabrics open source license terms. The attached readme document has further details on the release. 
With the release of this version of this SRP Target software, OEMs and end users can enjoy the following advantages: * Base their initiator and target development efforts on the same OFED source/API base * Enjoy the generic benefits of the improved OGA Gen 2 architecture * Deploy multiple storage target solutions on the same box - e.g., SRP Target, iSER Target, NFS-RDMA server Best regards, Sujal Das Mellanox Technologies 2900 Stender Way, Santa Clara, CA 95054 408 916 0007 (Work) 408 970 3403 (Fax) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Gen2_SRPT_README.txt URL: From weiny2 at llnl.gov Fri Mar 2 10:17:02 2007 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 2 Mar 2007 10:17:02 -0800 Subject: [ofa-general] OFED-1.2-20070302-0600 build failure Message-ID: <20070302101702.6064e017.weiny2@llnl.gov> I just got the latest OFED 1.2 build to see if it built. I got the following failure with this system: 09:09:43 > uname -a Linux wopri 2.6.9-61chaos #1 SMP Fri Feb 16 11:12:35 PST 2007 x86_64 x86_64 x86_64 GNU/Linux I was _NOT_ building the kernel modules as I know there are issues in that code and our custom kernel. This is basically a RHEL4U3 system. More information: 09:17:23 > file /usr/lib/libsysfs.so /usr/lib/libsysfs.so: symbolic link to `libsysfs.so.1.0.2' 09:18:05 > file /usr/lib/libsysfs.so.1.0.2 /usr/lib/libsysfs.so.1.0.2: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped 09:18:15 > file /usr/lib64/libsysfs.so.1.0.2 /usr/lib64/libsysfs.so.1.0.2: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped Thanks, Ira Weiny weiny2 at llnl.gov make[2]: Entering directory `/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm' if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF ".deps/cm.Tpo" -c -o cm.lo `test -f 'src/cm.c' || echo './'`src/cm.c; \ then mv -f ".deps/cm.Tpo" ".deps/cm.Plo"; else rm -f ".deps/cm.Tpo"; exit 1; fi mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF .deps/cm.Tpo -c src/cm.c -fPIC -DPIC -o .libs/cm.o /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -L../libibverbs/src -libverbs -lsysfs -L. -o src/libibcm.la -rpath /usr/lib64 -avoid-version cm.lo mkdir src/.libs gcc -shared .libs/cm.o -Wl,--rpath -Wl,/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs /tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs/libibverbs.so /usr/lib/libsysfs.so -L/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm -Wl,-soname -Wl,libibcm.so -o src/.libs/libibcm.so /usr/lib/libsysfs.so: could not read symbols: File in wrong format collect2: ld returned 1 exit status make[2]: *** [src/libibcm.la] Error 1 make[2]: Leaving directory `/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm' make[1]: *** [all] Error 2 make[1]: Leaving directory `/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm' make: *** [libibcm] Error 2 error: Bad exit status from /home/weiny2/rpm/tmp/rpm-tmp.16816 (%install) From ossrosch at linux.vnet.ibm.com Fri Mar 2 10:31:45 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Fri, 2 Mar 2007 19:31:45 +0100 Subject: [ofa-general] [PATCH ofed 1.2 0/2] Patch to build 32-bit binaries on ppc64 Message-ID: <200703021931.45851.ossrosch@linux.vnet.ibm.com> Hi Vladimir, the following 2 Patches prepare OFED-1.2 to build only 32-bit binaries on ppc64. Note that the libraries are still built for both 32- and 64-bit versions. 

Kind regards
Stefan Roscher

From ossrosch at linux.vnet.ibm.com Fri Mar 2 10:32:04 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Fri, 2 Mar 2007 19:32:04 +0100
Subject: [ofa-general] [PATCH ofed 1.2 1/2] Patch to build 32-bit binaries on ppc64
Message-ID: <200703021932.05511.ossrosch@linux.vnet.ibm.com>

This patch sets the build_32bit variable to 1 on ppc64.

Signed-off-by: Stefan Roscher
---

--- OFED-1.2-20070301-0600_old/build.sh 2007-03-01 06:00:02.000000000 -0800
+++ OFED-1.2-20070301-0600_new/build.sh 2007-03-02 10:29:55.000000000 -0800
@@ -88,7 +88,11 @@ build_kernel_ib_devel=0
 modprobe_update=1
 include_ipoib_conf=1
 apply_hpage_patch=1
+if [ ! $ARCH = "ppc64" ]; then
 build_32bit=0
+else
+build_32bit=1
+fi
 
 # Environment variables definition
 BUILD_COUNTER=0

From ossrosch at linux.vnet.ibm.com Fri Mar 2 10:33:35 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Fri, 2 Mar 2007 19:33:35 +0100
Subject: [ofa-general] [PATCH ofed 1.2 2/2] Patch to build 32-bit binaries on ppc64
Message-ID: <200703021933.35822.ossrosch@linux.vnet.ibm.com>

This patch disables the restore of the 64-bit binaries on ppc64 and allows the 32-bit binaries to be packaged in the RPM.

Signed-off-by: Stefan Roscher
---

--- OFED-1.2-20070301-0600_old/SOURCES/ofa_user-1.2/ofed_scripts/ofa_user.spec 2007-03-01 06:03:28.000000000 -0800
+++ OFED-1.2-20070301-0600_new/SOURCES/ofa_user-1.2/ofed_scripts/ofa_user.spec 2007-03-02 11:05:10.000000000 -0800
@@ -539,7 +539,8 @@ make DESTDIR=$RPM_BUILD_ROOT install_use
 ./configure --prefix=%{_prefix} --libdir=%{_libdir32} --without-patch %{configure_options32}
 make user
 make DESTDIR=$RPM_BUILD_ROOT install_user
- # Backup 32 bit binaries
+ %ifarch x86_64
+ # Backup 32 bit binaries
 if [ -d $RPM_BUILD_ROOT%{_prefix}/bin ]; then
 mv $RPM_BUILD_ROOT%{_prefix}/bin $RPM_BUILD_ROOT%{_prefix}/bin32
 fi
@@ -553,6 +554,7 @@ make DESTDIR=$RPM_BUILD_ROOT install_use
 if [ -d $RPM_BUILD_ROOT%{_prefix}/sbin64 ]; then
 mv $RPM_BUILD_ROOT%{_prefix}/sbin64 $RPM_BUILD_ROOT%{_prefix}/sbin
 fi
+ %endif
 if [ -f $RPM_BUILD_ROOT%{_prefix}/sbin32/tvflash ] && [ ! -f $RPM_BUILD_ROOT%{_prefix}/sbin/tvflash ]; then
 mkdir -p $RPM_BUILD_ROOT%{_prefix}/sbin
 install -m 0755 $RPM_BUILD_ROOT%{_prefix}/sbin32/tvflash $RPM_BUILD_ROOT%{_prefix}/sbin/tvflash

From swise at opengridcomputing.com Fri Mar 2 12:38:15 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 14:38:15 -0600
Subject: [ofa-general] 2.6.21 map problem
Message-ID: <1172867895.21241.51.camel@stevo-desktop>

Roland, do you know if anything changed in 2.6.21 regarding remap_pfn_range()?

The chelsio bypass code is failing. After mapping the WQ memory that was allocated via dma_alloc_coherent(), the library crashes when reading the mapped WQ memory.

The dma address being mapped is 0x7373a000. It gets mapped to 0x2aaaaaaad000.
rping reads this address + 68 and we crash: Mar 2 14:18:39 vic14 kernel: [78930.705415] iw_cxgb3: iwch_create_qp sq_num_entries 16, rq_num_entries 15 qpid 0x16 qhp ffff810078c3d488 dma_addr 0x7373a000 size 32 Mar 2 14:18:39 vic14 kernel: [78930.705520] iw_cxgb3: iwch_mmap pgoff 0x2 key 0x2000 len 4096 Mar 2 14:18:39 vic14 kernel: [78930.709966] iw_cxgb3: remove_mmap key 0x2000 addr 0xa800b000 len 4096 Mar 2 14:18:39 vic14 kernel: [78930.710047] iw_cxgb3: iwch_mmap pgoff 0x1 key 0x1000 len 4096 Mar 2 14:18:39 vic14 kernel: [78930.710122] iw_cxgb3: remove_mmap key 0x1000 addr 0x7373a000 len 4096 Mar 2 14:19:07 vic14 kernel: [78959.395682] rping: Corrupted page table at address 2aaaaaaad068 Mar 2 14:19:07 vic14 kernel: [78959.395759] PGD 1f3e6067 PUD 7201a067 PMD 22e27067 PTE 6b6b6b6b6b6b6027 Mar 2 14:19:07 vic14 kernel: [78959.396021] Bad pagetable: 000d [9] SMP Mar 2 14:19:07 vic14 kernel: [78959.396182] CPU 2 Mar 2 14:19:07 vic14 kernel: [78959.396293] Modules linked in: iw_cxgb3 nfs lockd nfs_acl sunrpc rdma_krping rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad ib_ipoib ib_sa ib_mthca ib_mad cxgb3 ib_core ipv6 af_packet button battery ac loop dm_mod e1000 parport_pc lp parport reiserfs edd fan thermal processor sg aic79xx scsi_transport_spi ata_piix libata piix sd_mod scsi_mod ide_disk ide_core Mar 2 14:19:07 vic14 kernel: [78959.398617] Pid: 27938, comm: rping Not tainted 2.6.21-rc2 #1 Mar 2 14:19:07 vic14 kernel: [78959.398690] RIP: 0033:[<00002b5b5f5f0809>] [<00002b5b5f5f0809>] Mar 2 14:19:07 vic14 kernel: [78959.398813] RSP: 002b:00007fff4bc37980 EFLAGS: 00010302 Mar 2 14:19:07 vic14 kernel: [78959.398885] RAX: 00002aaaaaaad000 RBX: 00000000005093b0 RCX: 0000000000000000 Mar 2 14:19:07 vic14 kernel: [78959.398959] RDX: 0000000000000009 RSI: 00007fff4bc37aa0 RDI: 0000000000509458 Mar 2 14:19:07 vic14 kernel: [78959.399034] RBP: 00007fff4bc37980 R08: 0000000000000300 R09: 00002b5b5ef8cc00 Mar 2 14:19:07 vic14 kernel: [78959.399110] R10: 0000000000000000 
R11: 00002b5b5f09c1f0 R12: 00007fff4bc37aa0 Mar 2 14:19:07 vic14 kernel: [78959.399184] R13: 00007fff4bc37d80 R14: 0000000000000000 R15: 0000000000000000 Mar 2 14:19:07 vic14 kernel: [78959.399260] FS: 00002b5b5f5eac50(0000) GS:ffff810001394af8(0000) knlGS:0000000000000000 Mar 2 14:19:07 vic14 kernel: [78959.399352] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 2 14:19:07 vic14 kernel: [78959.399425] CR2: 00002aaaaaaad068 CR3: 000000001cfc2000 CR4: 00000000000006e0 Mar 2 14:19:07 vic14 kernel: [78959.399500] Process rping (pid: 27938, threadinfo ffff810030aa8000, task ffff81007d882820) Mar 2 14:19:07 vic14 kernel: [78959.399590] Mar 2 14:19:07 vic14 kernel: [78959.399653] RIP [<00002b5b5f5f0809>] Mar 2 14:19:07 vic14 kernel: [78959.399769] RSP <00007fff4bc37980> Mar 2 14:23:09 vic14 syslog-ng[3158]: STATS: dropped 0 Any thoughts? Thanks, Steve. From swise at opengridcomputing.com Fri Mar 2 13:58:08 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 15:58:08 -0600 Subject: [ofa-general] 2.6.21 map problem In-Reply-To: <1172867895.21241.51.camel@stevo-desktop> References: <1172867895.21241.51.camel@stevo-desktop> Message-ID: <1172872688.21241.59.camel@stevo-desktop> On Fri, 2007-03-02 at 14:38 -0600, Steve Wise wrote: > Roland, do you know if anything changed in 2.6.21 regarding > remap_pfn_range()? The chelsio bypass code is failing. After mapping > the WQ memory that was allocated via dma_alloc_coherent(), the library > crashes when reading the mapped WQ memory. never mind. i found it. my bad. :-\ From swise at opengridcomputing.com Fri Mar 2 14:06:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 16:06:36 -0600 Subject: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Don't use mm after its freed in iwch_mmap(). Message-ID: <1172873196.21241.62.camel@stevo-desktop> Don't use mm after its freed in iwch_mmap(). 
Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 4af1c0f..f2774ae 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -331,6 +331,7 @@ static int iwch_mmap(struct ib_ucontext
 	int ret = 0;
 	struct iwch_mm_entry *mm;
 	struct iwch_ucontext *ucontext;
+	u64 addr;

 	PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, vma->vm_pgoff,
 	     key, len);
@@ -345,10 +346,11 @@ static int iwch_mmap(struct ib_ucontext
 	mm = remove_mmap(ucontext, key, len);
 	if (!mm)
 		return -EINVAL;
+	addr = mm->addr;
 	kfree(mm);

-	if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) &&
-	    (mm->addr < (rdev_p->rnic_info.udbell_physbase +
+	if ((addr >= rdev_p->rnic_info.udbell_physbase) &&
+	    (addr < (rdev_p->rnic_info.udbell_physbase +
 		       rdev_p->rnic_info.udbell_len))) {

 		/*
@@ -362,7 +364,7 @@ static int iwch_mmap(struct ib_ucontext
 		vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
 		vma->vm_flags &= ~VM_MAYREAD;
 		ret = io_remap_pfn_range(vma, vma->vm_start,
-					 mm->addr >> PAGE_SHIFT,
+					 addr >> PAGE_SHIFT,
 					 len, vma->vm_page_prot);
 	} else {

@@ -370,7 +372,7 @@ static int iwch_mmap(struct ib_ucontext
 		 * Map WQ or CQ contig dma memory...
 		 */
 		ret = remap_pfn_range(vma, vma->vm_start,
-				      mm->addr >> PAGE_SHIFT,
+				      addr >> PAGE_SHIFT,
 				      len, vma->vm_page_prot);
 	}


From rdreier at cisco.com Fri Mar 2 14:25:07 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 02 Mar 2007 14:25:07 -0800
Subject: [ofa-general] ofed_1_2 git echa problems
In-Reply-To: (Hoang-Nam Nguyen's message of "Fri, 2 Mar 2007 11:38:13 +0100")
References:
Message-ID:

 > In ofed-1.2-alpha we removed do_mmap() from kernel space, which
 > also affects user space in that both kernel and user space components
 > have to be from the same code stream.

I wasn't paying enough attention to this change.
If you break compatibility between userspace and kernel space, you should adjust your kernel driver's uverbs_abi_ver (which I think you did with the do_mmap() removal), and then test the version in the userspace library, so that it can fail gracefully when it detects a too-new kernel that it doesn't know how to handle. - R. From rdreier at cisco.com Fri Mar 2 14:29:33 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 14:29:33 -0800 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <1172856154.21241.34.camel@stevo-desktop> (Steve Wise's message of "Fri, 02 Mar 2007 11:22:34 -0600") References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> Message-ID: > > I wonder if libibverbs can do this way for application and make > > ibv_get_async_event() non-blocking. But I will try this way now. > > > > I wonder what happens if you set the async file descriptor to > non-blocking? > > Roland? Would that return EWOULDBLOCK if there are no events? Yes, as you can see if you look at the libibverbs internals, ibv_get_async_event() is just doing read() on a file descriptor. So all the standard ways of finding out whether the descriptor is ready to read should work: block in read(), use poll(), epoll, SIGIO, async IO, etc. And using fcntl to set the descriptor to non-blocking mode would work too. - R.
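
The non-blocking approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not libibverbs code: a pipe stands in for the async event descriptor (in libibverbs that would be the async_fd member of struct ibv_context), and the helper names set_nonblocking(), fd_readable() and demo_nonblocking_fd() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode using fcntl(). */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: returns 1 if fd becomes readable within
 * timeout_ms, 0 if the timeout expires first, -1 on error.
 */
int fd_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Demonstration on a pipe (assumption: the pipe plays the role of the
 * async event descriptor). Returns 0 if every step behaves as expected.
 */
int demo_nonblocking_fd(void)
{
	int fds[2];
	char byte = 'e', buf = 0;
	int ok = 0;

	if (pipe(fds) < 0)
		return -1;
	if (set_nonblocking(fds[0]) < 0)
		goto out;

	/* Nothing queued: read() fails with EAGAIN/EWOULDBLOCK, no blocking. */
	if (read(fds[0], &buf, 1) != -1 ||
	    (errno != EAGAIN && errno != EWOULDBLOCK))
		goto out;

	/* poll() with a zero timeout also reports "not ready". */
	if (fd_readable(fds[0], 0) != 0)
		goto out;

	/* Queue one "event"; now poll() and read() both succeed. */
	if (write(fds[1], &byte, 1) != 1)
		goto out;
	if (fd_readable(fds[0], 0) != 1)
		goto out;
	if (read(fds[0], &buf, 1) != 1 || buf != byte)
		goto out;

	ok = 1;
out:
	close(fds[0]);
	close(fds[1]);
	return ok ? 0 : -1;
}
```

With a real device context, the same two helpers would presumably be applied to context->async_fd before calling ibv_get_async_event(), which should then return an error instead of blocking when no event is queued. Each event that is retrieved still has to be acknowledged with ibv_ack_async_event().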
From rdreier at cisco.com Fri Mar 2 14:31:12 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 14:31:12 -0800 Subject: [ofa-general] Re: 2.6.21 map problem In-Reply-To: <1172867895.21241.51.camel@stevo-desktop> (Steve Wise's message of "Fri, 02 Mar 2007 14:38:15 -0600") References: <1172867895.21241.51.camel@stevo-desktop> Message-ID: > Mar 2 14:19:07 vic14 kernel: [78959.395759] PGD 1f3e6067 PUD 7201a067 PMD 22e27067 PTE 6b6b6b6b6b6b6027 the PTE is basically 6b6b6b6b etc, and 6b is the use-after-free poison value. not sure if that helps or not... From changquing.tang at hp.com Fri Mar 2 14:45:49 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 2 Mar 2007 22:45:49 -0000 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com><1172845045.21241.0.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net><1172854585.21241.14.camel@stevo-desktop><1172854873.21241.19.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net><1172856154.21241.34.camel@stevo-desktop> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039979E4@G3W0634.americas.hpqcorp.net> Thanks. If I set fd to non-blocking mode, does ibv_get_async_event() return 0 is there is an event, And return non-0 if nothing ? --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, March 02, 2007 4:30 PM > To: Steve Wise > Cc: Tang, Changqing; General at lists.openfabrics.org > Subject: Re: [ofa-general] Re: Is ibv_get_async_event() a > blocking call ? > > > > I wonder if libibverbs can do this way for application > and make > > ibv_get_async_event() non-blocking. But I will > try this way now. > > > > > > > I wonder what happens if you set the async file descriptor > to > non-blocking? > > > > Roland? Would that return EWOULDBLOCK if there are no events? 
> > Yes, as you can see if you look at the libibverbs internals, > ibv_get_async_event() is just doing read() on a file > descriptor. So all the standard ways of handling finding out > whether the descriptor is ready to read should work: block in > read(), use poll(), epoll, SIGIO, async IO, etc. And using > fcntl to set the descriptor to non-blocking mode would work too. > > - R. > From changquing.tang at hp.com Fri Mar 2 14:48:30 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 2 Mar 2007 22:48:30 -0000 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com><1172845045.21241.0.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net><1172854585.21241.14.camel@stevo-desktop><1172854873.21241.19.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net><1172856154.21241.34.camel@stevo-desktop> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> Roland: Back to my orignal question, if I don't call ibv_get_async_event() for a long time, and there are a lot of events generated during the time, do I loss any event when I eventually call ibv_get_async_event() ? (another way, how many events can you queue ) ? --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, March 02, 2007 4:30 PM > To: Steve Wise > Cc: Tang, Changqing; General at lists.openfabrics.org > Subject: Re: [ofa-general] Re: Is ibv_get_async_event() a > blocking call ? > > > > I wonder if libibverbs can do this way for application > and make > > ibv_get_async_event() non-blocking. But I will > try this way now. > > > > > > > I wonder what happens if you set the async file descriptor > to > non-blocking? > > > > Roland? Would that return EWOULDBLOCK if there are no events? 
> > Yes, as you can see if you look at the libibverbs internals, > ibv_get_async_event() is just doing read() on a file > descriptor. So all the standard ways of handling finding out > whether the descriptor is ready to read should work: block in > read(), use poll(), epoll, SIGIO, async IO, etc. And using > fcntl to set the descriptor to non-blocking mode would work too. > > - R. > From rdreier at cisco.com Fri Mar 2 14:49:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 14:49:02 -0800 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039979E4@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Fri, 2 Mar 2007 22:45:49 -0000") References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979E4@G3W0634.americas.hpqcorp.net> Message-ID: > If I set fd to non-blocking mode, does ibv_get_async_event() return 0 is > there is an event, > And return non-0 if nothing ? Yes. From rdreier at cisco.com Fri Mar 2 14:55:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 14:55:28 -0800 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? 
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Fri, 2 Mar 2007 22:48:30 -0000") References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> Message-ID: > Back to my orignal question, if I don't call > ibv_get_async_event() for a long time, and there are a lot of events > generated during the time, do I loss any event when I eventually call > ibv_get_async_event() ? > > (another way, how many events can you queue ) ? Actually this is not handled very well right now. There is no limit on the length of the queue so you can eventually use up all the memory in the system if you never pick up async events. - R. From rdreier at cisco.com Fri Mar 2 14:57:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 14:57:25 -0800 Subject: [ofa-general] Re: What is the size of async event queue ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Fri, 2 Mar 2007 03:16:47 -0000") References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> Message-ID: > What is the default size of the async event queue ? Suppose I > create 1024 QP from one process to another process, > Somehow the remote process crashes, Can I get all the 1024 QP error > async event, how do I make sure I don't loss an event ? Which async event are you expecting to get? If you lose a connection on an RC QP, then you will get a completion with error status returned on your CQ -- I don't think there is any async event that is caused.
> Also if I want to detect QP connection error, I can either use > completion error, or use ibv_get_async_event(), which way report error > faster ? Which async event do you get? - R.
From swise at opengridcomputing.com Fri Mar 2 15:17:50 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 17:17:50 -0600 Subject: [ofa-general] [PATCH ofed_1_2 0/6] iw_cxgb3: Bug Fixes Message-ID: <20070302231750.2701.64219.stgit@dell3.ogc.int> Vlad, Here is a set of bug fixes for iw_cxgb3 that I'd like to roll into ofed_1_2 beta. They can be pulled from: http://staging.openfabrics.org/~swise/ofed_1_2 iw_cxgb3_fixes Thanks, Steve. From swise at opengridcomputing.com Fri Mar 2 15:17:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 02 Mar 2007 17:17:53 -0600 Subject: [ofa-general] [PATCH ofed_1_2 1/6] iw_cxgb3: Fixes for "normal close" failures. In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int> References: <20070302231750.2701.64219.stgit@dell3.ogc.int> Message-ID: <20070302231752.2701.48131.stgit@dell3.ogc.int> Fixes for "normal close" failures. - Start normal close timer when moving to CLOSING state. - Handle ABORTING state in close_con_rpl(). - Stop timer correctly on abort during a normal close.
Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 9466a50..dd006e3 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1416,6 +1416,7 @@ static int peer_close(struct t3cdev *tde
 		wake_up(&ep->com.waitq);
 		break;
 	case FPDU_MODE:
+		start_ep_timer(ep);
 		__state_set(&ep->com, CLOSING);
 		attrs.next_state = IWCH_QP_STATE_CLOSING;
 		iwch_modify_qp(ep->com.qp->rhp, ep->com.qp,
@@ -1426,7 +1427,6 @@ static int peer_close(struct t3cdev *tde
 		disconnect = 0;
 		break;
 	case CLOSING:
-		start_ep_timer(ep);
 		__state_set(&ep->com, MORIBUND);
 		disconnect = 0;
 		break;
@@ -1508,9 +1508,10 @@ static int peer_abort(struct t3cdev *tde
 		get_ep(&ep->com);
 		break;
 	case MORIBUND:
+	case CLOSING:
 		stop_ep_timer(ep);
+		/*FALLTHROUGH*/
 	case FPDU_MODE:
-	case CLOSING:
 		if (ep->com.cm_id && ep->com.qp) {
 			attrs.next_state = IWCH_QP_STATE_ERROR;
 			ret = iwch_modify_qp(ep->com.qp->rhp,
@@ -1571,7 +1572,6 @@ static int close_con_rpl(struct t3cdev *
 	spin_lock_irqsave(&ep->com.lock, flags);
 	switch (ep->com.state) {
 	case CLOSING:
-		start_ep_timer(ep);
 		__state_set(&ep->com, MORIBUND);
 		break;
 	case MORIBUND:
@@ -1587,6 +1587,8 @@ static int close_con_rpl(struct t3cdev *
 		__state_set(&ep->com, DEAD);
 		release = 1;
 		break;
+	case ABORTING:
+		break;
 	case DEAD:
 	default:
 		BUG_ON(1);
@@ -1660,6 +1662,7 @@ static void ep_timeout(unsigned long arg
 		break;
 	case MPA_REQ_WAIT:
 		break;
+	case CLOSING:
 	case MORIBUND:
 		if (ep->com.cm_id && ep->com.qp) {
 			attrs.next_state = IWCH_QP_STATE_ERROR;
@@ -1958,11 +1961,11 @@ int iwch_ep_disconnect(struct iwch_ep *e
 	case MPA_REQ_RCVD:
 	case MPA_REP_SENT:
 	case FPDU_MODE:
+		start_ep_timer(ep);
 		ep->com.state = CLOSING;
 		close = 1;
 		break;
 	case CLOSING:
-		start_ep_timer(ep);
 		ep->com.state = MORIBUND;
 		close = 1;
 		break;

From swise at opengridcomputing.com Fri Mar 2 15:17:56 2007
From:
swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 17:17:56 -0600
Subject: [ofa-general] [PATCH ofed_1_2 2/6] iw_cxgb3: Move QP to error on destroy if the state is IDLE.
In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int>
References: <20070302231750.2701.64219.stgit@dell3.ogc.int>
Message-ID: <20070302231756.2701.85887.stgit@dell3.ogc.int>

Move QP to error on destroy if the state is IDLE.

Change iwch_destroy_qp() to always move the QP to ERROR and let
iwch_modify_qp() decide what to do.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 3f64dbf..3bd8195 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -723,10 +723,8 @@ static int iwch_destroy_qp(struct ib_qp
 	qhp = to_iwch_qp(ib_qp);
 	rhp = qhp->rhp;

-	if (qhp->attr.state == IWCH_QP_STATE_RTS) {
-		attrs.next_state = IWCH_QP_STATE_ERROR;
-		iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0);
-	}
+	attrs.next_state = IWCH_QP_STATE_ERROR;
+	iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0);
 	wait_event(qhp->wait, !qhp->ep);

 	remove_handle(rhp, &rhp->qpidr, qhp->wq.qpid);

From swise at opengridcomputing.com Fri Mar 2 15:17:58 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 17:17:58 -0600
Subject: [ofa-general] [PATCH ofed_1_2 3/6] iw_cxgb3: Stop the endpoint timer when the MPA exchange is aborted by the peer.
In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int>
References: <20070302231750.2701.64219.stgit@dell3.ogc.int>
Message-ID: <20070302231758.2701.22972.stgit@dell3.ogc.int>

Stop the endpoint timer when the MPA exchange is aborted by the peer.
Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index dd006e3..cc09589 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1488,8 +1488,10 @@ static int peer_abort(struct t3cdev *tde
 	case CONNECTING:
 		break;
 	case MPA_REQ_WAIT:
+		stop_ep_timer(ep);
 		break;
 	case MPA_REQ_SENT:
+		stop_ep_timer(ep);
 		connect_reply_upcall(ep, -ECONNRESET);
 		break;
 	case MPA_REP_SENT:

From swise at opengridcomputing.com Fri Mar 2 15:18:00 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 17:18:00 -0600
Subject: [ofa-general] [PATCH ofed_1_2 4/6] iw_cxgb3: Squelch logging AE errors.
In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int>
References: <20070302231750.2701.64219.stgit@dell3.ogc.int>
Message-ID: <20070302231800.2701.7334.stgit@dell3.ogc.int>

Squelch logging AE errors.

Only post one AE error for a given connection in the kernel log.
Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_ev.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
index f4cd5ec..64b94e8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -47,12 +47,6 @@ static void post_qp_event(struct iwch_de
 	struct iwch_qp_attributes attrs;
 	struct iwch_qp *qhp;

-	printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x "
-	       "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
-	       CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
-	       CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
-	       CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
-
 	spin_lock(&rnicp->lock);
 	qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));

@@ -73,6 +67,12 @@ static void post_qp_event(struct iwch_de
 		return;
 	}

+	printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x "
+	       "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
+	       CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
+	       CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
+	       CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+
 	atomic_inc(&qhp->refcnt);
 	spin_unlock(&rnicp->lock);


From swise at opengridcomputing.com Fri Mar 2 15:18:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 17:18:03 -0600
Subject: [ofa-general] [PATCH ofed_1_2 5/6] iw_cxgb3: Don't reuse skbuffs that are non-linear or cloned.
In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int>
References: <20070302231750.2701.64219.stgit@dell3.ogc.int>
Message-ID: <20070302231802.2701.50703.stgit@dell3.ogc.int>

Don't reuse skbuffs that are non-linear or cloned.
Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index cc09589..bbd34e7 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -306,8 +306,7 @@ static int status2errno(int status)
  */
 static struct sk_buff *get_skb(struct sk_buff *skb, int len, gfp_t gfp)
 {
-	if (skb) {
-		BUG_ON(skb_cloned(skb));
+	if (skb && !skb_is_nonlinear(skb) && !skb_cloned(skb)) {
 		skb_trim(skb, 0);
 		skb_get(skb);
 	} else {

From swise at opengridcomputing.com Fri Mar 2 15:18:05 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Mar 2007 17:18:05 -0600
Subject: [ofa-general] [PATCH ofed_1_2 6/6] iw_cxgb3: Fix MR permission problems.
In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int>
References: <20070302231750.2701.64219.stgit@dell3.ogc.int>
Message-ID: <20070302231805.2701.29864.stgit@dell3.ogc.int>

Fix MR permission problems.

- remove useless and redundant iwch_mem_perms enum.
- create ib_to_tpt_access_rights() for mapping ib access rights to T3 TPT permissions.
- create ib_to_mwbind_access_rights() for mapping ib access rights to T3 MWBIND WR permissions.
- fix up the mem reg code to utilize the new functions.
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 26 +++------------------ drivers/infiniband/hw/cxgb3/iwch_provider.h | 33 +++++++++++---------------- drivers/infiniband/hw/cxgb3/iwch_qp.c | 2 +- 3 files changed, 18 insertions(+), 43 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 3bd8195..1388687 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -450,9 +450,6 @@ static struct ib_mr *iwch_register_phys_ php = to_iwch_pd(pd); rhp = php->rhp; - acc = iwch_convert_access(acc); - - mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); if (!mhp) return ERR_PTR(-ENOMEM); @@ -477,16 +474,9 @@ static struct ib_mr *iwch_register_phys_ mhp->rhp = rhp; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - - /* NOTE: TPT perms are backwards from BIND WR perms! */ - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; - + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = *iova_start; mhp->attr.page_size = shift - 12; - mhp->attr.len = (u32) total_size; mhp->attr.pbl_size = npages; ret = iwch_register_mem(rhp, php, mhp, shift, page_list); @@ -512,7 +502,6 @@ static int iwch_reregister_phys_mem(stru struct iwch_mr mh, *mhp; struct iwch_pd *php; struct iwch_dev *rhp; - int new_acc; __be64 *page_list = NULL; int shift = 0; u64 total_size; @@ -533,14 +522,12 @@ static int iwch_reregister_phys_mem(stru if (rhp != php->rhp) return -EINVAL; - new_acc = mhp->attr.perms; - memcpy(&mh, mhp, sizeof *mhp); if (mr_rereg_mask & IB_MR_REREG_PD) php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mh.attr.perms = iwch_convert_access(acc); + mh.attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, @@ -555,7 +542,7 @@ static int 
iwch_reregister_phys_mem(stru if (mr_rereg_mask & IB_MR_REREG_PD) mhp->attr.pdid = php->pdid; if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mhp->attr.perms = acc; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) { mhp->attr.zbva = 0; mhp->attr.va_fbo = *iova_start; @@ -600,8 +587,6 @@ struct ib_mr *iwch_reg_user_mr(struct ib goto err; } - acc = iwch_convert_access(acc); - i = n = 0; list_for_each_entry(chunk, &region->chunk_list, list) @@ -617,10 +602,7 @@ struct ib_mr *iwch_reg_user_mr(struct ib mhp->rhp = rhp; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = region->virt_base; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) region->length; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 7322773..998b323 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -284,27 +284,20 @@ static inline int iwch_convert_state(enu } } -enum iwch_mem_perms { - IWCH_MEM_ACCESS_LOCAL_READ = 1 << 0, - IWCH_MEM_ACCESS_LOCAL_WRITE = 1 << 1, - IWCH_MEM_ACCESS_REMOTE_READ = 1 << 2, - IWCH_MEM_ACCESS_REMOTE_WRITE = 1 << 3, - IWCH_MEM_ACCESS_ATOMICS = 1 << 4, - IWCH_MEM_ACCESS_BINDING = 1 << 5, - IWCH_MEM_ACCESS_LOCAL = - (IWCH_MEM_ACCESS_LOCAL_READ | IWCH_MEM_ACCESS_LOCAL_WRITE), - IWCH_MEM_ACCESS_REMOTE = - (IWCH_MEM_ACCESS_REMOTE_WRITE | IWCH_MEM_ACCESS_REMOTE_READ) - /* cannot go beyond 1 << 31 */ -} __attribute__ ((packed)); - -static inline u32 iwch_convert_access(int acc) +static inline u32 iwch_ib_to_tpt_access(int acc) { - return (acc & IB_ACCESS_REMOTE_WRITE ? IWCH_MEM_ACCESS_REMOTE_WRITE : 0) - | (acc & IB_ACCESS_REMOTE_READ ? IWCH_MEM_ACCESS_REMOTE_READ : 0) | - (acc & IB_ACCESS_LOCAL_WRITE ?
IWCH_MEM_ACCESS_LOCAL_WRITE : 0) | - (acc & IB_ACCESS_MW_BIND ? IWCH_MEM_ACCESS_BINDING : 0) | - IWCH_MEM_ACCESS_LOCAL_READ; + return (acc & IB_ACCESS_REMOTE_WRITE ? TPT_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? TPT_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? TPT_LOCAL_WRITE : 0) | + TPT_LOCAL_READ; +} + +static inline u32 iwch_ib_to_mwbind_access(int acc) +{ + return (acc & IB_ACCESS_REMOTE_WRITE ? T3_MEM_ACCESS_REM_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? T3_MEM_ACCESS_REM_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? T3_MEM_ACCESS_LOCAL_WRITE : 0) | + T3_MEM_ACCESS_LOCAL_READ; } enum iwch_mmid_state { diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index e1e35d9..25149a4 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -441,7 +441,7 @@ int iwch_bind_mw(struct ib_qp *qp, wqe->bind.type = T3_VA_BASED_TO; /* TBD: check perms */ - wqe->bind.perms = iwch_convert_access(mw_bind->mw_access_flags); + wqe->bind.perms = iwch_ib_to_mwbind_access(mw_bind->mw_access_flags); wqe->bind.mr_stag = cpu_to_be32(mw_bind->mr->lkey); wqe->bind.mw_stag = cpu_to_be32(mw->rkey); wqe->bind.mw_len = cpu_to_be32(mw_bind->length); From rdreier at cisco.com Fri Mar 2 15:34:21 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 15:34:21 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: <20070302111658.GE27542@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 2 Mar 2007 13:16:58 +0200") References: <20070301144139.GK14282@mellanox.co.il> <20070302111658.GE27542@mellanox.co.il> Message-ID: > > I'm not quite sure I understand why we have to synchronize against the > > completion EQ's interrupt here. > > Hmm, I'm not sure myself, now. > I'm still thinking about this - the patch below is clearly correct > and seems sufficient to fix the issue pointed out by bugzilla. 
> So let's get it merged and I'll try to think about and address > other isses (if any) in a separate patch. The more I think about it, the more I think that synchronizing against the completion interrupt doesn't accomplish anything. The completion interrupt itself only looks at the CQ, so it doesn't matter what we do with the QP table or anything to do with QPs. And a consumer could poll any CQ at any time, in or out of interrupt context, so we're not protecting against anything that has to do with polling CQs. However, it does seem that we should also clean the CQs before removing the QP from the table, to avoid polling completions for a QP not in the QP table. And also synchronizing with the async event EQ's interrupts still makes sense to me. I guess I don't quite understand why this change is enough to fix bug #394 -- it seems it is just changing the timing without really closing the race window completely. - R. From rdreier at cisco.com Fri Mar 2 15:48:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 15:48:13 -0800 Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows In-Reply-To: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il> (dotanb@dev.mellanox.co.il's message of "Fri, 2 Mar 2007 13:12:44 +0200 (IST)") References: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il> Message-ID: > 1) Why do you need the last enumeration (IBV_REREG_MR_KEEP_VALID)? Some Open MPI developers have asked for an extension to the IBA definition of reregister memory region, which would allow an application to extend a memory region without invalidating the earlier registration. > 2) I suggest the following structure for the ibv_mw_bind: > struct ibv_mw_bind { > uint64_t wr_id; > uint64_t addr; > uint64_t length; > struct ibv_mr *mr; > enum ibv_send_flags send_flags; > enum ibv_access_flags mw_access_flags; > }; > to be optimal on 32 bit arch as well. Good point. 
In fact as Sean pointed out, I think addr should be void * and length should be size_t, to match the prototype of ibv_reg_mr() > 3) in the past we had a mail thread about destroying completion event > channel: to return BUSY if one is trying to > destroy a channel that being used by CQs. in order to do it i will need > to do this check in kernel level: > can you add a function to this issue in the uverbs (or do you want me to > send you a patch to this issue). > if we are changing the ABI, i would like to add this issue as well. Why do we need to do the check in the kernel? In fact it seems too late in the kernel, since the kernel only sees anything happen with the completion channel after the FD has been closed. It would make more sense to put a refcount in completion channels and add a pointer to a completion channel in struct ibv_cq. Or am I missing something? Anyway I would be open to adding the members to ibv_cq and ibv_comp_channel to support this check. - R. From changquing.tang at hp.com Fri Mar 2 16:04:29 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Sat, 3 Mar 2007 00:04:29 -0000 Subject: [ofa-general] RE: What is the size of async event queue ? In-Reply-To: References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403997A8D@G3W0634.americas.hpqcorp.net> If a QP goes to error state, I will get completion error, do I still get any of IBV_EVENT_QP_FATAL/REQ_ERR/ACCESS_ERR event? --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, March 02, 2007 4:57 PM > To: Tang, Changqing > Cc: openib-general at openib.org > Subject: Re: What is the size of async event queue ? > > > What is the default size of the async event queue ? Suppose I > > create 1024 QP from one process to another process, > > Somehow the remote process crashes, Can I get all the 1024 QP > error > async event, how do I make sure I don't loss an event ? 
> > Which async event are you expecting to get? If you lose a > connection on an RC QP, then you will get a completion with > error status returned on your CQ -- I don't think there is > any async event that is caused. > > > Also if I want to detect QP connection error, I can either use > > completion error, or use ibv_get_async_event(), which way > report error > faster ? > > Which async event do you get? > > - R. > From rdreier at cisco.com Fri Mar 2 16:12:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 02 Mar 2007 16:12:40 -0800 Subject: [ofa-general] Re: What is the size of async event queue ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA8403997A8D@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Sat, 3 Mar 2007 00:04:29 -0000") References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA8403997A8D@G3W0634.americas.hpqcorp.net> Message-ID: > If a QP goes to error state, I will get completion error, do I still get > any of IBV_EVENT_QP_FATAL/REQ_ERR/ACCESS_ERR event? It depends on why the QP is transitioned to the error state. You can look at section 11.6.3.2 of the IB spec for details on async errors, but the basic idea is that any error that can be reported in a work completion will be reported that way; only QP errors that don't have a way to be reported via a CQ will generate an async event. - R. From changquing.tang at hp.com Fri Mar 2 16:38:07 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Sat, 3 Mar 2007 00:38:07 -0000 Subject: [ofa-general] RE: What is the size of async event queue ? In-Reply-To: References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net><349DCDA352EACF42A0C49FA6DCEA8403997A8D@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403997AC0@G3W0634.americas.hpqcorp.net> Thanks for the general idea. 
--CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, March 02, 2007 6:13 PM > To: Tang, Changqing > Cc: openib-general at openib.org > Subject: Re: What is the size of async event queue ? > > > If a QP goes to error state, I will get completion error, > do I still get > any of IBV_EVENT_QP_FATAL/REQ_ERR/ACCESS_ERR event? > > It depends on why the QP is transitioned to the error state. > You can look at section 11.6.3.2 of the IB spec for details > on async errors, but the basic idea is that any error that > can be reported in a work completion will be reported that > way; only QP errors that don't have a way to be reported via > a CQ will generate an async event. > > - R. > From changquing.tang at hp.com Fri Mar 2 16:40:07 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Sat, 3 Mar 2007 00:40:07 -0000 Subject: [ofa-general] Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039979E4@G3W0634.americas.hpqcorp.net> References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com><1172845045.21241.0.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net><1172854585.21241.14.camel@stevo-desktop><1172854873.21241.19.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net><1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979E4@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403997AC1@G3W0634.americas.hpqcorp.net> > If I set fd to non-blocking mode, does ibv_get_async_event() > return 0 if there is an event, and return non-0 if nothing ? Roland, Can you answer this question ?
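For reference, a minimal sketch of the non-blocking pattern discussed in this thread. The quoted reply below describes ibv_get_async_event() as a read() on the async file descriptor, so the sketch uses an ordinary pipe as a stand-in for that descriptor and runs without RDMA hardware. The helper names set_nonblocking() and try_read_event() are hypothetical and are not part of libibverbs. With a real device one would presumably apply fcntl() to the context's async descriptor and see ibv_get_async_event() fail with errno set to EAGAIN when no event is pending, but that is an assumption to check against the library source.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: try to read one fixed-size event record.
 * Returns 1 if a full record was read, 0 if nothing is pending
 * (EAGAIN or EWOULDBLOCK), and -1 on any other failure.
 */
int try_read_event(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = read(fd, buf, len);
    if (n == (ssize_t) len)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

A caller that does not want to spin can instead wait in poll() on the same descriptor and call the read helper only once it is reported readable.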
> > --CQ > > > > -----Original Message----- > > From: Roland Dreier [mailto:rdreier at cisco.com] > > Sent: Friday, March 02, 2007 4:30 PM > > To: Steve Wise > > Cc: Tang, Changqing; General at lists.openfabrics.org > > Subject: Re: [ofa-general] Re: Is ibv_get_async_event() a blocking > > call ? > > > > > > I wonder if libibverbs can do this way for application > and make > > > > ibv_get_async_event() non-blocking. But I will try this way now. > > > > > > > > > > I wonder what happens if you set the async file descriptor to > > > non-blocking? > > > > > > Roland? Would that return EWOULDBLOCK if there are no events? > > > > Yes, as you can see if you look at the libibverbs internals, > > ibv_get_async_event() is just doing read() on a file > descriptor. So > > all the standard ways of handling finding out whether the > descriptor > > is ready to read should work: block in read(), use poll(), epoll, > > SIGIO, async IO, etc. And using fcntl to set the descriptor to > > non-blocking mode would work too. > > > > - R. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rjwalsh at pathscale.com Fri Mar 2 17:14:02 2007 From: rjwalsh at pathscale.com (Robert Walsh) Date: Fri, 02 Mar 2007 17:14:02 -0800 Subject: [ofa-general] Question about ibv_poll_cq Message-ID: <45E8CBDA.7040205@pathscale.com> Has something changed in ibv_poll_cq recently? 
We're trying to bring libipathverbs up to date WRT the changes made to libibverbs recently, and I'm getting this problem when I try to run ibv_ud_pingpong (or any of the pingpongs, for that matter): $ ibv_ud_pingpong local address: LID 0x0031, QPN 0x000010, PSN 0x4adb33 remote address: LID 0x002d, QPN 0x000018, PSN 0x7b51b8 Failed status 2 for wr_id 1565327360 Looks like it gets through the poll loop once OK, and fails on the second run through it. Regards, Robert. From jsquyres at cisco.com Fri Mar 2 17:42:22 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Fri, 2 Mar 2007 20:42:22 -0500 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <1172685419.4777.145.camel@fc6.xsintricity.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> Message-ID: <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> To be totally clear, there are three issues: 1. *NOT AN MPI ISSUE*: base location of the stack. Doug has repeatedly mentioned that /usr/local/ofed is not good. This is a group issue to decide. 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not deleting the buildroot is Bad; munging %build into %install is Bad; ...etc. This needs to change. 4 choices jump to mind: a. Keep the same scheme. Ick. b. Install while we build (i.e., the normal way to build a pile of interdependent RPMs) c. Use chroot (Red Hat does this in their internal setup, for example) d. Only distribute binary RPMs for supported platforms; source is available for those who want it. 3. Doug's final point about allowing multiple MPI's to play harmoniously on a single system is obviously an MPI issue. The /etc/alternatives mechanism is not really good enough (IMHO) for this -- /etc/alternatives is about choosing one implementation and making everyone use it.
The problem is that when multiple MPI's are installed on a single system, people need all of them (some users prefer one over the other, but much more important, some apps are only certified with one MPI or another). The mpi-selector tool we introduced in OFED 1.2 will likely be "good enough" for this purpose, but we can also work on integrating the /etc/alternatives stuff if desired, particularly for those who only need/want one MPI implementation. On Feb 28, 2007, at 12:56 PM, Doug Ledford wrote: > On Wed, 2007-02-28 at 16:10 +0200, Tziporet Koren wrote: >> * Improved RPM usage by the install will not be part of OFED >> 1.2 > > Since I first brought this up, you have added new libraries, iWARP > support, etc. These constitute new RPMs. And, because you guys have > been doing things contrary to standards like the file hierarchy > standard > in the original RPMs, it's been carried forward to these new RPMs. > This > is a snowball, and the longer you put off fixing it, the harder it > gets > to change. And not just in your RPMs either. The longer you put off > coming up with a reasonable standard for MPI library and executable > file > locations, the longer customers will hand roll their own site specific > setups, and the harder it will be to get them to switch over to the > standard once you *do* implement it. You may end up dooming Jeff to > maintaining those custom file location hacks in the OpenMPI spec > forever. > > Not to mention that interoperability is about more than one machine > talking to another machine. It's also about a customer's application > building properly on different versions of the stack, without the > customer needing to change all the include file locations and link > parameters. 
It's also about a customer being able to rest assured > that > if they tried to install two conflicting copies of libibverbs, it > would > in fact cause RPM to throw conflict errors (which it doesn't now > because > your libibverbs is in /usr/local, where I'm not allowed to put > ours, so > since the files are in different locations, rpm will happily let the > user install both your libibverbs and my libibverbs without a > conflict, > and a customer could waste large amounts of time trying to track > down a > bug in one library only to find out their application is linking > against > the other). > >> * The RPM usage will be enhanced for the next (1.3) >> release and we will decide on the correct way in >> Sonoma. > > > > There's not really much to decide. Either the stack is Linux File > Hierarchy Standard compliant or it isn't. The only leeway for > decisions > allowed by the standard is on things like where in /etc to put the > config files (since you guys are striving to be a generic RDMA stack, > not just an IB stack, I would suggest that all RDMA related config > files > go into /etc/rdma, and for those applications that can reasonably > be run > absent RDMA technology, like OpenMPI, I would separate their config > files off into either /etc or /etc/openmpi, ditto for the include > directories, /usr/include/rdma for the generic non-IB specific stuff, > and possibly /usr/include/rdma/infiniband for IB specific stuff, or > you > could put the IB stuff under /usr/include/infiniband, either way). > > The biggest variation from the spec that needs to be dealt with is the > need for multiple MPI installations, which is problematic if you just > use generic locations as it stands today, but with a few modifications > to the MPI stack it could be worked around. 
> > > -- > Doug Ledford > GPG KeyID: CFBFF194 > http://people.redhat.com/dledford > > Infiniband specific RPMs available at > http://people.redhat.com/dledford/Infiniband > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From vlad at lists.openfabrics.org Sat Mar 3 02:14:25 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 3 Mar 2007 02:14:25 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070303-0200 daily build status Message-ID: <20070303101425.9A5D7E60366@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on 
x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From patrick at xentech.net Sat Mar 3 05:09:13 2007 From: patrick at xentech.net (Patrick (Xentech)) Date: Sat, 3 Mar 2007 14:09:13 +0100 Subject: [ofa-general] Question (sorry) Message-ID: <6CD26886-6516-4257-8380-A2DDD50DC1D0@xentech.net> Hi, I'm sorry to bother you with this, but there is no other place I could think off. We're selling some of our IB stuff that we're not using anymore. It is all Topspin 90, 120, 270, 360 and HBA's (128MB PCI-X). Please let me know if you might have any interest? Kind regards Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From sashak at voltaire.com Sat Mar 3 05:59:46 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 3 Mar 2007 15:59:46 +0200 Subject: [ofa-general] [PATCH] opensm: yet another up/down speedup Message-ID: <20070303135946.GC12388@sashak.voltaire.com> The idea of this optimization is to perform all time consuming up/down min hops calculation cycles only for switches, and when this is ready just to populate (in one pass) calculated min hops values for CAs and routers as its neighbour switch's min hops + 1. Tests show yet another 6-7 times speedup. 
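The two-pass idea described above can be sketched on a toy fabric as follows. This is an illustration only, not the OpenSM code: struct toy_fabric, bfs_from_switch(), populate_ca_hops(), compute_min_hops() and the NO_PATH value are all hypothetical, the search is a plain BFS rather than the direction-constrained up/down walk, and each switch keeps a single least-hop value per destination instead of a per-port table.

```c
#include <assert.h>
#include <string.h>

#define MAX_SW   8
#define MAX_CA   8
#define NO_PATH  255  /* stand-in for "unreachable"; value chosen for this sketch */

/* Hypothetical toy fabric: switch adjacency plus one attachment switch per CA. */
struct toy_fabric {
    int num_sw;
    int num_ca;
    unsigned char link[MAX_SW][MAX_SW];     /* 1 if two switches are cabled */
    int ca_switch[MAX_CA];                  /* switch each CA hangs off */
    unsigned char sw_hops[MAX_SW][MAX_SW];  /* min hops, switch to switch */
    unsigned char ca_hops[MAX_SW][MAX_CA];  /* min hops, switch to CA */
};

/* Pass 1: the expensive search, run over switch-to-switch links only. */
void bfs_from_switch(struct toy_fabric *f, int root)
{
    int queue[MAX_SW], head = 0, tail = 0;

    assert(root >= 0 && root < f->num_sw);
    memset(f->sw_hops[root], NO_PATH, sizeof f->sw_hops[root]);
    f->sw_hops[root][root] = 0;
    queue[tail++] = root;
    while (head < tail) {
        int cur = queue[head++];

        for (int next = 0; next < f->num_sw; next++) {
            /* Skip uncabled pairs and switches that were already reached. */
            if (!f->link[cur][next] || f->sw_hops[root][next] != NO_PATH)
                continue;
            f->sw_hops[root][next] = f->sw_hops[root][cur] + 1;
            queue[tail++] = next;
        }
    }
}

/* Pass 2: one cheap sweep; a CA is its neighbour switch's hop count plus 1. */
void populate_ca_hops(struct toy_fabric *f)
{
    for (int sw = 0; sw < f->num_sw; sw++) {
        for (int ca = 0; ca < f->num_ca; ca++) {
            unsigned char h = f->sw_hops[sw][f->ca_switch[ca]];

            f->ca_hops[sw][ca] = (h == NO_PATH) ? NO_PATH : h + 1;
        }
    }
}

void compute_min_hops(struct toy_fabric *f)
{
    for (int sw = 0; sw < f->num_sw; sw++)
        bfs_from_switch(f, sw);
    populate_ca_hops(f);
}
```

The search runs once per switch instead of once per port, which is where the saving would come from on fabrics with many more CAs than switches.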
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_updn.c | 160 ++++++++++++++++++++----------------------- 1 files changed, 73 insertions(+), 87 deletions(-) diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 72d943b..39f3bef 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -161,86 +161,26 @@ static int __updn_bfs_by_node( IN osm_log_t *p_log, IN osm_subn_t *p_subn, - IN osm_port_t *p_port ) + IN osm_switch_t *p_sw ) { - osm_switch_t *p_self_node = NULL; uint8_t pn, pn_rem; - osm_physp_t *p_physp, *p_remote_physp; cl_qlist_t list; - uint16_t root_lid; + uint16_t lid; struct updn_node *u; updn_switch_dir_t next_dir, current_dir; OSM_LOG_ENTER( p_log, __updn_bfs_by_node ); - p_physp = osm_port_get_default_phys_ptr(p_port); - /* Check valid pointer */ - if (!p_physp || !osm_physp_is_valid( p_physp )) - { - OSM_LOG_EXIT( p_log ); - return 1; - } + lid = osm_node_get_base_lid(p_sw->p_node, 0); + lid = cl_ntoh16(lid); + osm_switch_set_hops(p_sw, lid, 0, 0); - /* The Root BFS - lid */ - root_lid = cl_ntoh16(osm_physp_get_base_lid( p_physp )); - /* printf ("-V- BFS through lid : 0x%x\n", root_lid); */ osm_log( p_log, OSM_LOG_DEBUG, "__updn_bfs_by_node: " - "Starting lid : 0x%x \n", root_lid ); + "Starting from switch - port GUID 0x%" PRIx64 " lid %u\n", + cl_ntoh64(p_sw->p_node->node_info.port_guid), lid ); - if (p_port->p_node->sw) - { - p_self_node = p_port->p_node->sw; - /* Update its Min Hop Table */ - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Update Min Hop Table of GUID 0x%" PRIx64 "\n", - cl_ntoh64(p_port->guid) ); - osm_switch_set_hops(p_self_node, root_lid, 0, 0); - } - else - { - /* This is a CA or router - need to take its remote port */ - p_remote_physp = p_physp->p_remote_physp; - /* - make sure that the following occur: - 1. The port isn't NULL - 2. 
The port is a valid port - */ - if ( p_remote_physp && osm_physp_is_valid( p_remote_physp )) - { - /* Check if the remote port is a switch, and if it is, - update root_lid and Min Hop Table */ - if (!p_remote_physp->p_node->sw) - { - osm_log( p_log, OSM_LOG_ERROR, - "__updn_bfs_by_node: ERR AA07: " - "This is not a switched subnet OR valid connection, cannot perform UPDN algorithm\n" ); - OSM_LOG_EXIT( p_log ); - return 1; - } - else - { - p_self_node = p_remote_physp->p_node->sw; - /* Update its Min Hop Table */ - /* NOTE : Check if there is a function which prints the Min Hop Table */ - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Update Min Hop Table of GUID 0x%" PRIx64 "\n", - cl_ntoh64(p_remote_physp->port_guid) ); - osm_switch_set_hops(p_self_node, root_lid, p_remote_physp->port_num, 1); - } - } - } - - CL_ASSERT(p_self_node); - - osm_log( p_log, OSM_LOG_DEBUG, - "__updn_bfs_by_node: " - "Starting from switch - port GUID 0x%" PRIx64 "\n", - cl_ntoh64(p_self_node->p_node->node_info.port_guid) ); - - u = p_self_node->priv; + u = p_sw->priv; u->dir = UP; /* Update list with the new element */ @@ -290,13 +230,13 @@ __updn_bfs_by_node( continue; } /* Set MinHop value for the current lid */ - current_min_hop = osm_switch_get_least_hops(u->sw, root_lid); + current_min_hop = osm_switch_get_least_hops(u->sw, lid); /* Check hop count if better insert into list && update the remote node Min Hop Table */ - remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem); + remote_min_hop = osm_switch_get_hop_count(p_remote_sw, lid, pn_rem); if (current_min_hop + 1 < remote_min_hop) { - set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1); + set_hop_return_value = osm_switch_set_hops(p_remote_sw, lid, pn_rem, current_min_hop + 1); if (set_hop_return_value) { osm_log( p_log, OSM_LOG_ERROR, @@ -569,14 +509,60 @@ updn_subn_rank( /********************************************************************** 
**********************************************************************/ static int +populate_min_hops_for_cas( + osm_subn_t *p_subn, + osm_switch_t *p_sw ) +{ + osm_port_t *p_next_port,*p_port; + osm_physp_t *p_physp; + uint16_t lid, sw_lid; + uint8_t i, hops; + + p_next_port = (osm_port_t*)cl_qmap_head( &p_subn->port_guid_tbl ); + while( p_next_port != (osm_port_t*)cl_qmap_end( &p_subn->port_guid_tbl ) ) + { + p_port = p_next_port; + p_next_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); + + if (p_port->p_node->sw) + continue; + p_physp = osm_port_get_default_phys_ptr(p_port); + if (!p_physp || !p_physp->p_remote_physp || + !p_physp->p_remote_physp->p_node->sw) + continue; + + lid = osm_physp_get_base_lid(p_physp); + lid = cl_ntoh16(lid); + + if (p_physp->p_remote_physp->p_node->sw == p_sw) + { + osm_switch_set_hops(p_sw, lid, p_physp->p_remote_physp->port_num, 1); + continue; + } + + sw_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0); + sw_lid = cl_ntoh16(sw_lid); + + for (i = 1 ; i < p_sw->num_ports ; i++) + { + hops = osm_switch_get_hop_count(p_sw, sw_lid, i); + if (hops == OSM_NO_PATH) + continue; + osm_switch_set_hops(p_sw, lid, i, hops + 1); + } + } + return 0; +} + +/********************************************************************** + **********************************************************************/ +static int __osm_subn_set_up_down_min_hop_table( IN updn_t* p_updn ) { osm_subn_t *p_subn = &p_updn->p_osm->subn; osm_log_t *p_log = &p_updn->p_osm->log; osm_switch_t *p_next_sw,*p_sw; - osm_port_t *p_next_port,*p_port; - ib_net64_t port_guid; OSM_LOG_ENTER( p_log, __osm_subn_set_up_down_min_hop_table ); @@ -603,21 +589,21 @@ __osm_subn_set_up_down_min_hop_table( osm_log( p_log, OSM_LOG_VERBOSE, "__osm_subn_set_up_down_min_hop_table: " "BFS through all port guids in the subnet [\n" ); - p_next_port = (osm_port_t*)cl_qmap_head( &p_subn->port_guid_tbl ); - while( p_next_port != (osm_port_t*)cl_qmap_end( &p_subn->port_guid_tbl ) ) 
+ + p_next_sw = (osm_switch_t*)cl_qmap_head( &p_subn->sw_guid_tbl ); + while( p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl ) ) { - p_port = p_next_port; - p_next_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); - port_guid = cl_qmap_key(&(p_port->map_item)); - osm_log( p_log, OSM_LOG_DEBUG, - "__osm_subn_set_up_down_min_hop_table: " - "BFS through port GUID 0x%" PRIx64 "\n", - cl_ntoh64(port_guid) ); - if(__updn_bfs_by_node(p_log, p_subn, p_port)) - { - OSM_LOG_EXIT( p_log ); - return 1; - } + p_sw = p_next_sw; + p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); + __updn_bfs_by_node(p_log, p_subn, p_sw); + } + + p_next_sw = (osm_switch_t*)cl_qmap_head( &p_subn->sw_guid_tbl ); + while( p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl ) ) + { + p_sw = p_next_sw; + p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); + populate_min_hops_for_cas(p_subn, p_sw); } osm_log( p_log, OSM_LOG_VERBOSE, -- 1.5.0.1.40.gb40d From roland.list at gmail.com Sat Mar 3 09:37:39 2007 From: roland.list at gmail.com (Roland Dreier) Date: Sat, 3 Mar 2007 09:37:39 -0800 Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! In-Reply-To: <45E552FC.4040305@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> Message-ID: > Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24 > Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor(). One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING turned on, and then rerun this test. There's a good chance that this would diagnose the deadlock. (I don't have good access to my test machines right now, or else I would do it myself) - R. From mst at mellanox.co.il Sat Mar 3 13:51:50 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sat, 3 Mar 2007 23:51:50 +0200 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: References: <20070302111658.GE27542@mellanox.co.il> Message-ID: <20070303215150.GB17950@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy > > > > I'm not quite sure I understand why we have to synchronize against the > > > completion EQ's interrupt here. > > > > Hmm, I'm not sure myself, now. > > I'm still thinking about this - the patch below is clearly correct > > and seems sufficient to fix the issue pointed out by bugzilla. > > So let's get it merged and I'll try to think about and address > > other isses (if any) in a separate patch. > > The more I think about it, the more I think that synchronizing against > the completion interrupt doesn't accomplish anything. The completion > interrupt itself only looks at the CQ, so it doesn't matter what we do > with the QP table or anything to do with QPs. And a consumer could > poll any CQ at any time, in or out of interrupt context, so we're not > protecting against anything that has to do with polling CQs. > > However, it does seem that we should also clean the CQs before > removing the QP from the table, to avoid polling completions for a QP > not in the QP table. > > And also synchronizing with the async event EQ's interrupts still > makes sense to me. > > I guess I don't quite understand why this change is enough to fix bug > #394 -- it seems it is just changing the timing without really closing > the race window completely. With current code, when we destroy a QP, we remove it from table first, and move QP to reset. This is clearly wrong, and this patch fixes this. To fix the issue completely, the simplest approach is to use the same EQ for completion and async events and for command interface. I plan to send such a patch next week. 
-- MST From jgunthorpe at obsidianresearch.com Sat Mar 3 14:37:02 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sat, 3 Mar 2007 15:37:02 -0700 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: References: Message-ID: <20070303223702.GN25553@obsidianresearch.com> On Thu, Mar 01, 2007 at 05:04:43PM -0800, Shirley Ma wrote: > IPv6 ND doesn't prevent the duplicate IPv6 link-local address in the network. > It only saves a warning in /var/log/messages to indicate that this address is > duplicated in the network. ND can detect this when sending packets. That isn't quite accurate, if IPv6 DAD detects a duplicate then the address never leaves the tentative state and won't be usable. There is also a similar problem here with the timing of IPv6 router solicitation. Maybe the solution here is that IPv6 core should be waiting for the multicast groups it requests to register before doing DAD/RS? Jason From monisonlists at gmail.com Sun Mar 4 02:06:00 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Sun, 04 Mar 2007 12:06:00 +0200 Subject: [ofa-general] build failure on nightly tarball -- bonding In-Reply-To: <45E846F6.7070705@open-mpi.org> References: <45E846F6.7070705@open-mpi.org> Message-ID: <45EA9A08.1090500@gmail.com> Andrew Friedley wrote: > The chelsio build errors from yesterday appear to be gone, though now > I'm seeing errors building the IB bonding code with the 3/2 alpha > tarball -- error below. I'm wondering, is there a way to selectively > avoid building things like this that seem to be optional, as a tarball > user? > > Andrew For the error messages.... It seems to me that the problem is one that I have already fixed. The corrected source RPM is in my home dir. 
> > In file included from > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:78: > > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_set_slave_inactive_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: > error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: > error: (Each undeclared identifier is reported only once > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: > error: for each function it appears in.) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:262: > error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_set_slave_active_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:268: > error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:268: > error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_set_master_3ad_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:273: > error: `IFF_MASTER_8023AD' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_unset_master_3ad_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:278: > error: `IFF_MASTER_8023AD' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_set_master_alb_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:283: > 
error: `IFF_MASTER_ALB' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: > In function `bond_unset_master_alb_flags': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:288: > error: `IFF_MASTER_ALB' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > At top level: > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: > error: invalid lvalue in unary `&' > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:129: > error: (near initialization for `__param_arr_arp_ip_target.num') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:149: > error: `BOND_XMIT_POLICY_LAYER2' undeclared here (not in a function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: > error: (near initialization for `xmit_hashtype_tbl[0].mode') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:171: > error: (near initialization for `xmit_hashtype_tbl[0]') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: > error: `BOND_XMIT_POLICY_LAYER34' undeclared here (not in a function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: > error: (near initialization for `xmit_hashtype_tbl[1].mode') > 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:172: > error: (near initialization for `xmit_hashtype_tbl[1]') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:173: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:173: > error: (near initialization for `xmit_hashtype_tbl[2]') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_compute_features': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1230: > error: `NETIF_F_ALL_CSUM' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1230: > error: `NETIF_F_UFO' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_enslave': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1428: > warning: implicit declaration of function `dev_set_mac_address' > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1449: > error: `IFF_BONDING' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_release': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1847: > error: `IFF_MASTER_8023AD' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1847: > error: `IFF_MASTER_ALB' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1848: > error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) > 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1848: > error: `IFF_BONDING' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1849: > error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_release_all': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1939: > error: `IFF_MASTER_8023AD' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1939: > error: `IFF_MASTER_ALB' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:1940: > error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_glean_dev_ip': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2331: > warning: implicit declaration of function `__in_dev_get_rcu' > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2331: > warning: assignment makes pointer from integer without a cast > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_arp_rcv': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:2548: > error: `IFF_BONDING' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_slave_netdev_event': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3364: > error: `NETDEV_FEAT_CHANGE' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_netdev_event': > 
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3390: > error: `IFF_BONDING' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_register_lacpdu': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3477: > warning: assignment from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_register_arp': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:3494: > warning: assignment from incompatible pointer type > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > At top level: > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: > error: unknown field `get_ufo' specified in initializer > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: > error: `ethtool_op_get_ufo' undeclared here (not in a function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: > error: initializer element is not constant > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4340: > error: (near initialization for `bond_ethtool_ops.set_tso') > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_init': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4374: > warning: assignment discards qualifiers from pointer target type > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4386: > error: `IFF_BONDING' undeclared (first use in this function) > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c: > In function `bond_create': > /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4812: > warning: implicit declaration of function 
`lockdep_set_class'
> /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4812:
> error: structure has no member named `_xmit_lock'
> /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:
> At top level:
> /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:4774:
> error: storage size of `bonding_netdev_xmit_lock_key' isn't known
> make[1]: ***
> [/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.o]
> Error 1
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From vlad at lists.openfabrics.org Sun Mar 4 02:15:32 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun, 4 Mar 2007 02:15:32 -0800 (PST)
Subject: [ofa-general] ofa_1_2_kernel 20070304-0200 daily build status
Message-ID: <20070304101532.73A7DE6080E@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.16.21-0.8-default

Failed:

From monisonlists at gmail.com Sun Mar 4 02:32:15 2007
From: monisonlists at gmail.com (Moni Shoua)
Date: Sun, 04 Mar 2007 12:32:15 +0200
Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
In-Reply-To: <45E313D2.70909@voltaire.com>
References: <45E313D2.70909@voltaire.com>
Message-ID: <45EAA02F.4000108@gmail.com>

This version of the patch tracks the allocs and releases of ipoib_neigh and
keeps a list of them. Before the IPoIB net device unregisters, the list is
walked to destroy the ipoib_neighs that ride on a bond neighbour. This
replaces the method of scanning the arp and ndisc tables.

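
The bookkeeping described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch itself: struct tracked_neigh, neigh_alloc(), neigh_free(), neigh_cleanup_bond() and neigh_count() are hypothetical names, a pthread mutex stands in for the kernel spinlock, and integer ids stand in for the master and slave net_device pointers. Unlike the patch, the sketch unlinks matching entries onto a private list while holding the lock and frees them only after dropping it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipoib_neigh; only what the sketch needs. */
struct tracked_neigh {
	int master_id;                      /* stands in for neighbour->dev (bond master) */
	int slave_id;                       /* stands in for neigh->dev (IPoIB slave) */
	struct tracked_neigh *prev, *next;  /* links on the global list */
};

/* Global list of every live entry: a circular list with a sentinel head. */
static struct tracked_neigh all_neigh = { 0, 0, &all_neigh, &all_neigh };
static pthread_mutex_t all_neigh_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove one entry from the global list. Caller must hold all_neigh_lock. */
static void unlink_entry(struct tracked_neigh *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Allocate an entry and record it at the tail of the global list. */
struct tracked_neigh *neigh_alloc(int master_id, int slave_id)
{
	struct tracked_neigh *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->master_id = master_id;
	n->slave_id = slave_id;

	pthread_mutex_lock(&all_neigh_lock);
	n->prev = all_neigh.prev;
	n->next = &all_neigh;
	all_neigh.prev->next = n;
	all_neigh.prev = n;
	pthread_mutex_unlock(&all_neigh_lock);
	return n;
}

/* Drop an entry from the global list, then release it. */
void neigh_free(struct tracked_neigh *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&all_neigh_lock);
	unlink_entry(n);
	pthread_mutex_unlock(&all_neigh_lock);
	free(n);
}

/* Destroy every entry for this (master, slave) pair; returns how many. */
int neigh_cleanup_bond(int master_id, int slave_id)
{
	struct tracked_neigh *n, *next, *doomed = NULL;
	int destroyed = 0;

	pthread_mutex_lock(&all_neigh_lock);
	for (n = all_neigh.next; n != &all_neigh; n = next) {
		next = n->next;  /* saved first: n may be unlinked below */
		if (n->master_id == master_id && n->slave_id == slave_id) {
			unlink_entry(n);
			n->next = doomed;  /* chain onto the private list */
			doomed = n;
		}
	}
	pthread_mutex_unlock(&all_neigh_lock);

	/* The lock is no longer held while entries are destroyed. */
	while (doomed) {
		n = doomed;
		doomed = n->next;
		free(n);
		destroyed++;
	}
	return destroyed;
}

/* Number of entries currently tracked. */
int neigh_count(void)
{
	struct tracked_neigh *n;
	int count = 0;

	pthread_mutex_lock(&all_neigh_lock);
	for (n = all_neigh.next; n != &all_neigh; n = n->next)
		count++;
	pthread_mutex_unlock(&all_neigh_lock);
	return count;
}
```

A caller would invoke neigh_cleanup_bond() for a slave just before unregistering it, the way the patch calls its cleanup helper ahead of unregister_netdev(). Moving the matches to a private list first is a design choice of this sketch: the walk never runs with the lock dropped.
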
Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:20:54.749932751 +0200 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:21:58.547593677 +0200 @@ -218,6 +218,7 @@ struct ipoib_neigh { struct neighbour *neighbour; struct net_device *dev; + struct list_head all_neigh_list; struct list_head list; }; Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:52.720629356 +0200 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:58.548593499 +0200 @@ -66,6 +66,7 @@ MODULE_PARM_DESC(recv_queue_size, "Numbe #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG int ipoib_debug_level; +static int ipoib_at_exit = 0; module_param_named(debug_level, ipoib_debug_level, int, 0644); MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); #endif @@ -85,6 +86,9 @@ struct workqueue_struct *ipoib_workqueue struct ib_sa_client ipoib_sa_client; +static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock); +static LIST_HEAD(ipoib_all_neigh_list); + static void ipoib_add_one(struct ib_device *device); static void ipoib_remove_one(struct ib_device *device); @@ -792,6 +796,24 @@ static void ipoib_neigh_destructor(struc ipoib_put_ah(ah); } +static void ipoib_neigh_cleanup_bond(struct net_device* master, + struct net_device* slave) +{ + struct ipoib_neigh *nn, *tn; + + spin_lock(&ipoib_all_neigh_list_lock); + list_for_each_entry_safe(nn, tn, &ipoib_all_neigh_list, all_neigh_list){ + if ((nn->neighbour->dev == master) && (nn->dev == slave)) { + if (ipoib_at_exit) + nn->neighbour->parms->neigh_destructor = NULL; + spin_unlock(&ipoib_all_neigh_list_lock); + ipoib_neigh_destructor(nn->neighbour); + spin_lock(&ipoib_all_neigh_list_lock); + } + } + spin_unlock(&ipoib_all_neigh_list_lock); +} + 
struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, struct net_device *dev) { @@ -806,6 +828,9 @@ struct ipoib_neigh *ipoib_neigh_alloc(st *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); + spin_lock(&ipoib_all_neigh_list_lock); + list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list); + spin_unlock(&ipoib_all_neigh_list_lock); return neigh; } @@ -818,6 +843,9 @@ void ipoib_neigh_free(struct net_device ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); } + spin_lock(&ipoib_all_neigh_list_lock); + list_del(&neigh->all_neigh_list); + spin_unlock(&ipoib_all_neigh_list_lock); kfree(neigh); } @@ -874,6 +902,8 @@ void ipoib_dev_cleanup(struct net_device /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { + if (cpriv->dev->master) + ipoib_neigh_cleanup_bond(cpriv->dev->master,priv->dev); unregister_netdev(cpriv->dev); ipoib_dev_cleanup(cpriv->dev); free_netdev(cpriv->dev); @@ -1158,6 +1188,8 @@ static void ipoib_remove_one(struct ib_d list_for_each_entry_safe(priv, tmp, dev_list, list) { ib_unregister_event_handler(&priv->event_handler); flush_scheduled_work(); + if (priv->dev->master) + ipoib_neigh_cleanup_bond(priv->dev->master,priv->dev); unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); @@ -1217,6 +1249,7 @@ err_fs: static void __exit ipoib_cleanup_module(void) { + ipoib_at_exit = 1; ib_unregister_client(&ipoib_client); ib_sa_unregister_client(&ipoib_sa_client); ipoib_unregister_debugfs(); From myopenib at gmail.com Sun Mar 4 04:05:27 2007 From: myopenib at gmail.com (Moni Levy) Date: Sun, 04 Mar 2007 14:05:27 +0200 Subject: [ofa-general] Re: [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <20070301145644.GL14282@mellanox.co.il> References: <45E6E7A0.7070902@voltaire.com> <20070301145644.GL14282@mellanox.co.il> Message-ID: <45EAB607.4010904@gmail.com> On 3/1/07, Michael S. 
Tsirkin wrote:
> > SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence
> > does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.
>
> Please limit line length to 80 chars or so.

Do you want me to change anything else? We really need that for OFED 1.2.

Here is an updated patch.

This issue was found during partitioning & SM failover testing. The fix
was tested over the weekend with pkey reshuffling, removal and addition
every few seconds, concurrent with OFED restart.

Changes from v1:
* added a flush flag to ipoib_ib_dev_stop(), like the one
  ipoib_ib_dev_down() takes
* fixed a bug in device extraction from the work struct
* removed some warnings when they are caused by a missing PKEY, as this
  now seems like a valid flow

Changes from v2:
* fewer/fixed debug prints
* flush_restart_qp stuff renamed to just restart_qp
* the patch now depends on Roland's "IPoIB: Only handle async events for
  one port"

SM reconfiguration or failover can shuffle the values in the port pkey
table. The current implementation queries the index of the pkey only
once, when it creates the device QP and moves it into working state, and
hence does not handle this scenario. Fix this by using the PKEY_CHANGE
event as a trigger to reconfigure the device QP.

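
The failure mode and the fix can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: enum sketch_event, struct sketch_dev, sketch_find_pkey() and sketch_handle_event() are hypothetical names, the pkey table is a plain array, and the lookup is an exact 16-bit match, which may be simpler than what the kernel's cached-pkey helper does. The point is only that an index cached at QP setup goes stale once the table is reshuffled, and that a pkey change event is the cue to look it up again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event names; the kernel uses IB_EVENT_* values. */
enum sketch_event {
	SKETCH_PORT_ERR,
	SKETCH_PORT_ACTIVE,
	SKETCH_LID_CHANGE,
	SKETCH_SM_CHANGE,
	SKETCH_PKEY_CHANGE
};

/* Hypothetical per-device state; counters only make the behavior visible. */
struct sketch_dev {
	uint16_t pkey;         /* partition key the interface belongs to */
	int pkey_index;        /* index cached when the QP was set up, -1 if absent */
	unsigned flushes;      /* how often the device was flushed */
	unsigned qp_restarts;  /* how often the QP was reconfigured */
};

/* Return the index of pkey in the port's pkey table, or -1 if it is absent. */
int sketch_find_pkey(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return (int)i;
	return -1;
}

/*
 * Every event flushes the device; only a pkey change also repeats the
 * index lookup and restarts the QP with the fresh index.
 */
void sketch_handle_event(struct sketch_dev *dev, enum sketch_event ev,
			 const uint16_t *table, size_t len)
{
	assert(dev != NULL && table != NULL);

	dev->flushes++;
	if (ev == SKETCH_PKEY_CHANGE) {
		dev->pkey_index = sketch_find_pkey(table, len, dev->pkey);
		dev->qp_restarts++;
	}
}
```

In this sketch a reshuffle followed by any event other than SKETCH_PKEY_CHANGE leaves pkey_index pointing at the wrong table slot, which corresponds to the problem the commit message describes.
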
Signed-off-by: Moni Levy --- drivers/infiniband/ulp/ipoib/ipoib.h | 4 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 51 ++++++++++++++++++++----- drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- 5 files changed, 59 insertions(+), 19 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:11:43.698307017 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:43:04.624119588 +0200 @@ -205,6 +205,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); int ipoib_ib_dev_up(struct net_device *dev); int ipoib_ib_dev_down(struct net_device *dev, int flush); -int ipoib_ib_dev_stop(struct net_device *dev); +int ipoib_ib_dev_stop(struct net_device *dev, int flush); int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200 @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device ret = ipoib_init_qp(dev); if (ret) { - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); + if (ret != 
-ENOENT) + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); return -1; } ret = ipoib_ib_post_receives(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } ret = ipoib_cm_dev_open(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi return pending; } -int ipoib_ib_dev_stop(struct net_device *dev) +int ipoib_ib_dev_stop(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -581,7 +582,8 @@ timeout: /* Wait for all AHs to be reaped */ set_bit(IPOIB_STOP_REAPER, &priv->flags); cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); begin = jiffies; @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) { - struct ipoib_dev_priv *cpriv, *priv = - container_of(work, struct ipoib_dev_priv, flush_task); + struct ipoib_dev_priv *cpriv; struct net_device *dev = priv->dev; - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { + /* + * ipoib_ib_dev_stop() below may not find the PKey and leave the + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp + * flag on is Ok. 
+ */ + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); return; } @@ -642,6 +648,13 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_ib_dev_down(dev, 0); + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) + ipoib_ib_dev_stop(dev, 0); + ipoib_ib_dev_open(dev); + } + /* * The device could have been brought down between the start and when * we get here, don't bring it back up if it's not configured up @@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(cpriv, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 0); +} + +void ipoib_ib_dev_restart_qp(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, restart_qp_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:11:43.729301517 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:43:04.666112093 +0200 @@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev) return -EINVAL; if (ipoib_ib_dev_up(dev)) { - ipoib_ib_dev_stop(dev); + 
ipoib_ib_dev_stop(dev, 1); return -EINVAL; } @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device flush_workqueue(ipoib_workqueue); ipoib_ib_dev_down(dev, 1); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { struct ipoib_dev_priv *cpriv; @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200 @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mcmember.mgid); if (ret < 0) { - ipoib_warn(priv, "couldn't attach QP to multicast group " - IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); + if (ret != -ENXIO) /* No pkey found */ + ipoib_warn(priv, "couldn't attach QP to multicast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); return ret; @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s status = ipoib_mcast_join_finish(mcast, &multicast->rec); if (status) { - if (mcast->logcount++ < 20) + if (mcast->logcount++ < 20 && status != -ENXIO) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), 
status); - } else { + } else if (status != -ENXIO) { ipoib_warn(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), Index: b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 14:39:46.712444790 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 16:12:55.069541201 +0200 @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ret = -ENXIO; + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); goto out; } + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* set correct QKey for QP */ @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if ((record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE && + record->element.port_num == priv->port) { + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); + queue_work(ipoib_workqueue, &priv->restart_qp_task); } } From dotanb at dev.mellanox.co.il Sun Mar 4 04:55:23 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 04 Mar 2007 14:55:23 +0200 Subject: [ofa-general] [libibumad] is the libibumad a thread safe library? Message-ID: <45EAC1BB.8080001@dev.mellanox.co.il> Hi Hal. 
I reviewed the libibumad and I noticed a static array which is called
"ports", but I didn't notice any mutex that protects it.
Can one use the libibumad in a multithreaded application?

thanks
Dotan

From sashak at voltaire.com Sun Mar 4 08:13:14 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 04 Mar 2007 18:13:14 +0200
Subject: [ofa-general] [libibumad] is the libibumad a thread safe library?
In-Reply-To: <45EAC1BB.8080001@dev.mellanox.co.il>
References: <45EAC1BB.8080001@dev.mellanox.co.il>
Message-ID: <1173024795.27423.4.camel@localhost>

On Sun, 2007-03-04 at 14:55 +0200, Dotan Barak wrote:
> Hi Hal.
>
> I reviewed the libibumad and I noticed a static array which is called
> "ports", but I didn't notice any mutex that protects it.
> Can one use the libibumad in a multithreaded application?

Basically libibumad is not thread-safe. The racing part is
umad_open_port()/umad_close_port(); the rest should be fine, I guess.

Sasha

From dotanb at dev.mellanox.co.il Sun Mar 4 07:28:46 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 04 Mar 2007 17:28:46 +0200
Subject: [ofa-general] [libibumad] is the libibumad a thread safe library?
In-Reply-To: <1173024795.27423.4.camel@localhost>
References: <45EAC1BB.8080001@dev.mellanox.co.il> <1173024795.27423.4.camel@localhost>
Message-ID: <45EAE5AE.5@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> On Sun, 2007-03-04 at 14:55 +0200, Dotan Barak wrote:
>
>> Hi Hal.
>>
>> I reviewed the libibumad and I noticed a static array which is called
>> "ports", but I didn't notice any mutex that protects it.
>> Can one use the libibumad in a multithreaded application?
>>
>
> Basically libibumad is not thread-safe. The racing part is
> umad_open_port()/umad_close_port(); the rest should be fine, I guess.
>
> Sasha
>
Is this issue written in the release notes?
Is there any plan to fix this behavior?

thanks
Dotan

From halr at voltaire.com Sun Mar 4 07:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Mar 2007 10:49:15 -0500
Subject: [ofa-general] [libibumad] is the libibumad a thread safe library?
In-Reply-To: <45EAE5AE.5@dev.mellanox.co.il>
References: <45EAC1BB.8080001@dev.mellanox.co.il> <1173024795.27423.4.camel@localhost> <45EAE5AE.5@dev.mellanox.co.il>
Message-ID: <1173023348.4546.176270.camel@hal.voltaire.com>

On Sun, 2007-03-04 at 10:28, Dotan Barak wrote:
> Sasha Khapyorsky wrote:
> > On Sun, 2007-03-04 at 14:55 +0200, Dotan Barak wrote:
> >
> >> Hi Hal.
> >>
> >> I reviewed the libibumad and I noticed a static array which is called
> >> "ports", but I didn't notice any mutex that protects it.
> >> Can one use the libibumad in a multithreaded application?
> >>
> >
> > Basically libibumad is not thread-safe. The racing part is
> > umad_open_port()/umad_close_port(); the rest should be fine, I guess.
> >
> > Sasha
> >
> Is this issue written in the release notes?

libibumad does not currently have release notes. This can easily be
added for the next release. I'm not sure whether the OFED release notes
mention this or not.

> Is there any plan to fix this behavior?

Not currently.

-- Hal

>
> thanks
> Dotan

From sashak at voltaire.com Sun Mar 4 08:21:15 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 4 Mar 2007 18:21:15 +0200
Subject: [ofa-general] [PATCH TRIVIAL] opensm: remove unused osm_switch_is_leaf_lid() function
Message-ID: <20070304162115.GA15777@sashak.voltaire.com>

Remove unused osm_switch_is_leaf_lid() function.

Signed-off-by: Sasha Khapyorsky
---
 osm/include/opensm/osm_switch.h |   36 ------------------------------------
 1 files changed, 0 insertions(+), 36 deletions(-)

diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index 4270904..7edacc4 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -213,42 +213,6 @@ osm_switch_new(
 *	Switch object, osm_switch_delete
 *********/
 
-/****f* OpenSM: Switch/osm_switch_is_leaf_lid
-* NAME
-*	osm_switch_is_leaf_lid
-*
-* DESCRIPTION
-*	Indicates if the specified LID is the switch's LID, or is a leaf
-*	of the switch.
-*
-* SYNOPSIS
-*/
-static inline boolean_t
-osm_switch_is_leaf_lid(
-	IN const osm_switch_t* const p_sw,
-	IN const uint16_t lid_ho )
-{
-	return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ? FALSE :
-		(p_sw->hops[lid_ho][0] <= 1);
-}
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to an osm_switch_t object.
-*
-*	lid_ho
-*		[in] LID (host order) to compare.
-*
-* RETURN VALUES
-*	TRUE if the LID is the switch's LID or is a leaf of the switch,
-*	FALSE otherwise.
-*
-* NOTES
-*
-* SEE ALSO
-*	Switch object
-*********/
-
 /****f* OpenSM: Switch/osm_switch_get_hop_count
 * NAME
 *	osm_switch_get_hop_count
-- 
1.5.0.1.26.gf5a92

From bugzilla-daemon at lists.openfabrics.org Sun Mar 4 08:49:38 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Sun, 4 Mar 2007 08:49:38 -0800 (PST)
Subject: [ofa-general] [Bug 410] New: compilation of srptools fails on all platfroms (x86_64, i686, ppc64, sles10 & rh4u3)
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=410

           Summary: compilation of srptools fails on all platfroms (x86_64, i686, ppc64, sles10 & rh4u3)
           Product: OpenFabrics Linux
           Version: 1.2
          Platform: All
        OS/Version: Other
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SRP
        AssignedTo: bugzilla at openib.org
        ReportedBy: yosefe at voltaire.com

This is the bottom of OFED log file:

make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/srptools'
make all-am
make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/srptools'
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -I../management/libibumad/include -I../management/libibcommon/include -Wall -m64 -g -O2 -MT src_ibsrpdm-srp-dm.o -MD -MP -MF ".deps/src_ibsrpdm-srp-dm.Tpo" -c -o src_ibsrpdm-srp-dm.o `test -f 'src/srp-dm.c' || echo './'`src/srp-dm.c; \
then mv -f ".deps/src_ibsrpdm-srp-dm.Tpo" ".deps/src_ibsrpdm-srp-dm.Po"; else rm -f ".deps/src_ibsrpdm-srp-dm.Tpo"; exit 1; fi
src/srp-dm.c: In function ‘send_and_get’:
src/srp-dm.c:101: warning: dereferencing type-punned pointer will break strict-aliasing rules
gcc -m64 -g -O2 -L../libibverbs/src/.libs -libverbs -L../management/libibumad/.libs -libumad -L../management/libibcommon/.libs -libcommon -L. -o src/ibsrpdm src_ibsrpdm-srp-dm.o -lpthread
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -I../management/libibumad/include -I../management/libibcommon/include -Wall -I /usr/local/ofed/include -fno-strict-aliasing -m64 -g -O2 -MT srp_daemon_srp_daemon-srp_daemon.o -MD -MP -MF ".deps/srp_daemon_srp_daemon-srp_daemon.Tpo" -c -o srp_daemon_srp_daemon-srp_daemon.o `test -f 'srp_daemon/srp_daemon.c' || echo './'`srp_daemon/srp_daemon.c; \
then mv -f ".deps/srp_daemon_srp_daemon-srp_daemon.Tpo" ".deps/srp_daemon_srp_daemon-srp_daemon.Po"; else rm -f ".deps/srp_daemon_srp_daemon-srp_daemon.Tpo"; exit 1; fi
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -I../management/libibumad/include -I../management/libibcommon/include -Wall -I /usr/local/ofed/include -fno-strict-aliasing -m64 -g -O2 -MT srp_daemon_srp_daemon-srp_handle_traps.o -MD -MP -MF ".deps/srp_daemon_srp_daemon-srp_handle_traps.Tpo" -c -o srp_daemon_srp_daemon-srp_handle_traps.o `test -f 'srp_daemon/srp_handle_traps.c' || echo './'`srp_daemon/srp_handle_traps.c; \
then mv -f ".deps/srp_daemon_srp_daemon-srp_handle_traps.Tpo" ".deps/srp_daemon_srp_daemon-srp_handle_traps.Po"; else rm -f ".deps/srp_daemon_srp_daemon-srp_handle_traps.Tpo"; exit 1; fi
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -I../management/libibumad/include -I../management/libibcommon/include -Wall -I /usr/local/ofed/include -fno-strict-aliasing -m64 -g -O2 -MT srp_daemon_srp_daemon-srp_sync.o -MD -MP -MF ".deps/srp_daemon_srp_daemon-srp_sync.Tpo" -c -o srp_daemon_srp_daemon-srp_sync.o `test -f 'srp_daemon/srp_sync.c' || echo './'`srp_daemon/srp_sync.c; \
then mv -f ".deps/srp_daemon_srp_daemon-srp_sync.Tpo" ".deps/srp_daemon_srp_daemon-srp_sync.Po"; else rm -f ".deps/srp_daemon_srp_daemon-srp_sync.Tpo"; exit 1; fi
gcc -m64 -g -O2 -L../libibverbs/src/.libs -libverbs -L../management/libibumad/.libs -libumad -L../management/libibcommon/.libs -libcommon -L. -o srp_daemon/srp_daemon srp_daemon_srp_daemon-srp_daemon.o srp_daemon_srp_daemon-srp_handle_traps.o srp_daemon_srp_daemon-srp_sync.o -libumad -libcommon -libverbs -lpthread -lpthread
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -I../management/libibumad/include -I../management/libibcommon/include -Wall -m64 -g -O2 -MT emulate_udev_srp_post_multipath-srp_post_multipath.o -MD -MP -MF ".deps/emulate_udev_srp_post_multipath-srp_post_multipath.Tpo" -c -o emulate_udev_srp_post_multipath-srp_post_multipath.o `test -f 'emulate_udev/srp_post_multipath.c' || echo './'`emulate_udev/srp_post_multipath.c; \
then mv -f ".deps/emulate_udev_srp_post_multipath-srp_post_multipath.Tpo" ".deps/emulate_udev_srp_post_multipath-srp_post_multipath.Po"; else rm -f ".deps/emulate_udev_srp_post_multipath-srp_post_multipath.Tpo"; exit 1; fi
emulate_udev/srp_post_multipath.c:8:37: error: srp_dm_multipath_daemon.h: No such file or directory
emulate_udev/srp_post_multipath.c: In function ‘main’:
emulate_udev/srp_post_multipath.c:19: error: storage size of ‘send_st’ isn’t known
emulate_udev/srp_post_multipath.c:29: error: ‘KEY_FILE’ undeclared (first use in this function)
emulate_udev/srp_post_multipath.c:29: error: (Each undeclared identifier is reported only once
emulate_udev/srp_post_multipath.c:29: error: for each function it appears in.)
emulate_udev/srp_post_multipath.c:57: error: ‘MAX_PATH_LENGTH’ undeclared (first use in this function)
emulate_udev/srp_post_multipath.c:70: error: invalid application of ‘sizeof’ to incomplete type ‘struct msg’
emulate_udev/srp_post_multipath.c:19: warning: unused variable ‘send_st’
make[2]: *** [emulate_udev_srp_post_multipath-srp_post_multipath.o] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/srptools'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/srptools'
make: *** [srptools] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.25872 (%install)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.25872 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-perftest --mandir=/usr/local/ofed/man' --define 'configure_options32 --with-mstflint --with-tvflash' --define 'build_32bit 1' /tmp/regtest/OFED-1.2-reg-20070304-1740/SRPMS/ofa_user-1.2-alpha1.src.rpm"

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From sashak at voltaire.com Sun Mar 4 09:24:45 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 4 Mar 2007 19:24:45 +0200
Subject: [ofa-general] [PATCH TRIVIAL] opensm: don't alloc lmc related structures when lmc = 0
Message-ID: <20070304172445.GD15777@sashak.voltaire.com>

Don't allocate memory for lmc optimizations buffers when lmc = 0.

Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 38 ++++++++++++++++++++------------------ 1 files changed, 20 insertions(+), 18 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 47c7ef7..d6af997 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -709,27 +709,29 @@ __osm_ucast_mgr_process_port( OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_port ); - remote_sys_guids = malloc( sizeof(uint64_t) * lids_per_port ); - if( remote_sys_guids == NULL ) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_port: ERR 3A09: " - "Cannot allocate array. Insufficient memory\n"); - goto Exit; - } + if (lids_per_port > 1) { + remote_sys_guids = malloc( sizeof(uint64_t) * lids_per_port ); + if( remote_sys_guids == NULL ) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "__osm_ucast_mgr_process_port: ERR 3A09: " + "Cannot allocate array. Insufficient memory\n"); + goto Exit; + } - memset( remote_sys_guids, 0, sizeof(uint64_t) * lids_per_port ); + memset( remote_sys_guids, 0, sizeof(uint64_t) * lids_per_port ); - remote_node_guids = malloc( sizeof(uint64_t) * lids_per_port ); - if( remote_node_guids == NULL ) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_port: ERR 3A0A: " - "Cannot allocate array. Insufficient memory\n"); - goto Exit; - } + remote_node_guids = malloc( sizeof(uint64_t) * lids_per_port ); + if( remote_node_guids == NULL ) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "__osm_ucast_mgr_process_port: ERR 3A0A: " + "Cannot allocate array. Insufficient memory\n"); + goto Exit; + } - memset( remote_node_guids, 0, sizeof(uint64_t) * lids_per_port ); + memset( remote_node_guids, 0, sizeof(uint64_t) * lids_per_port ); + } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); -- 1.5.0.1.26.gf5a92 From mst at mellanox.co.il Sun Mar 4 09:59:15 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Sun, 4 Mar 2007 19:59:15 +0200
Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
In-Reply-To: <45EAA02F.4000108@gmail.com>
References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com>
Message-ID: <20070304175915.GG17950@mellanox.co.il>

> Quoting Moni Shoua:
> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
>
> This version of the patch tracks the allocs and releases of ipoib_neigh and
> keeps a list of them. Before IPoIB net device unregisters the list is passed
> to destroy ipoib_neighs that ride on a bond neighbour.
>
> This is a replacement to the method of scanning the arp and ndisc
> tables.

Why does the list need to be global? We already have a per-device list of
paths, and each of these in turn has a list of neighbours. Can't this be used?

--
MST

From mst at mellanox.co.il Sun Mar 4 10:20:48 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Mar 2007 20:20:48 +0200
Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
In-Reply-To: <45EAA02F.4000108@gmail.com>
References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com>
Message-ID: <20070304182048.GG19828@mellanox.co.il>

Some more issues:

> This version of the patch tracks the allocs and releases of ipoib_neigh and
> keeps a list of them. Before IPoIB net device unregisters the list is passed
> to destroy ipoib_neighs that ride on a bond neighbour.
>
> This is a replacement to the method of scanning the arp and ndisc
> tables.
> > > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h > =================================================================== > --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:20:54.749932751 +0200 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:21:58.547593677 +0200 > @@ -218,6 +218,7 @@ struct ipoib_neigh { > struct neighbour *neighbour; > struct net_device *dev; > > + struct list_head all_neigh_list; > struct list_head list; > }; > > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c > =================================================================== > --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:52.720629356 +0200 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:58.548593499 +0200 > @@ -66,6 +66,7 @@ MODULE_PARM_DESC(recv_queue_size, "Numbe > #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG > int ipoib_debug_level; > > +static int ipoib_at_exit = 0; > module_param_named(debug_level, ipoib_debug_level, int, 0644); > MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); > #endif This at_exit trick looks ugly. Ideally, hotplug removing all devices and module removal should act identically. The fact that they do not is suspicious. Consider hotplug removing all devices. It seems no code will test ipoib_at_exit then. Is that right? 
> @@ -85,6 +86,9 @@ struct workqueue_struct *ipoib_workqueue > > struct ib_sa_client ipoib_sa_client; > > +static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock); > +static LIST_HEAD(ipoib_all_neigh_list); > + > static void ipoib_add_one(struct ib_device *device); > static void ipoib_remove_one(struct ib_device *device); > > @@ -792,6 +796,24 @@ static void ipoib_neigh_destructor(struc > ipoib_put_ah(ah); > } > > +static void ipoib_neigh_cleanup_bond(struct net_device* master, > + struct net_device* slave) > +{ > + struct ipoib_neigh *nn, *tn; > + > + spin_lock(&ipoib_all_neigh_list_lock); > + list_for_each_entry_safe(nn, tn, &ipoib_all_neigh_list, all_neigh_list){ > + if ((nn->neighbour->dev == master) && (nn->dev == slave)) { Extra ()'s not really needed here: logic ops have lower precedence than math (IIRC, only comma and assignments have lower precedence than logic). > + if (ipoib_at_exit) > + nn->neighbour->parms->neigh_destructor = NULL; Is it safe to do this without locking? Could the destructor be in progress when we do this? 
> + spin_unlock(&ipoib_all_neigh_list_lock); > + ipoib_neigh_destructor(nn->neighbour); > + spin_lock(&ipoib_all_neigh_list_lock); > + } > + } > + spin_unlock(&ipoib_all_neigh_list_lock); > +} > + > struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, > struct net_device *dev) > { > @@ -806,6 +828,9 @@ struct ipoib_neigh *ipoib_neigh_alloc(st > *to_ipoib_neigh(neighbour) = neigh; > skb_queue_head_init(&neigh->queue); > > + spin_lock(&ipoib_all_neigh_list_lock); > + list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list); > + spin_unlock(&ipoib_all_neigh_list_lock); > return neigh; > } > > @@ -818,6 +843,9 @@ void ipoib_neigh_free(struct net_device > ++priv->stats.tx_dropped; > dev_kfree_skb_any(skb); > } > + spin_lock(&ipoib_all_neigh_list_lock); > + list_del(&neigh->all_neigh_list); > + spin_unlock(&ipoib_all_neigh_list_lock); > kfree(neigh); > } > > @@ -874,6 +902,8 @@ void ipoib_dev_cleanup(struct net_device > > /* Delete any child interfaces first */ > list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { > + if (cpriv->dev->master) > + ipoib_neigh_cleanup_bond(cpriv->dev->master,priv->dev); whitespace broken here. > unregister_netdev(cpriv->dev); > ipoib_dev_cleanup(cpriv->dev); > free_netdev(cpriv->dev); > @@ -1158,6 +1188,8 @@ static void ipoib_remove_one(struct ib_d > list_for_each_entry_safe(priv, tmp, dev_list, list) { > ib_unregister_event_handler(&priv->event_handler); > flush_scheduled_work(); > + if (priv->dev->master) > + ipoib_neigh_cleanup_bond(priv->dev->master,priv->dev); and here. > > unregister_netdev(priv->dev); > ipoib_dev_cleanup(priv->dev); > @@ -1217,6 +1249,7 @@ err_fs: > > static void __exit ipoib_cleanup_module(void) > { > + ipoib_at_exit = 1; > ib_unregister_client(&ipoib_client); > ib_sa_unregister_client(&ipoib_sa_client); > ipoib_unregister_debugfs(); -- MST From mst at mellanox.co.il Sun Mar 4 12:10:10 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Sun, 4 Mar 2007 22:10:10 +0200
Subject: [ofa-general] Re: [PATCH] IB/mthca: recv poll cq optimization
In-Reply-To:
References: <20070228210235.GC8564@mellanox.co.il>
Message-ID: <20070304201010.GI19828@mellanox.co.il>

> Quoting Roland Dreier:
> Subject: Re: [PATCH] IB/mthca: recv poll cq optimization
>
> > All good recv work requests generate HW completions in FIFO order, so we can use
> > rq->tail rather than hardware data. In this way, we save a branch on data path
> > for recv completions (branch is still there for send completions).
> >
> > Roland, what do you think? This increases the overall code size but I think the
> > extra code is on the error CQE handling path. BTW, since most kernel QPs seem
> > not to use selective signaling, it might be worth it to optimize send
> > completions in a similar way in case selective signaling is disabled on QP.
>
> Do you have any measurements that say this helps? Having a bigger
> I-cache footprint is really globally worse for the system, so I don't
> like this part:
>
> > +	if (unlikely(is_error)) {
> > +		if (!is_send && !(*cur_qp)->ibqp.srq) {
> > +			s32 wqe = be32_to_cpu(cqe->wqe);
> > +			wqe_index = wqe >> wq->wqe_shift;
> > +			/*
> > +			 * WQE addr == base - 1 might be reported in receive completion
> > +			 * with error instead of (rq size - 1) by Sinai FW 1.0.800 and
> > +			 * Arbel FW 5.1.400. This bug should be fixed in later FW revs.
> > +			 */
> > +			if (unlikely(wqe_index < 0))
> > +				wqe_index = wq->max - 1;
> > +		}
> >
> > -	if (is_error) {
>
> is there any way to move that into handle_error_cqe() so that it's
> definitely out of the way for the normal path?

I'll look into that.

> In fact, why do we
> need this code at all with your change -- aren't RQ error completions
> reported in FIFO order too?

Good point.

> I'm not sure that it's worth testing whether a SQ has selective
> signaling or not -- after all, that's just changing one conditional
> branch for another.
> And in fact, looking at the code, I think we
> could rewrite
>
> 	if (wq->last_comp < wqe_index)
> 		wq->tail += wqe_index - wq->last_comp;
> 	else
> 		wq->tail += wqe_index + wq->max - wq->last_comp;
>
> as just
>
> 	wq->tail += (wqe_index + wq->max - wq->last_comp) & (wq->max - 1);
>
> and avoid the conditional in a simpler way. (I've not checked that
> math but it looks right to me)

This won't work for Tavor where wq.max is not a power of 2.

--
MST

From dstromberg at datallegro.com Sun Mar 4 17:21:21 2007
From: dstromberg at datallegro.com (Dan Stromberg)
Date: Sun, 04 Mar 2007 17:21:21 -0800
Subject: [ofa-general] MAC addresses unique on infiniband networks?
Message-ID:

Hi folks.

I don't see a generic infiniband discussion list anywhere, so I figured
I'd try a Linux infiniband list.

Does anyone know for sure if MAC addresses are unique in infiniband
networks, with or without ipoib?

I know with ethernet, they're supposed to be, but:
1) Sun sometimes likes to put the same MAC address on more than one subnet
   in machines with more than one NIC
2) Some ethernet drivers let people change the MAC address

Is it about the same with infiniband?

I tried googling, but didn't find my answer.

Thanks!

From weikuan.yu at gmail.com Sun Mar 4 18:08:37 2007
From: weikuan.yu at gmail.com (Weikuan Yu)
Date: Sun, 4 Mar 2007 21:08:37 -0500
Subject: [ofa-general] HotI 2007 Call for Papers -- Updated TPC
In-Reply-To: <9ca1c1890703041737k28f4a24cp82ae4a7cac911897@mail.gmail.com>
References: <9ca1c1890703041737k28f4a24cp82ae4a7cac911897@mail.gmail.com>
Message-ID: <9ca1c1890703041808n3dbe32c2i341c74a8977c3b43@mail.gmail.com>

--------------------------------------------------------------------
We apologize if you received multiple copies of this posting.
Please feel free to distribute it to those who might be interested.
--------------------------------------------------------------------

Hot Interconnects 15
IEEE Symposium on High-Performance Interconnects
August 22-24, 2007
Stanford University
Palo Alto, California, USA

Hot Interconnects is the premier international forum for researchers and
developers of state-of-the-art hardware and software architectures and
implementations for interconnection networks of all scales, ranging from
on-chip processor-memory interconnects to wide-area networks. This yearly
conference is very well attended by leaders in industry and academia. The
atmosphere provides for a wealth of opportunities to interact with
individuals at the forefront of this field.

Themes include cross-cutting issues spanning computer systems, networking
technologies, and communication protocols. This conference is directed
particularly at new and exciting technology and product innovations in
these areas. Contributions should focus on real experimental systems,
prototypes, or leading-edge products and their performance evaluation. In
addition to those subscribing to the main theme of the conference,
contributions are also solicited in the topics listed below.
* Novel and innovative interconnect architectures * Multi-core processor interconnects * System-on-Chip Interconnects * Advanced chip-to-chip communication technologies * Optical interconnects * Protocol and interfaces for interprocessor communication * Survivability and fault-tolerance of interconnects * High-speed packet processing engines and network processors * System and storage area network architectures and protocols * High-performance host-network interface architectures * High-bandwidth and low-latency I/O * Tb/s switching and routing technologies * Innovative architectures for supporting collective communication * Novel communication architectures to support grid computing Submission Guideline o Submission deadline: March 31, 2007 o Notification of acceptance: May 15, 2007 o Papers need sufficient technical detail to judge quality and suitability for presentation. o Submit title, author, abstract, and full paper (six pages, double-column, IEEE format). o Papers should be submitted electronically at the specified link location found on http://www.hoti.org o For further information please see http://www.hoti.org/hoti15/cfp.html About the Conference - Conference held at the William Hewlett Teaching Center at Stanford University. - Papers selected will be published in proceedings by the IEEE Computer Society. - Presentations are 30-minute talks in a single-track format. - Online information at http://www.hoti.org GENERAL CO-CHAIRS * John W. Lockwood, Washington University in St. Louis * Fabrizio Petrini, Pacific Northwest National Laboratory TECHNICAL CO-CHAIRS * Ron Brightwell, Sandia National Laboratories * Dhabaleswar (DK) Panda, The Ohio State University LOCAL ARRANGEMENTS CHAIR * Songkrant Muneenaem, Washington University in St. 
Louis

PANEL CHAIR
* Daniel Pitt, Santa Clara University

PUBLICITY CO-CHAIRS
* Weikuan Yu, Oak Ridge National Laboratory

PUBLICATION CHAIR
* Luca Valcarenghi, Scuola Superiore Sant'Anna

FINANCE CHAIR
* Herzel Ashkenazi, Xilinx

REGISTRATION CHAIR
* Songkrant Muneenaem, Washington University in St. Louis

TUTORIAL CO-CHAIRS
- TBA

Webmaster
* Liz Rogers, LRD Group

Steering Committee
o Allen Baum, Intel
o Lily Jow, Hewlett Packard
o Mark Laubach, Broadband Physics
o John Lockwood, Stanford University
o Daniel Pitt, Santa Clara University

Technical Program Committee
* Dennis Abts, Cray, Inc.
* Adnan Aziz, University of Texas, Austin
* Alan Benner, IBM
* Keren Bergman, Columbia University
* Andrea Bianco, Politecnico di Torino
* Piero Castoldi, Scuola Superiore Sant'Anna
* Sarang Dharmapurikar, Nuova Systems
* Hans Eberle, Sun
* Wu-chun Feng, Virginia Tech
* Juan Fernandez, University of Murcia
* Ada Gavrilovska, Georgia Institute of Technology
* Paolo Giaccone, Politecnico di Torino
* Mitchell Gusat, IBM Zurich Research Laboratory
* Ron Ho, Sun Microsystems Laboratories
* Doan Hoang, University of Technology, Sydney
* Jayasimha Jay, Intel
* Isaac Keslassy, Technion
* Venkata Krishnan, Stargen Inc.
* Tal Lavian, Nortel Networks Labs, UC Berkeley
* Bill Lin, University of California, San Diego
* Olav Lysne, Simula Research Laboratory
* Pankaj Mehra, HP Labs
* Rami Melhem, University of Pittsburgh
* Cyriel Minkenberg, IBM Zurich Research Laboratory
* Gregory Pfister, IBM
* Craig Stunkel, IBM T.J. Watson Research Center
* Anujan Varma, University of California at Santa Cruz

From mst at mellanox.co.il Sun Mar 4 23:00:10 2007
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 5 Mar 2007 09:00:10 +0200 Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.0.3 Message-ID: <20070305070010.GK19828@mellanox.co.il> FYI ----- Forwarded message from Junio C Hamano ----- Subject: [ANNOUNCE] GIT 1.5.0.3 Date: Mon, 5 Mar 2007 05:17:11 +0200 In-Reply-To: <7vabz0j7td.fsf at assigned-by-dhcp.cox.net> (Junio C. Hamano'smessage of "Tue, 27 Feb 2007 00:58:22 -0800") References: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net><7vabz0j7td.fsf at assigned-by-dhcp.cox.net> From: Junio C Hamano The latest maintenance release GIT 1.5.0.3 is available at the usual places: http://www.kernel.org/pub/software/scm/git/ git-1.5.0.3.tar.{gz,bz2} (tarball) git-htmldocs-1.5.0.3.tar.{gz,bz2} (preformatted docs) git-manpages-1.5.0.3.tar.{gz,bz2} (preformatted docs) RPMS/$arch/git-*-1.5.0.3-1.$arch.rpm (RPM) GIT v1.5.0.3 Release Notes ========================== Fixes since v1.5.0.2 -------------------- * Bugfixes - 'git.el' honors the commit coding system from the configuration. - 'blameview' in contrib/ correctly digs deeper when a line is clicked. - 'http-push' correctly makes sure the remote side has leading path. Earlier it started in the middle of the path, and incorrectly. - 'git-merge' did not exit with non-zero status when the working tree was dirty and cannot fast forward. It does now. - 'cvsexportcommit' does not lose yet-to-be-used message file. - int-vs-size_t typefix when running combined diff on files over 2GB long. - 'git apply --whitespace=strip' should not touch unmodified lines. - 'git-mailinfo' choke when a logical header line was too long. - 'git show A..B' did not error out. Negative ref ("not A" in this example) does not make sense for the purpose of the command, so now it errors out. - 'git fmt-merge-msg --file' without file parameter did not correctly error out. - 'git archimport' barfed upon encountering a commit without summary. - 'git index-pack' did not protect itself from getting a short read out of pread(2). 
- 'git http-push' had a few buffer overruns. - Build dependency fixes to rebuild fetch.o when other headers change. * Documentation updates - user-manual updates. - Options to 'git remote add' were described insufficiently. - Configuration format.suffix was not documented. - Other formatting and spelling fixes. ---------------------------------------------------------------- Shortlog since v1.5.0.2 ----------------------- Alexandre Julliard (1): git.el: Set the default commit coding system from the repository config. Aneesh Kumar K.V (1): blameview: Fix the browse behavior in blameview Christian Schlotter (1): Documentation: Correct minor typo in git-add documentation. Eygene Ryabinkin (2): http-push.c::lock_remote(): validate all remote refs. Another memory overrun in http-push.c Gerrit Pape (2): git-cvsexportcommit: don't cleanup .msg if not yet committed to cvs. Fix quoting in update hook template J. Bruce Fields (6): Documentation: mention module option to git-cvsimport user-manual: reset to ORIG_HEAD not HEAD to undo merge user-manual: ensure generated manual references stylesheet user-manual: insert earlier of mention content-addressable architecture user-manual: how to replace commits older than most recent user-manual: more detailed merge discussion Jim Meyering (1): diff --cc: integer overflow given a 2GB-or-larger file Johannes Schindelin (3): fetch.o depends on the headers, too. builtin-archive: use RUN_SETUP Document the config variable format.suffix Junio C Hamano (5): git-apply: do not fix whitespaces on context lines. Documentation: git-remote add [-t ] [-m ] [-f] name url Start preparing Release Notes for 1.5.0.3 git-merge: fail correctly when we cannot fast forward. GIT 1.5.0.3 Linus Torvalds (2): mailinfo: do not get confused with logical lines that are too long. 
git-show: Reject native ref Matthias Kestenholz (1): Fix git-gc usage note Michael Coleman (2): Fix minor typos/grammar in user-manual.txt builtin-fmt-merge-msg: fix bugs in --file option Michael Poole (1): Correct ordering in git-cvsimport's option documentation Paolo Bonzini (1): git-archimport: support empty summaries, put summary on a single line. Ramsay Allan Jones (5): Fix a "label defined but unreferenced" warning. Fix an "implicit function definition" warning. Fix some "comparison is always true/false" warnings. Fix a "pointer type missmatch" warning. Unset NO_C99_FORMAT on Cygwin. Sergey Vlasov (3): Documentation/build-docdep.perl: Fix dependencies for included asciidoc files Documentation/git-quiltimport.txt: Fix labeled list formatting Documentation/git-send-email.txt: Fix labeled list formatting Shawn O. Pearce (1): index-pack: Loop over pread until data loading is complete. Theodore Ts'o (1): Fix git-show man page formatting in the EXAMPLES section Uwe Kleine-König (1): Include config.mak in doc/Makefile Yasushi SHOJI (1): glossary: Add definitions for dangling and unreachable objects - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ----- End forwarded message ----- -- MST From ogerlitz at voltaire.com Sun Mar 4 23:19:00 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 05 Mar 2007 09:19:00 +0200 Subject: [ofa-general] MAC addresses unique on infiniband networks? In-Reply-To: References: Message-ID: <45EBC464.4000905@voltaire.com> Dan Stromberg wrote: > Hi folks. > > I don't see a generic infiniband discussion list anywhere, so I figured > I'd try a Linux infiniband list. > > Does anyone know for sure if MAC addresses are unique in infiniband > networks, with or without ipoib? sticking to IPoIB, please take a look at section 9.1.1. 
"Link-Layer Address/Hardware Address" of RFC 4391, and elsewhere in this doc to get answers to your questions below. Or. > I know with ethernet, they're supposed to be, but: > 1) Sun sometimes likes to put the same MAC address on more than one subnet > iun machines with more than one NIC > 2) Some ethernet drivers let people change the MAC address > > Is it about the same with infiniband? From vlad at dev.mellanox.co.il Mon Mar 5 01:27:26 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 05 Mar 2007 11:27:26 +0200 Subject: [ofa-general] [PATCH ofed_1_2 0/6] iw_cxgb3: Bug Fixes In-Reply-To: <20070302231750.2701.64219.stgit@dell3.ogc.int> References: <20070302231750.2701.64219.stgit@dell3.ogc.int> Message-ID: <45EBE27E.5050301@dev.mellanox.co.il> Steve Wise wrote: > Vlad, > > Here is a set of bug fixes for iw_cxgb3 that I'd like to roll into > ofed_1_2 beta. > > They can be pulled from: > > http://staging.openfabrics.org/~swise/ofed_1_2 iw_cxgb3_fixes > > Thanks, > > Steve. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general Merged into ofed_1_2. Regards, Vladimir From or.gerlitz at gmail.com Mon Mar 5 01:37:45 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 5 Mar 2007 11:37:45 +0200 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> Message-ID: <15ddcffd0703050137u73782630gbfe0fc9e2912a60c@mail.gmail.com> added general at lists.openfabrics.org to the thread, sorry for the double post. On 3/5/07, Or Gerlitz wrote: > On 3/2/07, Tang, Changqing wrote: > > > What is the default size of the async event queue ? 
> > Suppose I
> > create 1024 QPs from one process to another process,
> > Somehow the remote process crashes, Can I get all the 1024 QP error
> > async events, how do I make sure I don't lose an event ?
>
> CQ,
>
> I want to understand what is the exact feature you need.
>
> for example, if TCP is used the equivalent of this is that following a
> remote process crash the remote node's TCP stack closes the TCP
> connections and whenever the local process attempts to use the socket
> it gets an errno telling this connection was closed ?!
>
> Since you use RC QP, --if-- you attempt doing post_send (or rdma) to a
> QP whose connected peer QP is not responding, you will get CQ
> completion with "retry exceeded" error.
>
> If the above case (notification following post send) is not enough,
> the IB CM which you can use through libibcm or librdmacm provides the
> same functionality (sends DREQ if the process crashes) with the
> distinction that over TCP the same primitive (socket) is used for conn
> management and conn data xfer, where over IB, the QP is used for data
> and the IB CM Id (or the RDMA CM Id) is used for conn management.
>
> Combining possibilities: if you want to get a notification on every
> peer process crash, you would need to either poll/select once in a while
> the libibcm/librdmacm event queue or implement some keepalive
> protocol of your own. For instance, I think the IB spec mentions doing
> zero length rdma write once in a while as a means of implementing such
> protocol.
>
> Or.
>

From or.gerlitz at gmail.com Mon Mar 5 01:36:34 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 5 Mar 2007 11:36:34 +0200
Subject: [ofa-general] What is the size of async event queue ?
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net>
References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net>
Message-ID: <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com>

On 3/2/07, Tang, Changqing wrote:

> What is the default size of the async event queue ? Suppose I
> create 1024 QPs from one process to another process,
> Somehow the remote process crashes, Can I get all the 1024 QP error
> async events, how do I make sure I don't lose an event ?

CQ,

I want to understand what is the exact feature you need.

for example, if TCP is used the equivalent of this is that following a
remote process crash the remote node's TCP stack closes the TCP
connections and whenever the local process attempts to use the socket
it gets an errno telling this connection was closed ?!

Since you use RC QP, --if-- you attempt doing post_send (or rdma) to a
QP whose connected peer QP is not responding, you will get CQ
completion with "retry exceeded" error.

If the above case (notification following post send) is not enough,
the IB CM which you can use through libibcm or librdmacm provides the
same functionality (sends DREQ if the process crashes) with the
distinction that over TCP the same primitive (socket) is used for conn
management and conn data xfer, where over IB, the QP is used for data
and the IB CM Id (or the RDMA CM Id) is used for conn management.

Combining possibilities: if you want to get a notification on every
peer process crash, you would need to either poll/select once in a while
the libibcm/librdmacm event queue or implement some keepalive
protocol of your own. For instance, I think the IB spec mentions doing
zero length rdma write once in a while as a means of implementing such
protocol.

Or.
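
The "keepalive protocol of your own" option in the message above could look
roughly like the sketch below. It is only an illustration of the
bookkeeping, not code from libibverbs, libibcm or librdmacm: every name in
it (peer_keepalive, keepalive_init and so on) is hypothetical. It assumes
the application posts a periodic probe on each connection (for instance the
zero-length RDMA write mentioned above), reports each probe's completion
status to these helpers, and feeds them timestamps from a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-peer keepalive state. None of these names come from
 * libibverbs, libibcm or librdmacm; they only illustrate the bookkeeping.
 * All timestamps are assumed to come from a monotonic millisecond clock.
 */
struct peer_keepalive {
	uint64_t last_ok_ms;	/* time of the last successful probe completion */
	uint64_t timeout_ms;	/* longer silence than this: peer presumed dead */
	bool dead;		/* sticky: once set, it is never cleared here */
};

static void keepalive_init(struct peer_keepalive *ka, uint64_t now_ms,
			   uint64_t timeout_ms)
{
	ka->last_ok_ms = now_ms;
	ka->timeout_ms = timeout_ms;
	ka->dead = false;
}

/* Call when a probe (e.g. a zero-length RDMA write) completes successfully. */
static void keepalive_probe_ok(struct peer_keepalive *ka, uint64_t now_ms)
{
	if (!ka->dead)
		ka->last_ok_ms = now_ms;
}

/* Call when a probe completes in error (e.g. "retry exceeded"). */
static void keepalive_probe_failed(struct peer_keepalive *ka)
{
	ka->dead = true;
}

/* Periodic check: returns true once the peer should be treated as gone. */
static bool keepalive_expired(struct peer_keepalive *ka, uint64_t now_ms)
{
	if (!ka->dead && now_ms - ka->last_ok_ms > ka->timeout_ms)
		ka->dead = true;
	return ka->dead;
}
```

A caller would run keepalive_expired() from its own timer and tear the
connection down once it returns true. The dead flag is kept sticky on the
assumption that a connection which has seen a failed probe is not reused
without being set up again.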
From vlad at dev.mellanox.co.il Mon Mar 5 02:04:49 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 05 Mar 2007 12:04:49 +0200 Subject: [ofa-general] [PATCH ofed-1.2-beta 0/5] ehca: bug fixes for kernel space In-Reply-To: <200703020924.33999.hnguyen@linux.vnet.ibm.com> References: <200703020924.33999.hnguyen@linux.vnet.ibm.com> Message-ID: <45EBEB41.40104@dev.mellanox.co.il> Hoang-Nam Nguyen wrote: > Hello Vladimir! > This is a patch set for ehca with various bug fixes (being queued for 2.6.21) > I'd like to incorporate in ofed-1.2-beta. > - reworked irq handler to avoid/reduce missed irq events > For this I've sent a patch that you have pushed in your git tree ofed_1_2 as > kernel_patches/fixes/ehca_2_rework_irq_handler.patch > However I realized that it's not applicable since I created it without > applying the previous patch. Therefore I'm sending it again. Please > replace kernel_patches/fixes/ehca_2_rework_irq_handler.patch in your > git tree by this new one. Sorry for this inconvenience! > BTW: The build script does not recognize such one mistake and just > compiles and installs without error, since nothing is broken without > this patch. > - fix race condition/locking issues in scaling code > - allow en/disabling scaling code via module parameter > - query_port() returns LINK_UP instead UNKNOWN > - fix mismatched sync between completion handler and destroy cq > Thanks! 
> Nam > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Added to kernel_patches/fixes: kernel_patches/fixes/ehca_2_rework_irq_handler.patch (replaced ehca_2_rework_irq_handler.patch by patch 1/5) kernel_patches/fixes/ehca_3_fix_race_condition_locking_issues.patch (2/5) kernel_patches/fixes/ehca_4_allow_en_disabling.patch (3/5) kernel_patches/fixes/ehca_5_query_port.patch (4/5) kernel_patches/fixes/ehca_6_fix_mismatched_sync.patch (5/5) Regards, Vladimir From ogerlitz at voltaire.com Mon Mar 5 02:13:28 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 05 Mar 2007 12:13:28 +0200 Subject: [ofa-general] RE: [OFA General] List Address Change Completed In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366949B@ES22SNLNT.srn.sandia.gov> References: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov> <20070228072341.GB22246@mellanox.co.il> <3D84A59A1AD3584DA02AEAD240E8863F0366949B@ES22SNLNT.srn.sandia.gov> Message-ID: <45EBED48.3040008@voltaire.com> > > Quoting Lee, Michael Paichi : > > Subject: [OFA General] List Address Change Completed > > > > This list has been migrated to the new server, > lists.openfabrics.org. Please update any address book or filter > settings to reflect the new mailing list address. Future messages and > replies should be sent to this address: > > > > general at lists.openfabrics.org > > > > The new web address for this list is: > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > If you have any questions, please contact me at mplee at sandia.gov There is no link to the general list archive from the "openfabrics.org Mailing Lists" page @ https://openfabrics.org/mailman/admin Or. 
From vlad at lists.openfabrics.org Mon Mar 5 02:15:16 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 5 Mar 2007 02:15:16 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070305-0200 daily build status Message-ID: <20070305101517.0EDB3E60818@openfabrics.org> This email was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on 
powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From vlad at dev.mellanox.co.il Mon Mar 5 02:28:25 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 05 Mar 2007 12:28:25 +0200 Subject: [ofa-general] Re: [ewg] [PATCH ofed 1.2 1/2] Patch to build 32-bit binaries on ppc64 In-Reply-To: <200703021932.05511.ossrosch@linux.vnet.ibm.com> References: <200703021932.05511.ossrosch@linux.vnet.ibm.com> Message-ID: <45EBF0C9.4000000@dev.mellanox.co.il> Stefan Roscher wrote: > this patch set build_32bit variable to 1 in case of ppc64 > > > Signed-off-by: Stefan Roscher > --- > > > --- OFED-1.2-20070301-0600_old/build.sh 2007-03-01 06:00:02.000000000 -0800 > +++ OFED-1.2-20070301-0600_new/build.sh 2007-03-02 10:29:55.000000000 -0800 > @@ -88,7 +88,11 @@ build_kernel_ib_devel=0 > modprobe_update=1 > include_ipoib_conf=1 > apply_hpage_patch=1 > +if [ ! $ARCH = "ppc64" ]; then > build_32bit=0 > +else > +build_32bit=1 > +fi > > # Environment variables definition > BUILD_COUNTER=0 > > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Applied. Regards, Vladimir From vlad at dev.mellanox.co.il Mon Mar 5 02:29:06 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 05 Mar 2007 12:29:06 +0200 Subject: [ofa-general] Re: [ewg] [PATCH ofed 1.2 2/2] Patch to build 32-bit binaries on ppc64 In-Reply-To: <200703021933.35822.ossrosch@linux.vnet.ibm.com> References: <200703021933.35822.ossrosch@linux.vnet.ibm.com> Message-ID: <45EBF0F2.3010301@dev.mellanox.co.il> Stefan Roscher wrote: > this patch disabled restore of 64-bit binaries in case of ppc64 > and allows 32-bit binaries to be packaged in rpm. 
> > > Signed-off-by: Stefan Roscher > --- > > > --- OFED-1.2-20070301-0600_old/SOURCES/ofa_user-1.2/ofed_scripts/ofa_user.spec 2007-03-01 06:03:28.000000000 -0800 > +++ OFED-1.2-20070301-0600_new/SOURCES/ofa_user-1.2/ofed_scripts/ofa_user.spec 2007-03-02 11:05:10.000000000 -0800 > @@ -539,7 +539,8 @@ make DESTDIR=$RPM_BUILD_ROOT install_use > ./configure --prefix=%{_prefix} --libdir=%{_libdir32} --without-patch %{configure_options32} > make user > make DESTDIR=$RPM_BUILD_ROOT install_user > - # Backup 32 bit binaries > + %ifarch x86_64 > + # Backup 32 bit binaries > if [ -d $RPM_BUILD_ROOT%{_prefix}/bin ]; then > mv $RPM_BUILD_ROOT%{_prefix}/bin $RPM_BUILD_ROOT%{_prefix}/bin32 > fi > @@ -553,6 +554,7 @@ make DESTDIR=$RPM_BUILD_ROOT install_use > if [ -d $RPM_BUILD_ROOT%{_prefix}/sbin64 ]; then > mv $RPM_BUILD_ROOT%{_prefix}/sbin64 $RPM_BUILD_ROOT%{_prefix}/sbin > fi > + %endif > if [ -f $RPM_BUILD_ROOT%{_prefix}/sbin32/tvflash ] && [ ! -f $RPM_BUILD_ROOT%{_prefix}/sbin/tvflash ]; then > mkdir -p $RPM_BUILD_ROOT%{_prefix}/sbin > install -m 0755 $RPM_BUILD_ROOT%{_prefix}/sbin32/tvflash $RPM_BUILD_ROOT%{_prefix}/sbin/tvflash > > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Applied. Regards, Vladimir From halr at voltaire.com Mon Mar 5 05:17:31 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 08:17:31 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm: remove unused osm_switch_is_leaf_lid() function In-Reply-To: <20070304162115.GA15777@sashak.voltaire.com> References: <20070304162115.GA15777@sashak.voltaire.com> Message-ID: <1173100643.4546.256008.camel@hal.voltaire.com> On Sun, 2007-03-04 at 11:21, Sasha Khapyorsky wrote: > Remove unused osm_switch_is_leaf_lid() function. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). 
-- Hal From halr at voltaire.com Mon Mar 5 05:18:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 08:18:56 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm: don't alloc lmc related structures when lmc = 0 In-Reply-To: <20070304172445.GD15777@sashak.voltaire.com> References: <20070304172445.GD15777@sashak.voltaire.com> Message-ID: <1173100654.4546.256010.camel@hal.voltaire.com> On Sun, 2007-03-04 at 12:24, Sasha Khapyorsky wrote: > Don't allocate memory for lmc optimizations buffers when lmc = 0. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From monisonlists at gmail.com Mon Mar 5 06:28:28 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Mon, 05 Mar 2007 16:28:28 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070304175915.GG17950@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304175915.GG17950@mellanox.co.il> Message-ID: <45EC290C.9000702@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB >> >> This version of the patch tracks the allocs and releases of ipoib_neigh and >> keeps a list of them. Before IPoIB net device unregisters the list is passed >> to destroy ipoib_neighs that ride on on a bond neighbour. >> >> This is a replacement to the method of scanning the arp and ndisc >> tables. > > Why does the list need to be global? > We already have a per-device list of paths, and each of these in turn > has a list of neighbours. Can't this be used? > OK, It's a good point but coming to think of it now I have a question When a device unregisters ipoib_stop() is called and all ipoib_neighs are destroyed. Isn't it enough to ensure that ipoib_neigh_destructor will not try to "touch" one of the ib devs? 
or in other words: Isn't it that the work to clean ipoib_neighs is unnecessary? BTW: I guess that idea of global list was influenced from the ipoib_8111... patch. Why was it used there? From monisonlists at gmail.com Mon Mar 5 06:35:50 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Mon, 05 Mar 2007 16:35:50 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070304182048.GG19828@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304182048.GG19828@mellanox.co.il> Message-ID: <45EC2AC6.2090805@gmail.com> Michael S. Tsirkin wrote: > Some more issues: > >> This version of the patch tracks the allocs and releases of ipoib_neigh and >> keeps a list of them. Before IPoIB net device unregisters the list is passed >> to destroy ipoib_neighs that ride on on a bond neighbour. >> >> This is a replacement to the method of scanning the arp and ndisc >> tables. >> >> >> Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h >> =================================================================== >> --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:20:54.749932751 +0200 >> +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 12:21:58.547593677 +0200 >> @@ -218,6 +218,7 @@ struct ipoib_neigh { >> struct neighbour *neighbour; >> struct net_device *dev; >> >> + struct list_head all_neigh_list; >> struct list_head list; >> }; >> >> Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c >> =================================================================== >> --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:52.720629356 +0200 >> +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 12:21:58.548593499 +0200 >> @@ -66,6 +66,7 @@ MODULE_PARM_DESC(recv_queue_size, "Numbe >> #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG >> int ipoib_debug_level; >> >> +static int ipoib_at_exit = 0; >> 
module_param_named(debug_level, ipoib_debug_level, int, 0644); >> MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0"); >> #endif > > This at_exit trick looks ugly. Ideally, hotplug removing all devices and module > removal should act identically. The fact that they do not is suspicious. > Consider hotplug removing all devices. It seems no code will test > ipoib_at_exit then. Is that right? > > I still see a difference between removing all devices and unloading a module. In the first case, the function ipoib_neigh_destructor is still accessible but not in the second. I can't always set the neighbour destructor to NULL because it is shared among other neighbours. I can do it only when I don't want the destructor to be called ever (which is the case of module unloading) >> @@ -85,6 +86,9 @@ struct workqueue_struct *ipoib_workqueue >> >> struct ib_sa_client ipoib_sa_client; >> >> +static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock); >> +static LIST_HEAD(ipoib_all_neigh_list); >> + >> static void ipoib_add_one(struct ib_device *device); >> static void ipoib_remove_one(struct ib_device *device); >> >> @@ -792,6 +796,24 @@ static void ipoib_neigh_destructor(struc >> ipoib_put_ah(ah); >> } >> >> +static void ipoib_neigh_cleanup_bond(struct net_device* master, >> + struct net_device* slave) >> +{ >> + struct ipoib_neigh *nn, *tn; >> + >> + spin_lock(&ipoib_all_neigh_list_lock); >> + list_for_each_entry_safe(nn, tn, &ipoib_all_neigh_list, all_neigh_list){ >> + if ((nn->neighbour->dev == master) && (nn->dev == slave)) { > > Extra ()'s not really needed here: logic ops have lower precedence than math > (IIRC, only comma and assignments have lower precedence than logic). > >> + if (ipoib_at_exit) >> + nn->neighbour->parms->neigh_destructor = NULL; > > Is it safe to do this without locking? > Could the destructor be in progress when we do this? I think you're right. Maybe I need to attack the issue in a different way. I need to do some rethinking. 
> >> + spin_unlock(&ipoib_all_neigh_list_lock); >> + ipoib_neigh_destructor(nn->neighbour); >> + spin_lock(&ipoib_all_neigh_list_lock); >> + } >> + } >> + spin_unlock(&ipoib_all_neigh_list_lock); >> +} >> + >> struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, >> struct net_device *dev) >> { >> @@ -806,6 +828,9 @@ struct ipoib_neigh *ipoib_neigh_alloc(st >> *to_ipoib_neigh(neighbour) = neigh; >> skb_queue_head_init(&neigh->queue); >> >> + spin_lock(&ipoib_all_neigh_list_lock); >> + list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list); >> + spin_unlock(&ipoib_all_neigh_list_lock); >> return neigh; >> } >> >> @@ -818,6 +843,9 @@ void ipoib_neigh_free(struct net_device >> ++priv->stats.tx_dropped; >> dev_kfree_skb_any(skb); >> } >> + spin_lock(&ipoib_all_neigh_list_lock); >> + list_del(&neigh->all_neigh_list); >> + spin_unlock(&ipoib_all_neigh_list_lock); >> kfree(neigh); >> } >> >> @@ -874,6 +902,8 @@ void ipoib_dev_cleanup(struct net_device >> >> /* Delete any child interfaces first */ >> list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) { >> + if (cpriv->dev->master) >> + ipoib_neigh_cleanup_bond(cpriv->dev->master,priv->dev); > > whitespace broken here. > >> unregister_netdev(cpriv->dev); >> ipoib_dev_cleanup(cpriv->dev); >> free_netdev(cpriv->dev); >> @@ -1158,6 +1188,8 @@ static void ipoib_remove_one(struct ib_d >> list_for_each_entry_safe(priv, tmp, dev_list, list) { >> ib_unregister_event_handler(&priv->event_handler); >> flush_scheduled_work(); >> + if (priv->dev->master) >> + ipoib_neigh_cleanup_bond(priv->dev->master,priv->dev); > > and here. 
> >> >> unregister_netdev(priv->dev); >> ipoib_dev_cleanup(priv->dev); >> @@ -1217,6 +1249,7 @@ err_fs: >> >> static void __exit ipoib_cleanup_module(void) >> { >> + ipoib_at_exit = 1; >> ib_unregister_client(&ipoib_client); >> ib_sa_unregister_client(&ipoib_sa_client); >> ipoib_unregister_debugfs(); From mst at mellanox.co.il Mon Mar 5 07:14:11 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Mar 2007 17:14:11 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45EC290C.9000702@gmail.com> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304175915.GG17950@mellanox.co.il> <45EC290C.9000702@gmail.com> Message-ID: <20070305151411.GF5311@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >> Quoting Moni Shoua : > >> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > >> > >> This version of the patch tracks the allocs and releases of ipoib_neigh and > >> keeps a list of them. Before IPoIB net device unregisters the list is passed > >> to destroy ipoib_neighs that ride on on a bond neighbour. > >> > >> This is a replacement to the method of scanning the arp and ndisc > >> tables. > > > > Why does the list need to be global? > > We already have a per-device list of paths, and each of these in turn > > has a list of neighbours. Can't this be used? > > > OK, It's a good point but coming to think of it now I have a question > > When a device unregisters ipoib_stop() is called and all ipoib_neighs are destroyed. > Isn't it enough to ensure that ipoib_neigh_destructor will not try to > "touch" one of the ib devs? or in other words: Isn't it that the work to > clean ipoib_neighs is unnecessary? > > BTW: I guess that idea of global list was influenced from the ipoib_8111... patch. > Why was it used there? 
AFAIK, the point is to check whether (in pre-2.6.17 kernels) some neighbour shares the same ops pointer. Only after no such neighbours are left is it safe to set the destructor to NULL. This backport is not raceless BTW - some neighbour not related to IPoIB could be running the destructor. But I think it's the best I could come up with for these old kernels. -- MST From swise at opengridcomputing.com Mon Mar 5 07:22:56 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Mar 2007 09:22:56 -0600 Subject: [ofa-general] dapl compile problem Message-ID: <1173108176.14159.17.camel@stevo-desktop> Hey Arlin, I cloned the latest dapl tree Friday and checked out the ofed_1_2 branch, then did a ./autogen.sh && ./configure && make. I get errors in dtest: make[2]: Leaving directory `/usr/local/src/git/dapl' Making all in test/dtest make[2]: Entering directory `/usr/local/src/git/dapl/test/dtest' if gcc -DHAVE_CONFIG_H -I. -I. -I../.. -g -O2 -MT dtest.o -MD -MP -MF ".deps/dtest.Tpo" -c -o dtest.o dtest.c; \ then mv -f ".deps/dtest.Tpo" ".deps/dtest.Po"; else rm -f ".deps/dtest.Tpo"; exit 1; fi dtest.c:55:25: error: dat/udat.h: No such file or directory Have you seen this? I must be doing something dumb. Thanks, Steve. From benjamin.thery at bull.net Mon Mar 5 07:29:49 2007 From: benjamin.thery at bull.net (Benjamin Thery) Date: Mon, 05 Mar 2007 16:29:49 +0100 Subject: [ofa-general] Re: [PATCH RFC 17/31] net: Factor out __dev_alloc_name from dev_alloc_name In-Reply-To: <11697516361051-git-send-email-ebiederm@xmission.com> References: <11697516361051-git-send-email-ebiederm@xmission.com> Message-ID: <45EC376D.9080808@bull.net> Hello Eric, See comments about __dev_alloc_name() below. Regards, Benjamin Eric W. Biederman wrote: > From: Eric W. Biederman - unquoted > > When forcibly changing the network namespace of a device > I need something that can generate a name for the device > in the new namespace without overwriting the old name. 
> > __dev_alloc_name provides me that functionality. > > Signed-off-by: Eric W. Biederman > --- > net/core/dev.c | 44 +++++++++++++++++++++++++++++++++----------- > 1 files changed, 33 insertions(+), 11 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 32fe905..fc0d2af 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -655,9 +655,10 @@ int dev_valid_name(const char *name) > } > > /** > - * dev_alloc_name - allocate a name for a device > - * @dev: device > + * __dev_alloc_name - allocate a name for a device > + * @net: network namespace to allocate the device name in > * @name: name format string > + * @buf: scratch buffer and result name string > * > * Passed a format string - eg "lt%d" it will try and find a suitable > * id. It scans list of devices to build up a free map, then chooses > @@ -668,18 +669,13 @@ int dev_valid_name(const char *name) > * Returns the number of the unit assigned or a negative errno code. > */ > > -int dev_alloc_name(struct net_device *dev, const char *name) > +static int __dev_alloc_name(net_t net, const char *name, char buf[IFNAMSIZ]) IMHO the third parameter should be: char *buf Indeed using "char buf[IFNAMSIZ]" is misleading because later in the routine sizeof(buf) is used (with an expected result of IFNAMSIZ). Unfortunately this is no longer the case: sizeof(buf) value is only 4 now (buf is pointer parameter). This corrupts the registration of network devices (now I understand why only one of my e1000 showed up after each reboot :). Also sizeof(buf) should be replaced by IFNAMSIZ in this new routine. 
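The truncation described above can be reproduced outside the kernel. This is a minimal sketch, not the kernel code: format_ifname() is a hypothetical stand-in for the snprintf() call in __dev_alloc_name(), with the buffer size passed explicitly, and IFNAMSIZ is assumed to be 16 as in the kernel headers. A size of 4 is what sizeof(buf) yields on a 32-bit machine once the array parameter has been adjusted to a pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16   /* assumed value, matching the kernel's definition */

/*
 * Hypothetical stand-in for the name formatting step: writes at most
 * len - 1 characters plus a terminating NUL into buf.
 * Returns the length the full name would have had, as snprintf() does.
 */
static int format_ifname(char *buf, size_t len, int unit)
{
    return snprintf(buf, len, "eth%d", unit);
}
```

Called as format_ifname(name, 4, 1), this leaves only "eth" in the buffer, so every unit collapses to the same name; called with IFNAMSIZ, the result is "eth1". In C a parameter declared as char buf[IFNAMSIZ] has pointer type, so sizeof(buf) inside the function is the pointer size, not 16.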
(See below) > { > int i = 0; > - char buf[IFNAMSIZ]; > const char *p; > const int max_netdevices = 8*PAGE_SIZE; > long *inuse; > struct net_device *d; > - net_t net; > - > - BUG_ON(null_net(dev->nd_net)); > - net = dev->nd_net; > > p = strnchr(name, IFNAMSIZ-1, '%'); > if (p) { > @@ -713,10 +709,8 @@ int dev_alloc_name(struct net_device *dev, const char *name) > } > > snprintf(buf, sizeof(buf), name, i); Replace "snprintf(buf, IFNAMSIZ, name, i);" or i will never be appended to name and all your ethernet devices will all try to register the name "eth". There is another occurence of "snprintf(buf, sizeof(buf), ...)" to replace in the for loop above. > - if (!__dev_get_by_name(net, buf)) { > - strlcpy(dev->name, buf, IFNAMSIZ); > + if (!__dev_get_by_name(net, buf)) > return i; > - } > > /* It is possible to run out of possible slots > * when the name is long and there isn't enough space left > @@ -725,6 +719,34 @@ int dev_alloc_name(struct net_device *dev, const char *name) > return -ENFILE; > } > > +/** > + * dev_alloc_name - allocate a name for a device > + * @dev: device > + * @name: name format string > + * > + * Passed a format string - eg "lt%d" it will try and find a suitable > + * id. It scans list of devices to build up a free map, then chooses > + * the first empty slot. The caller must hold the dev_base or rtnl lock > + * while allocating the name and adding the device in order to avoid > + * duplicates. > + * Limited to bits_per_byte * page size devices (ie 32K on most platforms). > + * Returns the number of the unit assigned or a negative errno code. 
> + */ > + > +int dev_alloc_name(struct net_device *dev, const char *name) > +{ > + char buf[IFNAMSIZ]; > + net_t net; > + int ret; > + > + BUG_ON(null_net(dev->nd_net)); > + net = dev->nd_net; > + ret = __dev_alloc_name(net, name, buf); > + if (ret >= 0) > + strlcpy(dev->name, buf, IFNAMSIZ); > + return ret; > +} > + > > /** > * dev_change_name - change name of a device -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com From swise at opengridcomputing.com Mon Mar 5 07:37:11 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Mar 2007 09:37:11 -0600 Subject: [ofa-general] dapl compile problem In-Reply-To: <1173108176.14159.17.camel@stevo-desktop> References: <1173108176.14159.17.camel@stevo-desktop> Message-ID: <1173109031.14159.29.camel@stevo-desktop> BTW: The 1.2 branch compiled fine. On Mon, 2007-03-05 at 09:22 -0600, Steve Wise wrote: > Hey Arlin, > > I cloned the latest dapl tree Friday and checked out the ofed_1_2 > branch, then did a ./autogen.sh && ./configure && make. > > I get errors in dtest: > > > make[2]: Leaving directory `/usr/local/src/git/dapl' > Making all in test/dtest > make[2]: Entering directory `/usr/local/src/git/dapl/test/dtest' > if gcc -DHAVE_CONFIG_H -I. -I. -I../.. -g -O2 -MT dtest.o -MD -MP -MF ".deps/dtest.Tpo" -c -o dtest.o dtest.c; \ > then mv -f ".deps/dtest.Tpo" ".deps/dtest.Po"; else rm -f ".deps/dtest.Tpo"; exit 1; fi > dtest.c:55:25: error: dat/udat.h: No such file or directory > > > Have you seen this? I must be doing something dumb. > > Thanks, > > Steve. 
From bugzilla-daemon at lists.openfabrics.org Mon Mar 5 07:45:12 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 5 Mar 2007 07:45:12 -0800 (PST) Subject: [ofa-general] [Bug 410] compilation of srptools fails on all platfroms (x86_64, i686, ppc64, sles10 & rh4u3) In-Reply-To: Message-ID: <20070305154512.8D9C0E60811@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=410 ------- Comment #1 from sweitzen at cisco.com 2007-03-05 07:45 ------- OFED-1.2-20070304-0600 srptools compiled fine for me on RHEL4 U3 x86_64. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Mon Mar 5 07:50:56 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 5 Mar 2007 07:50:56 -0800 (PST) Subject: [ofa-general] [Bug 410] compilation of srptools fails on all platfroms (x86_64, i686, ppc64, sles10 & rh4u3) In-Reply-To: Message-ID: <20070305155056.A1C14E60808@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=410 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |ishai at mellanox.co.il 
From changquing.tang at hp.com Mon Mar 5 08:17:18 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Mar 2007 16:17:18 -0000 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403998162@G3W0634.americas.hpqcorp.net> Or: Thank you for the description. I have read the spec carefully and got some idea. But here is a case I don't know. I have 1024 QPs on a single port/cable. There is NO receive posted because I use pure RDMA write. And also there is no pending send. At this point I pull the cable out. I will get the port error event(right ?). Do I also get 1024 QP error events ? Because there is no way to report through completion status. Or the QPs are still in good state even though I pull out cable ? --CQ > -----Original Message----- > From: Or Gerlitz [mailto:or.gerlitz at gmail.com] > Sent: Monday, March 05, 2007 3:37 AM > To: Tang, Changqing > Cc: Roland Dreier; openib-general at openib.org > Subject: Re: [ofa-general] What is the size of async event queue ? > > On 3/2/07, Tang, Changqing wrote: > > > What is the default size of the async event queue ? > Suppose I > > create 1024 QP from one process to another process, Somehow > the remote > > process crashes, Can I get all the 1024 QP error async > event, how do I > > make sure I don't loss an event ? > > CQ, > > I want to understand what is the exact fearure you need. > > for example, if TCP is used the equivalent of this is that > following a remote process crash the remote node/s TCP stack > close the TCP connections and when ever the local process > attempts to use the socket it get an errno telling this > connection was closed ?! 
> > Since you use RC QP, --if-- you attempt doing post_send (or > rdma) to a QP whose connected peer QP is not responding, you > will get CQ completion with "retry exceeded" error. > > If the above case (notification following post send) is not > enough, the IB CM which you can use through libibcm or > librdmacm provides the same functionality (sends DREQ if the > process crashes) with the distinction that over TCP the same > primitive (socket) is use for conn management and conn data > xfer, where over IB, the QP is used for data and the IB CM Id > (or the RDMA CM Id) is used for conn management. > > Combining possibilities: if you want to get a notification on > every peer process crash, you would need to either > poll/select once a while the libibcm/librdmacm event queue or > implement some keep a live of your own protocol. For > instance, I think the IB spec mentions doing zero length rdma > write once in a while as a mean for implementing such protocol. > > Or. > From changquing.tang at hp.com Mon Mar 5 08:28:31 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Mar 2007 16:28:31 -0000 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039981BA@G3W0634.americas.hpqcorp.net> > > CQ, > > I want to understand what is the exact fearure you need. I want our MPI code to survive a connection loss or a peer process/machine crash. The process should detect any IB error, clean up that connection, use only the healthy connections, and possibly make new connections. If the error is global to the process, not limited to a single connection, then we just abort the process. 
--CQ > > for example, if TCP is used the equivalent of this is that > following a remote process crash the remote node/s TCP stack > close the TCP connections and when ever the local process > attempts to use the socket it get an errno telling this > connection was closed ?! > > Since you use RC QP, --if-- you attempt doing post_send (or > rdma) to a QP whose connected peer QP is not responding, you > will get CQ completion with "retry exceeded" error. > > If the above case (notification following post send) is not > enough, the IB CM which you can use through libibcm or > librdmacm provides the same functionality (sends DREQ if the > process crashes) with the distinction that over TCP the same > primitive (socket) is use for conn management and conn data > xfer, where over IB, the QP is used for data and the IB CM Id > (or the RDMA CM Id) is used for conn management. > > Combining possibilities: if you want to get a notification on > every peer process crash, you would need to either > poll/select once a while the libibcm/librdmacm event queue or > implement some keep a live of your own protocol. For > instance, I think the IB spec mentions doing zero length rdma > write once in a while as a mean for implementing such protocol. > > Or. > From afriedle at open-mpi.org Mon Mar 5 08:32:58 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Mon, 05 Mar 2007 11:32:58 -0500 Subject: [ofa-general] build failure on nightly tarball -- bonding In-Reply-To: <45EA9A08.1090500@gmail.com> References: <45E846F6.7070705@open-mpi.org> <45EA9A08.1090500@gmail.com> Message-ID: <45EC463A.6000805@open-mpi.org> Moni Shoua wrote: > Andrew Friedley wrote: >> The chelsio build errors from yesterday appear to be gone, though now >> I'm seeing errors building the IB bonding code with the 3/2 alpha >> tarball -- error below. I'm wondering, is there a way to selectively >> avoid building things like this that seem to be optional, as a tarball >> user? 
>> >> Andrew > For the error messages.... It seems to me that the problem is one that I have already fixed. > The corrected source RPM is in my home dir. Is there a reason the fix is not in the nightly alpha tarballs? Where do I find your home directory? Andrew From caitlinb at broadcom.com Mon Mar 5 08:56:38 2007 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Mon, 5 Mar 2007 08:56:38 -0800 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA8403998162@G3W0634.americas.hpqcorp.net> Message-ID: <1EF1E44200D82B47BD5BA61171E8CE9D0302000F@NT-IRVA-0750.brcm.ad.broadcom.com> general-bounces at lists.openfabrics.org wrote: > Or: > Thank you for the description. I have read the spec > carefully and got some idea. But here is a case I don't know. > > I have 1024 QPs on a single port/cable. There is NO > receive posted because I use pure RDMA write. And also there is no > pending send. At this point I pull the cable out. > > I will get the port error event(right ?). Do I also get > 1024 QP error events ? Because there is no way to report > through completion status. Or the QPs are still in good state > even though I pull out cable ? > > If you have 1024 QPs you should provision resources to handle the case where all 1024 abruptly terminate. From changquing.tang at hp.com Mon Mar 5 09:06:34 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Mar 2007 17:06:34 -0000 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <1EF1E44200D82B47BD5BA61171E8CE9D0302000F@NT-IRVA-0750.brcm.ad.broadcom.com> References: <349DCDA352EACF42A0C49FA6DCEA8403998162@G3W0634.americas.hpqcorp.net> <1EF1E44200D82B47BD5BA61171E8CE9D0302000F@NT-IRVA-0750.brcm.ad.broadcom.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403998492@G3W0634.americas.hpqcorp.net> > > Or: > > Thank you for the description. I have read the spec > carefully and got > > some idea. But here is a case I don't know. 
> > > > I have 1024 QPs on a single port/cable. There is NO > receive posted > > because I use pure RDMA write. And also there is no pending > send. At > > this point I pull the cable out. > > > > I will get the port error event(right ?). Do I also get > > 1024 QP error events ? Because there is no way to report through > > completion status. Or the QPs are still in good state even though I > > pull out cable ? > > > > > > If you have 1024 QPs you should provision resources to handle > the case where all 1024 abruptly terminate. > Yes, I just want a way to detect all QP failures and clean up all the QP connections, either through completion status or async events. But in this case I don't know if I can get all the events. --CQ > From changquing.tang at hp.com Mon Mar 5 09:50:56 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Mar 2007 17:50:56 -0000 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net> <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039985D6@G3W0634.americas.hpqcorp.net> > Combining possibilities: if you want to get a notification on > every peer process crash, you would need to either > poll/select once a while the libibcm/librdmacm event queue or > implement some keep a live of your own protocol. For > instance, I think the IB spec mentions doing zero length rdma > write once in a while as a mean for implementing such protocol. Can you point me to the spec page talking about zero-length rdma-write and send ? If I use zero-length rdma write, does it generate something on the wire to let me detect a broken connection, and is there no effect on the remote buffer ? If I use zero-length send, do I get the same thing? How about if I don't have a receive posted on the remote side ? --CQ > > Or.
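A minimal sketch of the keep-alive idea discussed in this thread, not code from any poster: each idle peer is probed periodically, and a failed probe marks the peer dead so its QP can be cleaned up. The `probe_fn` callback, `struct peer`, and `fake_probe` are hypothetical names invented for illustration. In a real verbs program the callback would presumably post a signaled zero-length RDMA write with `ibv_post_send` and treat a completion status of `IBV_WC_RETRY_EXC_ERR` as the "retry exceeded" failure mentioned earlier; that part is assumed and left out so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one keep-alive probe. PROBE_RETRY_EXCEEDED stands
 * in for a work completion that failed with "retry exceeded". */
enum probe_status {
    PROBE_OK,
    PROBE_RETRY_EXCEEDED
};

/* Hypothetical callback: sends one probe on the connection identified by
 * conn_ctx (assumed to be a zero-length RDMA write on an RC QP) and returns
 * how the probe completed. */
typedef enum probe_status (*probe_fn)(void *conn_ctx);

struct peer {
    void *conn_ctx;      /* opaque per-connection state, e.g. the QP */
    uint64_t last_ok_ms; /* time of the last successful traffic or probe */
    bool alive;          /* false once a probe has failed */
};

/* Probe every live peer that has been idle for at least interval_ms.
 * Returns the number of peers newly marked dead by this sweep. */
size_t keepalive_sweep(struct peer *peers, size_t n, uint64_t now_ms,
                       uint64_t interval_ms, probe_fn probe)
{
    size_t newly_dead = 0;

    assert(peers != NULL || n == 0);
    assert(probe != NULL);

    for (size_t i = 0; i < n; i++) {
        struct peer *p = &peers[i];

        if (!p->alive)
            continue;               /* already cleaned up */
        if (now_ms - p->last_ok_ms < interval_ms)
            continue;               /* recent traffic, no probe needed */

        if (probe(p->conn_ctx) == PROBE_OK) {
            p->last_ok_ms = now_ms;
        } else {
            p->alive = false;       /* caller should tear down this QP */
            newly_dead++;
        }
    }
    return newly_dead;
}

/* Stand-in probe used only for demonstration: conn_ctx points to an int
 * that is nonzero while the simulated peer is reachable. */
enum probe_status fake_probe(void *conn_ctx)
{
    const int *reachable = conn_ctx;

    return *reachable ? PROBE_OK : PROBE_RETRY_EXCEEDED;
}
```

An application would call `keepalive_sweep` from a timer with its own probe callback; with 1024 QPs, a sweep touches only the peers that have been idle longer than the interval, so busy connections are never probed.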
> From mst at mellanox.co.il Mon Mar 5 09:58:25 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Mar 2007 19:58:25 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45EC2AC6.2090805@gmail.com> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304182048.GG19828@mellanox.co.il> <45EC2AC6.2090805@gmail.com> Message-ID: <20070305175825.GC4264@mellanox.co.il> > >> + if (ipoib_at_exit) > >> + nn->neighbour->parms->neigh_destructor = NULL; > > > > Is it safe to do this without locking? > > Could the destructor be in progress when we do this? > I think you're right. Maybe I need to attack the issue in a different way. > I need to do some rethinking. The basic problem seems to be that bonding code is taking a pointer into the module (neighbour setup) without taking reference on the module. -- MST From sashak at voltaire.com Mon Mar 5 10:25:53 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 5 Mar 2007 20:25:53 +0200 Subject: [ofa-general] [PATCH TRIVIAL] diags/mcm_rereg_test: more info messages Message-ID: <20070305182553.GI19019@sashak.voltaire.com> Add more info messages. Change number of max clients. 
Signed-off-by: Sasha Khapyorsky --- diags/src/mcm_rereg_test.c | 14 +++++++++++--- 1 files changed, 11 insertions(+), 3 deletions(-) diff --git a/diags/src/mcm_rereg_test.c b/diags/src/mcm_rereg_test.c index f0582a0..57c8db6 100644 --- a/diags/src/mcm_rereg_test.c +++ b/diags/src/mcm_rereg_test.c @@ -183,6 +183,8 @@ static int rereg_send_all(int port, int agent, ib_portid_t *dport, IB_MAD_TRID_F); } + info("rereg_send_all: sent %u requests\n", cnt*2); + free(umad); return 0; @@ -245,7 +247,7 @@ static int rereg_recv_all(int port, int agent, ib_portid_t *dport, uint8_t *umad, *mad; int len = umad_size() + 256; uint64_t trid; - unsigned method, status; + unsigned n, method, status; int i; info("rereg_recv_all...\n"); @@ -256,8 +258,10 @@ static int rereg_recv_all(int port, int agent, ib_portid_t *dport, return -1; } + n = 0; while (rereg_recv(port, agent, dport, umad, len, TMO) > 0) { - dbg("rereg_recv_all: done %d\n", cnt++); + dbg("rereg_recv_all: done %d\n", n); + n++; mad = umad_get_mad(umad); method = mad_get_field(mad, 0, IB_MAD_METHOD_F); @@ -286,6 +290,8 @@ static int rereg_recv_all(int port, int agent, ib_portid_t *dport, } } + info("rereg_recv_all: got %u responses\n", n); + free(umad); return 0; } @@ -330,6 +336,8 @@ static int rereg_query_all(int port, int agent, ib_portid_t *dport, ntohll(list[i].guid), status, method); } + info("rereg_query_all: %u queried.\n", cnt); + free(umad); return 0; } @@ -370,7 +378,7 @@ static int rereg_mcm_rec_recv(int port, int agent, int cnt) } #endif -#define MAX_CLIENTS 100 +#define MAX_CLIENTS 50 static int rereg_and_test_port(char *guid_file, int port, int agent, ib_portid_t *dport, int timeout) { -- 1.5.0.1.40.gb40d From sashak at voltaire.com Mon Mar 5 10:58:11 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 5 Mar 2007 20:58:11 +0200 Subject: [ofa-general] [PATCH] opensm: draft of the Coding Style doc for OpenSM Message-ID: <20070305185811.GK19019@sashak.voltaire.com> Initial writeup about OpenSM Coding 
Style recommendations. Signed-off-by: Sasha Khapyorsky --- osm/doc/opensm-coding-style.txt | 34 ++++++++++++++++++++++++++++++++++ 1 files changed, 34 insertions(+), 0 deletions(-) create mode 100644 osm/doc/opensm-coding-style.txt diff --git a/osm/doc/opensm-coding-style.txt b/osm/doc/opensm-coding-style.txt new file mode 100644 index 0000000..379042c --- /dev/null +++ b/osm/doc/opensm-coding-style.txt @@ -0,0 +1,34 @@ +This short (hopefully) memo is about to define the coding style +recommended for OpenSM development. + +The goal of this is to make OpenSM code base to be standard in terms of +the rest of OpenIB management software, OpenIB projects and Linux in +general. And in this way to make OpenSM more developer friendly and to +involve more open source programmers to be part of OpenSM development +process. + +The goal of this is not to provide long and boring list of coding style +paradigms, but rather to define general coding style concept and to +suggest a way for such a concept to be implemented in the existing +OpenSM code base. + +The OpenSM project is an OpenIB and Linux centric project, so we think +it is reasonable to use the coding style most popular with OpenIB +projects (linux/Documentation/CodingStyle) as the starting point rather +than reinventing one more coding style rule-set. + +Some things from there in short: tab character for indentation and space +character for alignment, K&R style braces, short local and meanful +global names, please no confused Hungary style, short functions. And of +course to be reasonable about all above. + + +Some ideas about existing OpenSM code improvements in terms of the +Coding style: + +* When writing new code, please try to follow the new Coding style. +* Coding style improvement patches are desired and accepted, but please + try to not mix coding style improvement with functional and other + changes in one patch. 
+* When you are going to improve coding style for existing code, please + try to do it for entire file(s). -- 1.5.0.1.40.gb40d From halr at voltaire.com Mon Mar 5 10:42:06 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 13:42:06 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] diags/mcm_rereg_test: more info messages In-Reply-To: <20070305182553.GI19019@sashak.voltaire.com> References: <20070305182553.GI19019@sashak.voltaire.com> Message-ID: <1173120123.4546.275843.camel@hal.voltaire.com> On Mon, 2007-03-05 at 13:25, Sasha Khapyorsky wrote: > Add more info messages. Change number of max clients. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From halr at voltaire.com Mon Mar 5 10:49:53 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 13:49:53 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm: remove unneeded checks In-Reply-To: <20070305185310.GJ19019@sashak.voltaire.com> References: <20070305185310.GJ19019@sashak.voltaire.com> Message-ID: <1173120593.4546.276335.camel@hal.voltaire.com> On Mon, 2007-03-05 at 13:53, Sasha Khapyorsky wrote: > Now osm_switch_recommend_path() loops port number from 1, so != 0 checks > are not needed anymore. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From ardavis at ichips.intel.com Mon Mar 5 11:31:14 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 05 Mar 2007 11:31:14 -0800 Subject: [ofa-general] RE: [Bug 396] OFED 1.2 alpha DAPL failures using IntelMPI 3.0.33, kernel patching issues In-Reply-To: <000001c75b89$108c0f10$ff0da8c0@amr.corp.intel.com> References: <000001c75b89$108c0f10$ff0da8c0@amr.corp.intel.com> Message-ID: <45EC7002.9000602@ichips.intel.com> Sean Hefty wrote: >>This is the result of incorrect timeout values being used as a result of >>sean_cm_limit_mra_timeout_patch. Can someone tell me the purpose of this patch >>and how it became >>part of the OFED 1.2 build? 
>> >> > >This patch sets the timeout values incorrectly and needs to be removed from >OFED. The purpose was to work-around a storage target firmware bug, which I >believe now has a fix. > > Vladimir, can you please remove this patch from OFED 1.2 builds? It looks like it is still included. From erezs at voltaire.com Mon Mar 5 11:40:38 2007 From: erezs at voltaire.com (Erez Strauss) Date: Mon, 5 Mar 2007 21:40:38 +0200 Subject: [ofa-general] IPoIB-CM/RC - mthca error Message-ID: Hello IB general list I'm using the OFED-1.2 alpha1 on few machines, using IPoIB-CM/RC. After some TCP and multicast traffic I get the following error on all the machines: ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0405 What is it? What should I do? Does it mean the system will fall back to UD? Thanks, Erez Voltaire Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweitzen at cisco.com Mon Mar 5 11:46:27 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 5 Mar 2007 11:46:27 -0800 Subject: [ofa-general] IPoIB-CM/RC - mthca error In-Reply-To: References: Message-ID: This is bug 394 https://bugs.openfabrics.org/show_bug.cgi?id=394, and it's fixed now. 
Scott ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez Strauss Sent: Monday, March 05, 2007 11:41 AM To: openib-general at lists.openfabrics.org Subject: [ofa-general] IPoIB-CM/RC - mthca error Hello IB general list I'm using the OFED-1.2 alpha1 on few machines, using IPoIB-CM/RC. After some TCP and multicast traffic I get the following error on all the machines: ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0405 What is it? What should I do? Does it mean the system will fall back to UD? Thanks, Erez Voltaire Inc. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sashak at voltaire.com Mon Mar 5 11:59:01 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 5 Mar 2007 21:59:01 +0200 Subject: [ofa-general] [PATCH TRIVIAL] diags: clean gcc-4.1 warnigns Message-ID: <20070305195900.GA24579@sashak.voltaire.com> Clean pointer target signedness warnings generated by gcc-4.1 Signed-off-by: Sasha Khapyorsky --- diags/include/ibdiag_common.h | 2 +- diags/src/ibdiag_common.c | 2 +- diags/src/saquery.c | 8 ++++---- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/diags/include/ibdiag_common.h b/diags/include/ibdiag_common.h index 030bdaa..2d463c5 100644 --- a/diags/include/ibdiag_common.h +++ b/diags/include/ibdiag_common.h @@ -63,6 +63,6 @@ char *lookup_switch_name(FILE *switch_map_fp, uint64_t target_guid, void iberror(const char *fn, char *msg, ...); /* NOTE: this modifies the parameter "nodedesc". */ -char *clean_nodedesc(uint8_t *nodedesc); +char *clean_nodedesc(char *nodedesc); #endif /* _IBDIAG_COMMON_H_ */ diff --git a/diags/src/ibdiag_common.c b/diags/src/ibdiag_common.c index 6ae02c5..a81bb95 100644 --- a/diags/src/ibdiag_common.c +++ b/diags/src/ibdiag_common.c @@ -135,7 +135,7 @@ iberror(const char *fn, char *msg, ...) 
} char * -clean_nodedesc(uint8_t *nodedesc) +clean_nodedesc(char *nodedesc) { int i = 0; diff --git a/diags/src/saquery.c b/diags/src/saquery.c index fce11d3..f581e0c 100644 --- a/diags/src/saquery.c +++ b/diags/src/saquery.c @@ -101,7 +101,7 @@ print_node_desc(ib_node_record_t *node_record) { printf("%6d \"%s\"\n", cl_ntoh16(node_record->lid), - clean_nodedesc(p_nd->description)); + clean_nodedesc((char *)p_nd->description)); } } @@ -156,7 +156,7 @@ print_node_record(ib_node_record_t *node_record) cl_ntoh32( p_ni->revision ), ib_node_info_get_local_port_num( p_ni ), cl_ntoh32( ib_node_info_get_vendor_id( p_ni )), - clean_nodedesc(node_record->node_desc.description) + clean_nodedesc((char *)node_record->node_desc.description) ); } @@ -331,7 +331,7 @@ print_multicast_member_record(ib_member_rec_t *p_mcmr) "0x%016" PRIx64 " (%s)\n", gid_prefix, gid_interface_id, - clean_nodedesc(node_record->node_desc.description) + clean_nodedesc((char *)node_record->node_desc.description) ); } } else { @@ -352,7 +352,7 @@ print_multicast_member_record(ib_member_rec_t *p_mcmr) gid_interface_id, p_mcmr->scope_state, p_mcmr->proxy_join, - clean_nodedesc(node_record->node_desc.description) + clean_nodedesc((char *)node_record->node_desc.description) ); } } -- 1.5.0.3.287.gf6d7 From mshefty at ichips.intel.com Mon Mar 5 11:57:55 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 05 Mar 2007 11:57:55 -0800 Subject: [ofa-general] rdma_cm issues in 2.6.21-rc1 In-Reply-To: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com> References: <000001c75c59$86b852e0$ff0da8c0@amr.corp.intel.com> Message-ID: <45EC7643.5020008@ichips.intel.com> Sean Hefty wrote: > As just a note, I'm investigating two issues with the rdma_cm and 2.6.21-rc1. > > Running ucmatose twice results in a failure binding to an address the second > time that it's run. > > I'm also seeing a kernel crash if ucmatose is killed while waiting for a > connection. Just an update - I think I've found the issue. 
The merger of the multicast support added with kzalloc -> kmalloc cleanups left a field uninitialized. I'll have a patch shortly. - Sean From erezs at voltaire.com Mon Mar 5 12:05:22 2007 From: erezs at voltaire.com (Erez Strauss) Date: Mon, 5 Mar 2007 22:05:22 +0200 Subject: [ofa-general] IPoIB-CM/RC - mthca error In-Reply-To: References: Message-ID: Hi Scott, Thanks for the prompt reply. Where can I find the fix, is it part of the nightly builds? What is the nature of the bug and its affect on the system behavior? Thanks, Erez ________________________________ From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Monday, March 05, 2007 2:46 PM To: Erez Strauss; openib-general at lists.openfabrics.org Subject: RE: [ofa-general] IPoIB-CM/RC - mthca error This is bug 394 https://bugs.openfabrics.org/show_bug.cgi?id=394, and it's fixed now. Scott ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez Strauss Sent: Monday, March 05, 2007 11:41 AM To: openib-general at lists.openfabrics.org Subject: [ofa-general] IPoIB-CM/RC - mthca error Hello IB general list I'm using the OFED-1.2 alpha1 on few machines, using IPoIB-CM/RC. After some TCP and multicast traffic I get the following error on all the machines: ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 170405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1a0405 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0407 ib_mthca 0000:17:00.0: CQ entry for unknown QP 1d0405 What is it? What should I do? Does it mean the system will fall back to UD? Thanks, Erez Voltaire Inc. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Mon Mar 5 11:58:27 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 14:58:27 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] diags: clean gcc-4.1 warnigns In-Reply-To: <20070305195900.GA24579@sashak.voltaire.com> References: <20070305195900.GA24579@sashak.voltaire.com> Message-ID: <1173124704.4546.280601.camel@hal.voltaire.com> On Mon, 2007-03-05 at 14:59, Sasha Khapyorsky wrote: > Clean pointer target signedness warnings generated by gcc-4.1 > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From halr at voltaire.com Mon Mar 5 12:16:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 15:16:20 -0500 Subject: [ofa-general] Re: [PATCH] opensm: draft of the Coding Style doc for OpenSM In-Reply-To: <20070305185811.GK19019@sashak.voltaire.com> References: <20070305185811.GK19019@sashak.voltaire.com> Message-ID: <1173125780.4546.281684.camel@hal.voltaire.com> On Mon, 2007-03-05 at 13:58, Sasha Khapyorsky wrote: > Initial writeup about OpenSM Coding Style recommendations. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From sean.hefty at intel.com Mon Mar 5 12:29:40 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 5 Mar 2007 12:29:40 -0800 Subject: [ofa-general] [GIT PULL] 2.6.21-rc3 please pull rdma-dev.git Message-ID: <000001c75f65$00935660$a437170a@amr.corp.intel.com> Roland, please pull from: git.openfabrics.org/~shefty/rdma-dev.git for-roland This contains one bug fix for rc3: rdma_cm: initialize rdma_bind_list in cma_alloc_any_port The struct rdma_bind_list fields for hlist are not being initialized, resulting in a corrupted list. 
Signed-off-by: Sean Hefty diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d441815..fde92ce 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1821,7 +1821,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, struct rdma_bind_list *bind_list; int port, ret; - bind_list = kmalloc(sizeof *bind_list, GFP_KERNEL); + bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; From halr at voltaire.com Mon Mar 5 12:41:47 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Mar 2007 15:41:47 -0500 Subject: [ofa-general] [PATCH][MINOR] OpenSM/osm_subnet.c: Support configurable no_clients_rereg in opensm options file Message-ID: <1173127301.4546.283229.camel@hal.voltaire.com> OpenSM/osm_subnet.c: Support configurable no_clients_rereg in opensm options file Signed-off-by: Hal Rosenstock diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index e4e69c0..4be09dc 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -1028,6 +1028,10 @@ osm_subn_parse_conf_file( "enable_quirks", p_key, p_val, &p_opts->enable_quirks); + __osm_subn_opts_unpack_boolean( + "no_clients_rereg", + p_key, p_val, &p_opts->no_clients_rereg); + } } fclose(opts_file); @@ -1250,6 +1254,8 @@ osm_subn_write_conf_file( "dump_files_dir %s\n\n" "# If TRUE enables new high risk options and hardware specific quirks\n" "enable_quirks %s\n\n" + "# If TRUE disables client reregistration\n" + "no_clients_rereg %s\n\n" "# If TRUE OpenSM should disable multicast support\n" "no_multicast_option %s\n\n" "# No multicast routing is performed if TRUE\n" @@ -1267,6 +1273,7 @@ osm_subn_write_conf_file( p_opts->accum_log_file ? "TRUE" : "FALSE", p_opts->dump_files_dir, p_opts->enable_quirks ? "TRUE" : "FALSE", + p_opts->no_clients_rereg ? "TRUE" : "FALSE", p_opts->no_multicast_option ? "TRUE" : "FALSE", p_opts->disable_multicast ? "TRUE" : "FALSE", p_opts->exit_on_fatal ? 
"TRUE" : "FALSE", From arlin.r.davis at intel.com Mon Mar 5 12:55:38 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 5 Mar 2007 12:55:38 -0800 Subject: [ofa-general] RE: [PATCH] dapl compile problem - dapltest/dtest In-Reply-To: <1173108176.14159.17.camel@stevo-desktop> Message-ID: <000001c75f68$a14994e0$9f97070a@amr.corp.intel.com> >-----Original Message----- >From: Steve Wise [mailto:swise at opengridcomputing.com] >I get errors in dtest: > >make[2]: Leaving directory `/usr/local/src/git/dapl' >Making all in test/dtest >make[2]: Entering directory `/usr/local/src/git/dapl/test/dtest' >if gcc -DHAVE_CONFIG_H -I. -I. -I../.. -g -O2 -MT dtest.o -MD -MP -MF ".deps/dtest.Tpo" -c -o >dtest.o dtest.c; \ >then mv -f ".deps/dtest.Tpo" ".deps/dtest.Po"; else rm -f ".deps/dtest.Tpo"; exit 1; fi >dtest.c:55:25: error: dat/udat.h: No such file or directory > Steve, Can you try the following patch? Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am index 0c83924..1a19c53 100755 --- a/test/dapltest/Makefile.am +++ b/test/dapltest/Makefile.am @@ -1,5 +1,6 @@ INCLUDES = -I include \ - -I mdep/linux + -I mdep/linux \ + -I $(srcdir)/../../dat/include bin_PROGRAMS = dapltest diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am index ac9837b..fcb9b4e 100755 --- a/test/dtest/Makefile.am +++ b/test/dtest/Makefile.am @@ -1,4 +1,5 @@ bin_PROGRAMS = dtest dtest_SOURCES = dtest.c +INCLUDES = -I $(srcdir)/../../dat/include dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la From changquing.tang at hp.com Mon Mar 5 12:59:13 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 5 Mar 2007 20:59:13 -0000 Subject: [ofa-general] What is the size of async event queue ? 
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039985D6@G3W0634.americas.hpqcorp.net> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net><15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> <349DCDA352EACF42A0C49FA6DCEA84039985D6@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA84039D9186@G3W0634.americas.hpqcorp.net> Or: I think I find the SPEC part talking about zero-length message. I just search zero-length. For rdma write, the receiver side does not need to do anything, for send, a receive WR is need in receiver side. And that makes sense. --CQ > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Tang, Changqing > Sent: Monday, March 05, 2007 11:51 AM > To: Or Gerlitz > Cc: Roland Dreier; openib-general at openib.org > Subject: RE: [ofa-general] What is the size of async event queue ? > > > > Combining possibilities: if you want to get a notification on every > > peer process crash, you would need to either poll/select > once a while > > the libibcm/librdmacm event queue or implement some keep a live of > > your own protocol. For instance, I think the IB spec mentions doing > > zero length rdma write once in a while as a mean for > implementing such > > protocol. > > > Can you point me the spec page talking about zero-length > rdma-write and send ? > > If I use zero-length rdma write, does it generate something > on wire to let me detect broken connection, and there is no > effect on remote buffer ? > > If I use zero-length send, do I get the same thing? How about > if I don't have receive posted on remote side ? > > --CQ > > > > > > Or. 
> > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Mon Mar 5 13:02:50 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 05 Mar 2007 15:02:50 -0600 Subject: [ofa-general] RE: [PATCH] dapl compile problem - dapltest/dtest In-Reply-To: <000001c75f68$a14994e0$9f97070a@amr.corp.intel.com> References: <000001c75f68$a14994e0$9f97070a@amr.corp.intel.com> Message-ID: <1173128570.14159.60.camel@stevo-desktop> That seems to do the trick. But the patch didn't apply cleanly...I had to modify it to apply the changes to dapltest/Makefile... Steve. On Mon, 2007-03-05 at 12:55 -0800, Arlin Davis wrote: > > >-----Original Message----- > >From: Steve Wise [mailto:swise at opengridcomputing.com] > >I get errors in dtest: > > > >make[2]: Leaving directory `/usr/local/src/git/dapl' > >Making all in test/dtest > >make[2]: Entering directory `/usr/local/src/git/dapl/test/dtest' > >if gcc -DHAVE_CONFIG_H -I. -I. -I../.. -g -O2 -MT dtest.o -MD -MP -MF ".deps/dtest.Tpo" -c -o > >dtest.o dtest.c; \ > >then mv -f ".deps/dtest.Tpo" ".deps/dtest.Po"; else rm -f ".deps/dtest.Tpo"; exit 1; fi > >dtest.c:55:25: error: dat/udat.h: No such file or directory > > > > Steve, > > Can you try the following patch? 
> > > > Signed-off by: Arlin Davis ardavis at ichips.intel.com > > diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am > index 0c83924..1a19c53 100755 > --- a/test/dapltest/Makefile.am > +++ b/test/dapltest/Makefile.am > @@ -1,5 +1,6 @@ > INCLUDES = -I include \ > - -I mdep/linux > + -I mdep/linux \ > + -I $(srcdir)/../../dat/include > > bin_PROGRAMS = dapltest > > diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am > index ac9837b..fcb9b4e 100755 > --- a/test/dtest/Makefile.am > +++ b/test/dtest/Makefile.am > @@ -1,4 +1,5 @@ > bin_PROGRAMS = dtest > dtest_SOURCES = dtest.c > +INCLUDES = -I $(srcdir)/../../dat/include > dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la From andrew.robbie at gmail.com Mon Mar 5 13:47:00 2007 From: andrew.robbie at gmail.com (Andrew Robbie (GMail)) Date: Tue, 6 Mar 2007 08:47:00 +1100 Subject: [ofa-general] IB switches: managed or not? Message-ID: Hi, Apologies if this is the wrong O.F. list to use. I am building a small (~16) node cluster with an IB interconnect. I need to decide whether I will buy a cheaper, dumb switch and run OpenSM, or get a more expensive switch with a built in subnet manager. The largest this system would every grow is 32 nodes (two 24 port switches). Various vendors (integrators, not switch OEMs) have stated to me that managed switches are the go, and that OpenSM is (a) buggy, and (b) very time consuming to set up. But, a managed name brand switch seems to cost a lot more than a non-managed one using the Mellanox reference design kit (rebadged, but I suspect made by Flextronics...). My other query is about diagnostic software. With an ethernet switch it is pretty easy to fire up Ethereal (sorry Wireshark, but it is such a silly name) or Etherape and get a look at what is going on. If I buy a Cisco or Voltaire etc do they come with tools that let me get accurate representations of what is going on? Or are their tools really for large IB networks? 
Regards, Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Mon Mar 5 13:49:37 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 5 Mar 2007 13:49:37 -0800 Subject: [ofa-general] [PATCH] uDAPL dtest/dapltest build issues Message-ID: <000201c75f70$2ba22150$9f97070a@amr.corp.intel.com> Fix dapltest and dtest build issues. Applied to master and ofed_1_2. Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am index 0c83924..1a19c53 100755 --- a/test/dapltest/Makefile.am +++ b/test/dapltest/Makefile.am @@ -1,5 +1,6 @@ INCLUDES = -I include \ - -I mdep/linux + -I mdep/linux \ + -I $(srcdir)/../../dat/include bin_PROGRAMS = dapltest diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am index ac9837b..fcb9b4e 100755 --- a/test/dtest/Makefile.am +++ b/test/dtest/Makefile.am @@ -1,4 +1,5 @@ bin_PROGRAMS = dtest dtest_SOURCES = dtest.c +INCLUDES = -I $(srcdir)/../../dat/include dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la From erezs at voltaire.com Mon Mar 5 14:07:41 2007 From: erezs at voltaire.com (Erez Strauss) Date: Tue, 6 Mar 2007 00:07:41 +0200 Subject: [ofa-general] IPoIB-CM/RC - NAPI patch. Message-ID: Hello Roland, I understand that you have a patch to enable NAPI for IPoIB-CM/RC. I'm interested in the patch to reduce the number of interrupts I get on a system (40k /s). Does it apply to the nightly build? Are there tunable variables to control the increase of latency due to reduction of interrupts? Would you please send me a copy? Thanks, Erez Voltaire Inc. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sorrillo at jlab.org Mon Mar 5 14:28:36 2007 From: sorrillo at jlab.org (Lawrence Sorrillo) Date: Mon, 5 Mar 2007 17:28:36 -0500 Subject: [ofa-general] Implement IP oer IB on Solaris 10 Message-ID: <200703052228.l25MSZFS009734@ccs17.jlab.org> Hi: I am a new to both IB and Solaris. But I have significant experience with Linux. I am using a Sun Fire 4500 with x86 on Solaris. There is a dual IB card installed. I want to configure IP over IB in solaris. The final goal here is to implement file services for over IB. 1. My /kernel/drv/ib.conf file below: . . .. port-svc-list=""; vppa-svc-list="ipib"; hca-svc-list=""; 2. root@**.***.**** # cfgadm -a | grep ib ib IB-Fabric connected configured ok ib::daplt,0 IB-PSEUDO connected configured ok ib::rpcib,0 IB-PSEUDO connected configured ok root at hpcdata7.jlab.org Here is /var/adm/messages: Grep ib messages: Mar 5 15:59:12 XXX.XXXX genunix: [ID 408114 kern.info] /ib/rpcib at 0 (rpcib0) online Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/rpcib at 0 (rpcib0) multipath status: degraded, path /pci at 3,0/pci1022,7458 at 9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to target address: rpcib,0 is online Load balancing: round-robin Mar 5 15:59:12 xxx.xxx ib: [ID 842868 kern.info] IB device: daplt at 0, daplt0 Mar 5 15:59:12 xxxx.xxx genunix: [ID 936769 kern.info] daplt0 is /ib/daplt at 0 Mar 5 15:59:12 xxxx.xxx genunix: [ID 408114 kern.info] /ib/daplt at 0 (daplt0) online Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/daplt at 0 (daplt0) multipath status: degraded, path /pci at 3,0/pci1022,7458 at 9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to target address: daplt,0 is online Load balancing: round-robin What do I need to do here? Lawrence Sorrillo UNIX/Linux Systems Administrator Jefferson Laboratory 12000 Jeffferson Avenue, Newport News, VA 23606 Phone: 757-269-7681 Email: sorrillo at jlab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From weiny2 at llnl.gov Mon Mar 5 14:41:29 2007 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 5 Mar 2007 14:41:29 -0800 Subject: [ofa-general] IB switches: managed or not? In-Reply-To: References: Message-ID: <20070305144129.657516a3.weiny2@llnl.gov> On Tue, 6 Mar 2007 08:47:00 +1100 "Andrew Robbie (GMail)" wrote: > Hi, > > Apologies if this is the wrong O.F. list to use. > > I am building a small (~16) node cluster with an IB interconnect. I need to > decide whether I will buy a cheaper, dumb switch and run OpenSM, or get a > more expensive switch with a built in subnet manager. The largest this > system would every grow is 32 nodes (two 24 port switches). > > Various vendors (integrators, not switch OEMs) have stated to me that > managed switches are the go, and that OpenSM is (a) buggy, and (b) very time > consuming to set up. But, a managed name brand switch seems to cost a lot > more than a non-managed one using the Mellanox reference design kit > (rebadged, but I suspect made by Flextronics...). > We are using OSM on clusters from 5 nodes all the way up to 1100 nodes. Granted, there have been some issues. However, the current 1.1 OFED version seems to work just fine. (We are pushing to the OFED 1.2 version for the 1100 node cluster at the moment.) > > My other query is about diagnostic software. With an ethernet switch it is > pretty easy to fire up Ethereal (sorry Wireshark, but it is such a silly > name) or Etherape and get a look at what is going on. If I buy a Cisco or > Voltaire etc do they come with tools that let me get accurate > representations of what is going on? Or are their tools really for large IB > networks? > There are some issues with diags. However, I don't know of any product from any vendor which captures on-the-wire packets like Wireshark. (But I could be wrong and would love to know about it if it was out there.) Check out the openib-diags and ibutils packages for the diags which are available.
YMMV, Ira Weiny weiny2 at llnl.gov From Nitin.Hande at Sun.COM Mon Mar 5 14:56:24 2007 From: Nitin.Hande at Sun.COM (Nitin Hande) Date: Mon, 05 Mar 2007 14:56:24 -0800 Subject: [ofa-general] Implement IP oer IB on Solaris 10 In-Reply-To: <200703052228.l25MSZFS009734@ccs17.jlab.org> References: <200703052228.l25MSZFS009734@ccs17.jlab.org> Message-ID: <45ECA018.9040605@sun.com> Lawrence, Lawrence Sorrillo wrote: > > Hi: > > I am a new to both IB and Solaris. But I have significant experience > with Linux. > > I am using a Sun Fire 4500 with x86 on Solaris. There is a dual IB > card installed. > If the broadcast group have been configured on the switch, then you just need to plumb the interface . Thanks Nitin > > I want to configure IP over IB in solaris. The final goal here is to > implement file services for over IB. > > 1. My /kernel/drv/ib.conf file below: > > … > > … > > …. > > port-svc-list=""; > > vppa-svc-list="ipib"; > > hca-svc-list=""; > > 2. root@**.***.**** # cfgadm -a | grep ib > > ib IB-Fabric connected configured ok > > ib::daplt,0 IB-PSEUDO connected configured ok > > ib::rpcib,0 IB-PSEUDO connected configured ok > > root at hpcdata7.jlab.org > > Here is /var/adm/messages: > > Grep ib messages: > > Mar 5 15:59:12 XXX.XXXX genunix: [ID 408114 kern.info] /ib/rpcib at 0 > (rpcib0) online > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/rpcib at 0 > (rpcib0) multipath status: degraded, path > /pci at 3,0/pci1022,7458 at 9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to > target address: rpcib,0 is online Load balancing: round-robin > > Mar 5 15:59:12 xxx.xxx ib: [ID 842868 kern.info] IB device: daplt at 0, > daplt0 > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 936769 kern.info] daplt0 is > /ib/daplt at 0 > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 408114 kern.info] /ib/daplt at 0 > (daplt0) online > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/daplt at 0 > (daplt0) multipath status: degraded, path > /pci at 3,0/pci1022,7458 at 
9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to > target address: daplt,0 is online Load balancing: round-robin > > What do I need to do here? > > Lawrence Sorrillo > > UNIX/Linux Systems Administrator > > Jefferson Laboratory > > 12000 Jeffferson Avenue, > > Newport News, VA 23606 > > Phone: 757-269-7681 > > Email: sorrillo at jlab.org > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From David.Brean at Sun.COM Mon Mar 5 15:22:54 2007 From: David.Brean at Sun.COM (David M. Brean) Date: Mon, 05 Mar 2007 18:22:54 -0500 Subject: [ofa-general] Implement IP oer IB on Solaris 10 In-Reply-To: <200703052228.l25MSZFS009734@ccs17.jlab.org> References: <200703052228.l25MSZFS009734@ccs17.jlab.org> Message-ID: <45ECA64E.10506@sun.com> Hello, driver-discuss at opensolaris.org is a more appropriate alias for you to send future questions and have follow-up discussions. I saw a suggestion from Nitin Hande, but I might suggest that you check your hardware configuration because the Solaris doesn't appear to be detecting the presence of the HCA. -David Lawrence Sorrillo wrote: > Hi: > > > > I am a new to both IB and Solaris. But I have significant experience > with Linux. > > > > I am using a Sun Fire 4500 with x86 on Solaris. There is a dual IB > card installed. > > > > > > > > I want to configure IP over IB in solaris. The final goal here is to > implement file services for over IB. > > > > 1. My /kernel/drv/ib.conf file below: > > > > ... > > ... > > .... > > port-svc-list=""; > > vppa-svc-list="ipib"; > > hca-svc-list=""; > > > > > > 2. 
root@**.***.**** # cfgadm -a | grep ib > > ib IB-Fabric connected configured ok > > ib::daplt,0 IB-PSEUDO connected configured ok > > ib::rpcib,0 IB-PSEUDO connected configured ok > > root at hpcdata7.jlab.org > > > > > > Here is /var/adm/messages: > > > > Grep ib messages: > > > > Mar 5 15:59:12 XXX.XXXX genunix: [ID 408114 kern.info] /ib/rpcib at 0 > (rpcib0) online > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/rpcib at 0 > (rpcib0) multipath status: degraded, path > /pci at 3,0/pci1022,7458 at 9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to > target address: rpcib,0 is online Load balancing: round-robin > > Mar 5 15:59:12 xxx.xxx ib: [ID 842868 kern.info] IB device: daplt at 0, > daplt0 > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 936769 kern.info] daplt0 is > /ib/daplt at 0 > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 408114 kern.info] /ib/daplt at 0 > (daplt0) online > > Mar 5 15:59:12 xxxx.xxx genunix: [ID 834635 kern.info] /ib/daplt at 0 > (daplt0) multipath status: degraded, path > /pci at 3,0/pci1022,7458 at 9/pci15b3,5a46 at 1/pci15b3,5a44 at 0 (tavor0) to > target address: daplt,0 is online Load balancing: round-robin > > > > What do I need to do here? > > > > Lawrence Sorrillo > > UNIX/Linux Systems Administrator > > Jefferson Laboratory > > 12000 Jeffferson Avenue, > > Newport News, VA 23606 > > Phone: 757-269-7681 > > Email: sorrillo at jlab.org > > > >------------------------------------------------------------------------ > >_______________________________________________ >general mailing list >general at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wombat2 at us.ibm.com Mon Mar 5 15:27:14 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Mon, 5 Mar 2007 18:27:14 -0500 Subject: [ofa-general] IPoIB-CM/RC - NAPI patch. 
In-Reply-To: <20070305222846.13B4BE60825@openfabrics.org>
Message-ID:

> ----- Message from "Erez Strauss" on Tue, 6 Mar 2007 00:07:41 +0200 -----
>
> Subject: [ofa-general] IPoIB-CM/RC - NAPI patch.
>
> Hello Roland,
>
> I understand that you have a patch to enable NAPI for IPoIB-CM/RC.
>
> I'm interested in the patch to reduce the number of interrupts I get
> on a system (40k /s).

Are you still getting 40k interrupts per second with IPoIB-CM? You shouldn't,
because if you are using a 32768 MTU size you should cut the interrupts down
by a factor of 16. With IPoIB-CM, each interrupt should be a 32K message
instead of a 2K message.

>
> Does it apply to the nightly build?
>
> Are there tunable variables to control the increase of latency due
> to reduction of interrupts?
>
> Would you please send me a copy?
>
> Thanks,
> Erez
> Voltaire Inc.

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2 at us.ibm.com (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world we
leave when we die. So we have to accept what has gone before us and work to
change the only thing we can, -- The Future."
William Shatner

From swise at opengridcomputing.com Mon Mar 5 15:32:46 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Mar 2007 17:32:46 -0600
Subject: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Start ep timer on a MPA reject.
Message-ID: <1173137566.14159.95.camel@stevo-desktop>

Start ep timer on a MPA reject.

If the consumer rejects the connection we end up under-referencing the
endpoint structure. The fix is to call iwch_ep_disconnect() instead of
the low level disconnect functions so that the endpoint close timer is
started correctly.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index fd2f3ca..d0ed1d3 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1691,12 +1691,11 @@ int iwch_reject_cr(struct iw_cm_id *cm_i
 		return -ECONNRESET;
 	}
 	BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
-	state_set(&ep->com, CLOSING);
 	if (mpa_rev == 0)
 		abort_connection(ep, NULL, GFP_KERNEL);
 	else {
 		err = send_mpa_reject(ep, pdata, pdata_len);
-		err = send_halfclose(ep, GFP_KERNEL);
+		err = iwch_ep_disconnect(ep, 0, GFP_KERNEL);
 	}
 	return 0;
 }

From swise at opengridcomputing.com Mon Mar 5 15:34:20 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Mar 2007 17:34:20 -0600
Subject: [ofa-general] [PATCH ofed_1_2] iw_cxgb3: Start ep timer on a MPA reject.
Message-ID: <1173137660.14159.97.camel@stevo-desktop>

Start ep timer on a MPA reject.

If the consumer rejects the connection we end up under-referencing the
endpoint structure. The fix is to call iwch_ep_disconnect() instead of
the low level disconnect functions so that the endpoint close timer is
started correctly.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index bbd34e7..ac91a96 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1692,12 +1692,11 @@ int iwch_reject_cr(struct iw_cm_id *cm_i
 		return -ECONNRESET;
 	}
 	BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
-	state_set(&ep->com, CLOSING);
 	if (mpa_rev == 0)
 		abort_connection(ep, NULL, GFP_KERNEL);
 	else {
 		err = send_mpa_reject(ep, pdata, pdata_len);
-		err = send_halfclose(ep, GFP_KERNEL);
+		err = iwch_ep_disconnect(ep, 0, GFP_KERNEL);
 	}
 	return 0;
 }

From weiny2 at llnl.gov Mon Mar 5 15:41:02 2007
From: weiny2 at llnl.gov (Ira Weiny)
Date: Mon, 5 Mar 2007 15:41:02 -0800
Subject: [ofa-general] nightly OFED 1.2 build still fails.
Message-ID: <20070305154102.00475da0.weiny2@llnl.gov>

I downloaded the latest nightly again and tried to build. I still get the
following error. Just to be sure, I cloned the libibverbs git tree as well as
the libibcm tree, and this builds just fine from those sources. I must
therefore assume there is an error in the build.sh magic. Why are the
autoconf build tools from the git trees not used?

Ira

make[2]: Entering directory `/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm'
if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF ".deps/cm.Tpo" -c -o cm.lo `test -f 'src/cm.c' || echo './'`src/cm.c; \
then mv -f ".deps/cm.Tpo" ".deps/cm.Plo"; else rm -f ".deps/cm.Tpo"; exit 1; fi
mkdir .libs
 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF .deps/cm.Tpo -c src/cm.c -fPIC -DPIC -o .libs/cm.o
/bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -L../libibverbs/src -libverbs -lsysfs -L.
-o src/libibcm.la -rpath /usr/lib64 -avoid-version cm.lo
mkdir src/.libs
gcc -shared .libs/cm.o -Wl,--rpath -Wl,/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs /tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs/libibverbs.so /usr/lib/libsysfs.so -L/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm -Wl,-soname -Wl,libibcm.so -o src/.libs/libibcm.so
/usr/lib/libsysfs.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
make[2]: *** [src/libibcm.la] Error 1
make[2]: Leaving directory `/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibcm'

From sweitzen at cisco.com Mon Mar 5 15:48:16 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 5 Mar 2007 15:48:16 -0800
Subject: [ofa-general] still have many bugs to fix before OFED 1.2 beta
Message-ID:

I just took a quick look at open bugs, and see many I think should be fixed
before OFED 1.2 beta (which was supposed to have code cutoff on Saturday). I
think we need to fix all compilation failures and MPI failures for beta.

bug_id  assigned_to                    short_desc
348     vlad at mellanox.co.il         OFED 1.2 build does not create links for libdat.so and libdaplcma.so
370     pasha at mellanox.co.il        OFED 1.2 alpha1 MVAPICH Intel compiler support broken
375     vlad at mellanox.co.il         Open MPI PGI C++ failure at runtime
379     vlad at mellanox.co.il         can't compile OFED 1.1 alpha1 on RHEL4/SLES10 ppc64
380     pasha at mellanox.co.il        OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64
381     rowland at cse.ohio-state.edu  OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler
395     vlad at mellanox.co.il         uDAPL fails (with Intel MPI or HP MPI) on SLES 10 i686
396     vlad at mellanox.co.il         OFED 1.2 alpha DAPL failures using Intel MPI 3.0.33
397     jsquyres at cisco.com          OFED 1.2 alpha1 Open MPI "InfiniBand retry count" errors

If you have bugs assigned to you, please keep their status accurate.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From troy at scl.ameslab.gov Mon Mar 5 18:19:44 2007
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Mon, 5 Mar 2007 20:19:44 -0600
Subject: [ofa-general] ibv_reg_mr permissions?
Message-ID: <9BBF0228-1DC6-4D06-9D56-A71C35C01D51@scl.ameslab.gov>

What permissions must one have to register memory in the OFED_1_2 git tree?

I am running kernel 2.6.19.2, and I'm in the RDMA group, and can open
/dev/infiniband/uverbs0 as a user, but can't register memory as a user.

From mst at mellanox.co.il Mon Mar 5 21:22:51 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Mar 2007 07:22:51 +0200
Subject: [ofa-general] test please ignore
Message-ID: <20070306052251.GD4264@mellanox.co.il>

--
MST

From erezs at voltaire.com Mon Mar 5 21:51:52 2007
From: erezs at voltaire.com (Erez Strauss)
Date: Tue, 6 Mar 2007 07:51:52 +0200
Subject: [ofa-general] IPoIB-CM/RC - NAPI patch.
In-Reply-To:
References: <20070305222846.13B4BE60825@openfabrics.org>
Message-ID:

Hi Bernie,

Thank you for your reply.

In this case I'm using an application which uses very small messages
(300 bytes), and I cannot change it. The focus of the testing is to reduce
the latency to a minimum under high stress, while keeping the application
intact.

A similar synthetic test would be multiple 'iperf -l 300 ....' runs, summing
the total bandwidth (or packets per second).

Any suggestions for IPoIB-CM/RC are welcome.

Do you expect SDP to perform better under the above constraints?

Thanks,
Erez
Voltaire Inc.

________________________________
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernard King-Smith
Sent: Monday, March 05, 2007 6:27 PM
To: general at lists.openfabrics.org
Subject: Re: [ofa-general] IPoIB-CM/RC - NAPI patch.

> ----- Message from "Erez Strauss" on Tue, 6 Mar > 2007 00:07:41 +0200 ----- > > To: > > > > Subject: > > [ofa-general] IPoIB-CM/RC - NAPI patch. > > Hello Roland, > > I understand that you have a patch to enable NAPI for IPoIB-CM/RC. > > I'm interested in the patch to reduce the number of interrupts I get > on a system (40k /s). Are you still getting 40k interrupts per second with IPoIB-CM? You shouldn't because if you are using a 32768 MTU size you should cut the interrupts down by a factor of 16. With IPoIB-CM, each interrupt should be a 32K message instead of a 2K message. > > Does it apply to the nightly build? > > Are there tunable variables to control the increase of latency due > to reduction of interrupts? > > Would you please send me a copy? > > Thanks, > Erez > Voltaire Inc. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner -------------- next part -------------- An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Mon Mar 5 23:11:22 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 06 Mar 2007 09:11:22 +0200 Subject: [ofa-general] What is the size of async event queue ? In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039D9186@G3W0634.americas.hpqcorp.net> References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net><15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com> <349DCDA352EACF42A0C49FA6DCEA84039985D6@G3W0634.americas.hpqcorp.net> <349DCDA352EACF42A0C49FA6DCEA84039D9186@G3W0634.americas.hpqcorp.net> Message-ID: <45ED141A.1080100@voltaire.com> Tang, Changqing wrote: > Or: > I think I find the SPEC part talking about zero-length message. > I just search zero-length. 
> For rdma write, the receiver side does not need to do anything,
> for send, a receive
> WR is needed in receiver side. And that makes sense.

Indeed. However, I am not sure that, for a job on the scale of K ranks, I
would rush to implement a zero-length-RDMA-write keep-alive protocol, which
puts O(K^2) messages on the wire, before making sure it is really needed to
meet the functional requirement.

Or.

From ogerlitz at voltaire.com Mon Mar 5 23:23:06 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 06 Mar 2007 09:23:06 +0200
Subject: [ofa-general] What is the size of async event queue ?
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA8403998162@G3W0634.americas.hpqcorp.net>
References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net>
 <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com>
 <349DCDA352EACF42A0C49FA6DCEA8403998162@G3W0634.americas.hpqcorp.net>
Message-ID: <45ED16DA.30508@voltaire.com>

Tang, Changqing wrote:
> Or:
> Thank you for the description. I have read the spec carefully
> and got some idea. But here is a case I don't know.
>
> I have 1024 QPs on a single port/cable. There is NO receive
> posted because I use pure RDMA write. And also there is no pending send.
> At this point I pull the cable out.
>
> I will get the port error event(right ?). Do I also get 1024 QP
> error events ? Because there is no way to report through completion
> status. Or the QPs are still in good state even though I pull out cable
> ?

All the QPs that have in-flight TX (send/RDMA) when the cable is unplugged
would be moved by the HCA to the ERROR state, and you will detect this by
getting a completion with error on the associated CQ. As Roland explained
(and pointed you to the spec...), you will not get a QP error (async) event
just because the cable was unplugged.

Now, as for the QPs which are "idle" before the time of the link down (cable removal) and till the link is up (cable is back) - my IB understanding tells me that they should be live and kicking and you should be able to use them. This b/c the RC QP lives in IB L4 (transport layer, the equivalent of TCP) and the port up/down is IB L2 (link layer, the ~equivalent of Ethernet for this discussion) event. Or. From erezz at voltaire.com Mon Mar 5 23:28:23 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 06 Mar 2007 09:28:23 +0200 Subject: [ofa-general] Re: [ewg] still have many bugs to fix before OFED 1.2 beta In-Reply-To: References: Message-ID: <45ED1817.1000604@voltaire.com> Scott Weitzenkamp (sweitzen) wrote: > I just took a quick look at open bugs, and see many I think should be > fixed before OFED 1.2 beta (which was supposed to have code cutoff on > Saturday). I think we need to fix all compilation failures and MPI > failures for beta. > bug_id assigned_to short_desc > 348 vlad at mellanox.co.il OFED 1.2 build > does not create links for libdat.so and libdaplcma.so > 370 pasha at mellanox.co.il OFED 1.2 > alpha1 MVAPICH Intel compiler support broken > 375 vlad at mellanox.co.il Open MPI PGI > C++ failure at runtime > 379 vlad at mellanox.co.il can't compile > OFED 1.1 alpha1 on RHEL4/SLES10 ppc64 > 380 pasha at mellanox.co.il OFED 1.2 > alpha1 gcc MVAPICH won't compile on RHEL4 IA64 > 381 rowland at cse.ohio-state.edu > OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler > 395 vlad at mellanox.co.il uDAPL fails > (with Intel MPI or HP MPI) on SLES 10 i686 > 396 vlad at mellanox.co.il OFED 1.2 alpha > DAPL failures using Intel MPI 3.0.33 > 397 jsquyres at cisco.com OFED 1.2 alpha1 > Open MPI "InfiniBand retry count" errors > Here's another important bug (opened 2 days ago): 409 vlad at mellanox.co.il Cannot uninstall OFED on RHEL5 beta 2 > If you have bugs assigned to you, please keep their status accurate. 
> Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > ------------------------------------------------------------------------ > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From monisonlists at gmail.com Mon Mar 5 23:51:20 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 06 Mar 2007 09:51:20 +0200 Subject: [ofa-general] build failure on nightly tarball -- bonding In-Reply-To: <45EC463A.6000805@open-mpi.org> References: <45E846F6.7070705@open-mpi.org> <45EA9A08.1090500@gmail.com> <45EC463A.6000805@open-mpi.org> Message-ID: <45ED1D78.5090602@gmail.com> Andrew Friedley wrote: > Moni Shoua wrote: >> Andrew Friedley wrote: >>> The chelsio build errors from yesterday appear to be gone, though now >>> I'm seeing errors building the IB bonding code with the 3/2 alpha >>> tarball -- error below. I'm wondering, is there a way to selectively >>> avoid building things like this that seem to be optional, as a tarball >>> user? >>> >>> Andrew >> For the error messages.... It seems to me that the problem is one that >> I have already fixed. >> The corrected source RPM is in my home dir. > > Is there a reason the fix is not in the nightly alpha tarballs? Where > do I find your home directory? > > Andrew > I'm not sure what happened. The fix should have been included n the nightly build. My home is at ~monis/ofed_1_2/ From ogerlitz at voltaire.com Tue Mar 6 00:00:29 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 06 Mar 2007 10:00:29 +0200 Subject: [ofa-general] What is the size of async event queue ? 
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84039981BA@G3W0634.americas.hpqcorp.net>
References: <349DCDA352EACF42A0C49FA6DCEA840396107E@G3W0634.americas.hpqcorp.net>
 <15ddcffd0703050136x4f0814c3n30e03d849521a9b7@mail.gmail.com>
 <349DCDA352EACF42A0C49FA6DCEA84039981BA@G3W0634.americas.hpqcorp.net>
Message-ID: <45ED1F9D.6040506@voltaire.com>

Tang, Changqing wrote:
>> I want to understand what is the exact feature you need.

> I want our MPI code to survive connection loss, or peer
> process/machine crash. This process can detect any IB error, and then
> clean that connection, use healthy connections only, and possibly make
> new connections.

Again, note that any attempt to use a "non healthy" connection would end up
with a notification of the problem (completion with error etc).

OK. There are quite a few cases here... Thinking out loud: if you want to go
the simplest way, a zero-length-RDMA-write keep-alive protocol seems to catch
them all.

If you want to avoid the traffic overhead incurred by such a protocol, and you
are willing to take a less simple approach, I suggest defining exactly which
cases you want to handle and what the expected action is after the local
process realizes the connection is lost, e.g.

case                      expected action
remote process crash
remote process hang
remote machine crash
remote machine hang
etc                       etc etc

and then see what approach can work.

Or.

From ishai at dev.mellanox.co.il Mon Mar 5 13:00:17 2007
From: ishai at dev.mellanox.co.il (Ishai Rabinovitz)
Date: Mon, 05 Mar 2007 23:00:17 +0200
Subject: [ofa-general] Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts
In-Reply-To:
References: <002301c73355$c220e180$8698070a@amr.corp.intel.com>
Message-ID: <45EC84E1.1080709@dev.mellanox.co.il>

Hi Roland, Sean,

How about the attached fix to Sean's patch?

Ishai

Roland Dreier wrote:

>This all looks rather fishy:
>
> > +/*
> > + * Limit CM msg timeouts to something reasonable.
> > + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min. > > + */ > > +#define IB_CM_MAX_TIMEOUT 21 > >OK... (although 8 seconds seems a little short -- it seems a somewhat >longer timeout could be legitimate on a very busy fabric across a WAN >or something like that) > >but then... > > > + timeout = min(IB_CM_MAX_TIMEOUT, > > + cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + > > + cm_convert_to_ms(cm_id_priv->av.packet_life_time)); > >should the IB_CM_MAX_TIMEOUT be inside a cm_convert_to_ms() too? >and similarly... > > > - cm_id_priv->timeout_ms = param->timeout_ms; > > + cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms); > >is timeout_ms misnamed, or did we just limit all timeouts to 21 msecs? > >...and other places in the patch seem to have similar problems. > >Also, I would like to see warning messages like > > ib_cm: Possibly bogus timeout of xx (yyyyyy msecs) in REP from GID zzzz > >printed in the kernel log so people realize they have broken SRP >targets or whatever. > > - R. > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fixed_sean_cm_limit_mra_timeout.patch URL: From guyg at voltaire.com Tue Mar 6 01:16:14 2007 From: guyg at voltaire.com (Guy German) Date: Tue, 06 Mar 2007 11:16:14 +0200 Subject: [ofa-general] ibv_reg_mr permissions? In-Reply-To: <9BBF0228-1DC6-4D06-9D56-A71C35C01D51@scl.ameslab.gov> References: <9BBF0228-1DC6-4D06-9D56-A71C35C01D51@scl.ameslab.gov> Message-ID: <45ED315E.2040002@voltaire.com> > I am running kernel 2.6.19.2, and I'm in the RDMA group, and can open > /dev/infiniband/uverbs0 as a user, but can't register memory as a user. try increasing your "max locked memory" limitation. 
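To see which limit the process is actually running under, something like the
following rough C sketch might help. The helper names memlock_soft_limit()
and memlock_allows() are made up for illustration; only getrlimit() and
RLIMIT_MEMLOCK are the real interface. It compares a single buffer size with
the soft limit, whereas pinned memory is normally accounted cumulatively and
per page, so take it as a first sanity check rather than an exact prediction
of whether ibv_reg_mr will succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Illustrative helper (not part of libibverbs): store the soft
 * RLIMIT_MEMLOCK value, in bytes, in *out.
 * Returns 0 on success and -1 if getrlimit() fails.
 */
int memlock_soft_limit(rlim_t *out)
{
	struct rlimit rl;

	assert(out != NULL);
	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}

/*
 * Illustrative helper: does a buffer of len bytes fit under the soft
 * limit on its own? Returns 1 if it fits (or the limit is unlimited),
 * 0 if it does not, and -1 if the limit could not be read.
 * Memory that is already locked by the process is not considered.
 */
int memlock_allows(size_t len)
{
	rlim_t lim;

	if (memlock_soft_limit(&lim) != 0)
		return -1;
	if (lim == RLIM_INFINITY)
		return 1;
	return (rlim_t)len <= lim ? 1 : 0;
}
```

Calling memlock_allows() with the size of the buffer about to be registered
gives a quick hint whether the limit is the likely culprit. Note that
getrlimit() reports bytes, while "ulimit -l" and the memlock entries in
limits.conf are given in kilobytes.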
You can try setting "ulimit -l unlimited" as root and then switching user.
If that works for you, set the needed limits for your user in
/etc/security/limits.conf, e.g.:

user hard memlock 16384
user soft memlock 16384

Guy

From vlad at lists.openfabrics.org Tue Mar 6 02:15:03 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Tue, 6 Mar 2007 02:15:03 -0800 (PST)
Subject: [ofa-general] ofa_1_2_kernel 20070306-0200 daily build status
Message-ID: <20070306101506.5E76FE60808@openfabrics.org>

This email was generated automatically, please do not reply

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on powerpc
with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From vlad at mellanox.co.il Tue Mar 6 01:58:28 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 06 Mar 2007 11:58:28 +0200 Subject: [ofa-general] [PATCH ofed_1_2] iw_cxgb3: Start ep timer on a MPA reject. In-Reply-To: <1173137660.14159.97.camel@stevo-desktop> References: <1173137660.14159.97.camel@stevo-desktop> Message-ID: <1173175108.32088.14.camel@vladsk-laptop> On Mon, 2007-03-05 at 17:34 -0600, Steve Wise wrote: > Start ep timer on a MPA reject. > > If the consumer rejects the connection we end up under-referencing the > endpoint structure. The fix is to call iwch_ep_disconnect() instead of > the low level disconnect functions so that the endpoint close timer is > started correctly. > > Signed-off-by: Steve Wise > --- Applied. -- Vladimir Sokolovsky Mellanox Technologies Ltd. From vlad at dev.mellanox.co.il Tue Mar 6 05:29:29 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 06 Mar 2007 15:29:29 +0200 Subject: [ofa-general] dapltest compilation fails on x86 Message-ID: <45ED6CB9.80906@dev.mellanox.co.il> Hi Arlin, Please check https://bugs.openfabrics.org/show_bug.cgi?id=408 : I have 32 bit server (Intel(R) Xeon(TM)) with SLES10. Linux sw050 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux I got the following compilation error: Is dapltest supported on this system? Making all in test/dapltest make[3]: Entering directory `/tmp/gen2_devel_user-20070304-1118_check/src/userspace/dapl/test/dapltest' if gcc -DHAVE_CONFIG_H -I. -I. -I../.. 
-I include -I mdep/linux -I../libibverbs/include/infiniband -I../librdmacm/include -I../libibverbs/include -I../../dat/include -g -O2 -MT dapl_main.o -MD -MP -MF ".deps/dapl_main.Tpo" -c -o dapl_main.o `test -f 'cmd/dapl_main.c' || echo './'`cmd/dapl_main.c; \ then mv -f ".deps/dapl_main.Tpo" ".deps/dapl_main.Po"; else rm -f ".deps/dapl_main.Tpo"; exit 1; fi In file included from include/dapl_mdep.h:37, from include/dapl_proto.h:45, from cmd/dapl_main.c:31: mdep/linux/dapl_mdep_user.h:146:2: error: #error "Non-Pentium and Non-PPC Linux - unimplemented" make[3]: *** [dapl_main.o] Error 1 make[3]: Leaving directory `/tmp/gen2_devel_user-20070304-1118_check/src/userspace/dapl/test/dapltest' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/tmp/gen2_devel_user-20070304-1118_check/src/userspace/dapl' make[1]: *** [all] Error 2 make[1]: Leaving directory `/tmp/gen2_devel_user-20070304-1118_check/src/userspace/dapl' make: *** [dapl] Error 2 Regards, Vladimir From monil at voltaire.com Tue Mar 6 05:41:57 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 6 Mar 2007 15:41:57 +0200 Subject: [ofa-general] Re: [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <45EAB607.4010904@gmail.com> References: <45E6E7A0.7070902@voltaire.com> <20070301145644.GL14282@mellanox.co.il> <45EAB607.4010904@gmail.com> Message-ID: <6a122cc00703060541q44066e02u751eedf1b6b8d392@mail.gmail.com> Roland On 3/4/07, Moni Levy wrote: > On 3/1/07, Michael S. Tsirkin wrote: > > > SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence > > > does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. > > > > Please limit line length to 80 chars or so. > > Do you want me to change anything else? 
We really need that for OFED 1.2. > Here is an updated patch. Did you have a chance to look at the last version of the patch ? I think that it's now acceptable for Michael as I got no additional remarks, again, we really need a resolution to that issue. Thanks, Moni > > This issue was found during partitioning & SM fail over testing. The fix was > tested over the weekend with pkey reshuffling, removal and addition every few > seconds concurrent with OFED restart. > > Changes from v1: > * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike > * fixed a bug in device extraction from the work struct > * removed some warnings in case they are caused due to missing PKEY as > this seems like a valid flow now. > > Changes from v2: > * less/fixed debug prints > * flush_restart_qp stuff renamed to just restart_qp > * the patch now depends on Roland's "IPoIB: Only handle async > events for one port" > > SM reconfiguration or failover possibly causes a shuffling of the values in > the port pkey table. The current implementation only queries for the index of > the pkey once, when it creates the device QP and after that moves it into > working state, and hence does not address this scenario. Fix this by using the > PKEY_CHANGE event as a trigger to reconfigure the device QP. 
> > Signed-off-by: Moni Levy > --- > drivers/infiniband/ulp/ipoib/ipoib.h | 4 + > drivers/infiniband/ulp/ipoib/ipoib_ib.c | 51 ++++++++++++++++++++----- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- > drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++--- > drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- > 5 files changed, 59 insertions(+), 19 deletions(-) > > Index: b/drivers/infiniband/ulp/ipoib/ipoib.h > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:11:43.698307017 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:43:04.624119588 +0200 > @@ -205,6 +205,7 @@ struct ipoib_dev_priv { > struct delayed_work pkey_task; > struct delayed_work mcast_task; > struct work_struct flush_task; > + struct work_struct restart_qp_task; > struct work_struct restart_task; > struct delayed_work ah_reap_task; > > @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( > > int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); > void ipoib_ib_dev_flush(struct work_struct *work); > +void ipoib_ib_dev_restart_qp(struct work_struct *work); > void ipoib_ib_dev_cleanup(struct net_device *dev); > > int ipoib_ib_dev_open(struct net_device *dev); > int ipoib_ib_dev_up(struct net_device *dev); > int ipoib_ib_dev_down(struct net_device *dev, int flush); > -int ipoib_ib_dev_stop(struct net_device *dev); > +int ipoib_ib_dev_stop(struct net_device *dev, int flush); > > int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); > void ipoib_dev_cleanup(struct net_device *dev); > Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200 > @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device > > ret = 
ipoib_init_qp(dev); > if (ret) { > - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > + if (ret != -ENOENT) > + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > return -1; > } > > ret = ipoib_ib_post_receives(dev); > if (ret) { > ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); > - ipoib_ib_dev_stop(dev); > + ipoib_ib_dev_stop(dev, 1); > return -1; > } > > ret = ipoib_cm_dev_open(dev); > if (ret) { > ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); > - ipoib_ib_dev_stop(dev); > + ipoib_ib_dev_stop(dev, 1); > return -1; > } > > @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi > return pending; > } > > -int ipoib_ib_dev_stop(struct net_device *dev) > +int ipoib_ib_dev_stop(struct net_device *dev, int flush) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > struct ib_qp_attr qp_attr; > @@ -581,7 +582,8 @@ timeout: > /* Wait for all AHs to be reaped */ > set_bit(IPOIB_STOP_REAPER, &priv->flags); > cancel_delayed_work(&priv->ah_reap_task); > - flush_workqueue(ipoib_workqueue); > + if (flush) > + flush_workqueue(ipoib_workqueue); > > begin = jiffies; > > @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device > return 0; > } > > -void ipoib_ib_dev_flush(struct work_struct *work) > +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) > { > - struct ipoib_dev_priv *cpriv, *priv = > - container_of(work, struct ipoib_dev_priv, flush_task); > + struct ipoib_dev_priv *cpriv; > struct net_device *dev = priv->dev; > > - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { > + /* > + * ipoib_ib_dev_stop() below may not find the PKey and leave the > + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp > + * flag on is Ok. 
> + */ > + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { > ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); > return; > } > @@ -642,6 +648,13 @@ void ipoib_ib_dev_flush(struct work_stru > > ipoib_ib_dev_down(dev, 0); > > + if (restart_qp) { > + ipoib_dbg(priv, "restarting the device QP\n"); > + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) > + ipoib_ib_dev_stop(dev, 0); > + ipoib_ib_dev_open(dev); > + } > + > /* > * The device could have been brought down between the start and when > * we get here, don't bring it back up if it's not configured up > @@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru > > /* Flush any child interfaces too */ > list_for_each_entry(cpriv, &priv->child_intfs, list) > - ipoib_ib_dev_flush(&cpriv->flush_task); > + __ipoib_ib_dev_flush(cpriv, restart_qp); > > mutex_unlock(&priv->vlan_mutex); > } > > +void ipoib_ib_dev_flush(struct work_struct *work) > +{ > + struct ipoib_dev_priv *priv = > + container_of(work, struct ipoib_dev_priv, flush_task); > + /* We only restart the QP in case of pkey change event */ > + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); > + __ipoib_ib_dev_flush(priv, 0); > +} > + > +void ipoib_ib_dev_restart_qp(struct work_struct *work) > +{ > + struct ipoib_dev_priv *priv = > + container_of(work, struct ipoib_dev_priv, restart_qp_task); > + /* We only restart the QP in case of pkey change event */ > + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); > + __ipoib_ib_dev_flush(priv, 1); > +} > + > void ipoib_ib_dev_cleanup(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:11:43.729301517 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:43:04.666112093 +0200 > @@ -107,7 +107,7 @@ int ipoib_open(struct 
net_device *dev) > return -EINVAL; > > if (ipoib_ib_dev_up(dev)) { > - ipoib_ib_dev_stop(dev); > + ipoib_ib_dev_stop(dev, 1); > return -EINVAL; > } > > @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device > flush_workqueue(ipoib_workqueue); > > ipoib_ib_dev_down(dev, 1); > - ipoib_ib_dev_stop(dev); > + ipoib_ib_dev_stop(dev, 1); > > if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { > struct ipoib_dev_priv *cpriv; > @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic > INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); > INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); > INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); > + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); > INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); > INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); > } > Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200 > @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > if (ret < 0) { > - ipoib_warn(priv, "couldn't attach QP to multicast group " > - IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > + if (ret != -ENXIO) /* No pkey found */ > + ipoib_warn(priv, "couldn't attach QP to multicast group " > + IPOIB_GID_FMT "\n", > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); > return ret; > @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s > status = ipoib_mcast_join_finish(mcast, &multicast->rec); > > if (status) { > - if (mcast->logcount++ < 20) > + if (mcast->logcount++ < 20 && status != -ENXIO) > ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " > 
IPOIB_GID_FMT ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), status); > @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int > ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), > status); > - } else { > + } else if (status != -ENXIO) { > ipoib_warn(priv, "multicast join failed for " > IPOIB_GID_FMT ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), > Index: b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 14:39:46.712444790 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 16:12:55.069541201 +0200 > @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device > if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { > clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > ret = -ENXIO; > + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); > goto out; > } > + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); > set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > > /* set correct QKey for QP */ > @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler > container_of(handler, struct ipoib_dev_priv, event_handler); > > if ((record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > record->event == IB_EVENT_PORT_ACTIVE || > record->event == IB_EVENT_LID_CHANGE || > record->event == IB_EVENT_SM_CHANGE || > @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler > record->element.port_num == priv->port) { > ipoib_dbg(priv, "Port state change event\n"); > queue_work(ipoib_workqueue, &priv->flush_task); > + } else if (record->event == IB_EVENT_PKEY_CHANGE && > + record->element.port_num == priv->port) { > + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); > + queue_work(ipoib_workqueue, &priv->restart_qp_task); > } > } > > From mst at mellanox.co.il Tue Mar 6 05:53:56 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 6 Mar 2007 15:53:56 +0200 Subject: [ofa-general] Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: <6a122cc00703060543m1ed9bf54ud230ebf3775001ff@mail.gmail.com> References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702280250q33c81f90m66c6a291384b4be3@mail.gmail.com> <6a122cc00703060543m1ed9bf54ud230ebf3775001ff@mail.gmail.com> Message-ID: <20070306135356.GC6981@mellanox.co.il> Can you open a bugzilla issue pls? Quoting Moni Levy : Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. Michael, can you please add that patch to OFED 1.2. Thanks, Moni On 2/28/07, Moni Levy wrote: > ---------- Forwarded message ---------- > From: Roland Dreier > Date: Feb 27, 2007 5:38 PM > Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without > port parameter. > To: myopenib at gmail.com > Cc: "Michael S. Tsirkin" , Or Gerlitz > , Moni Shoua , OPENIB > > > > > I did a short code review of the ipoib code concentrating on > > partitioning support and I mentioned that the asynchronous events > > handler in the ipoib code does not take the port number reported in > > the event record into consideration. The effect of that is that all of > > the ib# devices related to that specific HCA are flushed when it seems > > to me that only the relevant port one should be. Is that done on > > purpose, or am I missing something ? > > I don't think there's any particular reason the code is that way > except for the oversight never being corrected. But it looks trivial > to fix, like the patch below. Does that look right to you? > > > p.s. I'm working on a patch that should solve another issue caused by > > PKEY reordering & ipoib behavior and the above issue further > > complicates things for me. > > Why not fix the issue first then? 
> > commit a27cbe878203076247c1b5287f5ab59ed143b560 > Author: Roland Dreier > Date: Tue Feb 27 07:37:49 2007 -0800 > > IPoIB: Only handle async events for one port > > An asynchronous event carries the port number that the event occurred > on, so there's no reason for an IPoIB interface to process an event > associated with a different local HCA port. > > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > index 3cb551b..7f3ec20 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, > struct ipoib_dev_priv *priv = > container_of(handler, struct ipoib_dev_priv, event_handler); > > - if (record->event == IB_EVENT_PORT_ERR || > - record->event == IB_EVENT_PKEY_CHANGE || > - record->event == IB_EVENT_PORT_ACTIVE || > - record->event == IB_EVENT_LID_CHANGE || > - record->event == IB_EVENT_SM_CHANGE || > - record->event == IB_EVENT_CLIENT_REREGISTER) { > + if ((record->event == IB_EVENT_PORT_ERR || > + record->event == IB_EVENT_PKEY_CHANGE || > + record->event == IB_EVENT_PORT_ACTIVE || > + record->event == IB_EVENT_LID_CHANGE || > + record->event == IB_EVENT_SM_CHANGE || > + record->event == IB_EVENT_CLIENT_REREGISTER) && > + record->element.port_num == priv->port) { > ipoib_dbg(priv, "Port state change event\n"); > queue_work(ipoib_workqueue, &priv->flush_task); > } > -- MST From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 06:14:29 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 06:14:29 -0800 (PST) Subject: [ofa-general] [Bug 413] New: IPoIB passes async events to an unrelated devices. Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=413 Summary: IPoIB passes async events to an unrelated devices. 
Product: OpenFabrics Linux Version: 1.2alpha1 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P3 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: monil at voltaire.com CC: mst at mellanox.co.il For further information please look at: "[RFC] IB/ipoib: Asynchronous events delivered without port parameter." thread on open-fabrics. The patch that solves the issue is (Please take the original one from the thread above): commit a27cbe878203076247c1b5287f5ab59ed143b560 Author: Roland Dreier Date: Tue Feb 27 07:37:49 2007 -0800 IPoIB: Only handle async events for one port An asynchronous event carries the port number that the event occurred on, so there's no reason for an IPoIB interface to process an event associated with a different local HCA port. Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 3cb551b..7f3ec20 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, struct ipoib_dev_priv *priv = container_of(handler, struct ipoib_dev_priv, event_handler); - if (record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || - record->event == IB_EVENT_PORT_ACTIVE || - record->event == IB_EVENT_LID_CHANGE || - record->event == IB_EVENT_SM_CHANGE || - record->event == IB_EVENT_CLIENT_REREGISTER) { + if ((record->event == IB_EVENT_PORT_ERR || + record->event == IB_EVENT_PKEY_CHANGE || + record->event == IB_EVENT_PORT_ACTIVE || + record->event == IB_EVENT_LID_CHANGE || + record->event == IB_EVENT_SM_CHANGE || + record->event == IB_EVENT_CLIENT_REREGISTER) && + record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); } -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: 
------- You are the assignee for the bug, or are watching the assignee.

From wombat2 at us.ibm.com Tue Mar 6 06:14:13 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Tue, 6 Mar 2007 09:14:13 -0500
Subject: [ofa-general] IPoIB-CM/RC - NAPI patch.
In-Reply-To:
Message-ID:

"Erez Strauss" wrote on 03/06/2007 12:51:52 AM:

> Hi Bernie,
>
> Thank you for your reply.
>
> In this case I'm using an application which is using very small
> messages (300 bytes) and I cannot change it.
> The focus of the testing is to reduce the latency to minimum during
> high stress, while keeping the application intact.

O.K. you are looking at small packet performance as opposed to peak bandwidth. In this case NAPI would lower the interrupt rate if TCP doesn't already coalesce multiple TCP packets into a larger packet over IB. However, at a packet size of 300 bytes, I would expect that the traffic actually comes over the UD QP, not the RC one. With your rate of 40000 interrupts per second, if each interrupt was a single packet you are only looking at 12 MB/s. If you are not using TCP_NODELAY then your bandwidth may be higher. Even so, NAPI will hopefully improve the small packet throughput.

>
> Similar synthetic test would be multiple "iperf -l 300 ..." and sum
> of the total bandwidth (or packets per second).
>
> Any suggestions for IPoIB-CM/RC are welcome.
> Do you expect the SDP to perform better under the above constraints?
>
> Thanks,
> Erez
> Voltaire Inc.
>

Regards,

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2 at us.ibm.com (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner

From monil at voltaire.com Tue Mar 6 06:16:05 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 6 Mar 2007 16:16:05 +0200
Subject: [ofa-general] Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
In-Reply-To: <20070306135356.GC6981@mellanox.co.il>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702280250q33c81f90m66c6a291384b4be3@mail.gmail.com> <6a122cc00703060543m1ed9bf54ud230ebf3775001ff@mail.gmail.com> <20070306135356.GC6981@mellanox.co.il>
Message-ID: <6a122cc00703060616t439760a4m15eece83d4dc144a@mail.gmail.com>

On 3/6/07, Michael S. Tsirkin wrote:
> Can you open a bugzilla issue pls?

Bug 413 was opened to track that issue.

-- Moni

From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 06:16:27 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 6 Mar 2007 06:16:27 -0800 (PST)
Subject: [ofa-general] [Bug 390] perftools don't work on alpha1
In-Reply-To:
Message-ID: <20070306141627.20341E60836@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=390

swise at opengridcomputing.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |swise at opengridcomputing.com
             Status|REOPENED                    |NEW

------- Comment #4 from swise at opengridcomputing.com 2007-03-06 06:16 -------
I guess I'll fix this...

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 06:47:10 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 6 Mar 2007 06:47:10 -0800 (PST)
Subject: [ofa-general] [Bug 413] IPoIB passes async events to an unrelated devices.
In-Reply-To: Message-ID: <20070306144710.1751CE607F1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=413 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |mst at mellanox.co.il -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Tue Mar 6 07:07:53 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 17:07:53 +0200 Subject: [ofa-general] Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. In-Reply-To: <6a122cc00703060616t439760a4m15eece83d4dc144a@mail.gmail.com> References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com> <6a122cc00702280250q33c81f90m66c6a291384b4be3@mail.gmail.com> <6a122cc00703060543m1ed9bf54ud230ebf3775001ff@mail.gmail.com> <20070306135356.GC6981@mellanox.co.il> <6a122cc00703060616t439760a4m15eece83d4dc144a@mail.gmail.com> Message-ID: <20070306150753.GD6981@mellanox.co.il> > Quoting Moni Levy : > Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter. > > On 3/6/07, Michael S. Tsirkin wrote: > > Can you open a bugzilla issue pls? > > Bug 413 was opened to track that issue. Thanks. -- MST From troy at scl.ameslab.gov Tue Mar 6 07:14:20 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Tue, 6 Mar 2007 09:14:20 -0600 Subject: [ofa-general] ibv_reg_mr permissions? 
In-Reply-To: <45ED315E.2040002@voltaire.com>
References: <9BBF0228-1DC6-4D06-9D56-A71C35C01D51@scl.ameslab.gov> <45ED315E.2040002@voltaire.com>
Message-ID: <92930C83-1CE4-4B5A-B2E1-6D2E887EBD9A@scl.ameslab.gov>

On Mar 6, 2007, at 3:16 AM, Guy German wrote:

>> I am running kernel 2.6.19.2, and I'm in the RDMA group, and can open
>> /dev/infiniband/uverbs0 as a user, but can't register memory as a
>> user.
>
> try increasing your "max locked memory" limitation.
> you can try setting as root "ulimit -l unlimited" and switch user.
> if it works for you - set in /etc/security/limits.conf the needed
> limitations for your user
>
> e.g.:
> user hard memlock 16384
> user soft memlock 16384
>
> Guy

Thanks!

Now, how does one determine what processes have registered memory regions, and how much memory is registered?

I have a testcase with NetPIPE and PVFS in which the process memory usage is about 100MB, but the free memory as reported by 'free' and 'top' indicates about 3GB of memory is used. This is with the IBM ehca, which can apparently support registering a lot more memory than the mellanox PCI-X card I have in another power5 system.

I don't see the extreme memory usage of 4GB with mthca, but it does appear that the free memory count drops more than what I would expect given the process usage.

From tziporet at mellanox.co.il Tue Mar 6 07:32:04 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 06 Mar 2007 17:32:04 +0200
Subject: [ofa-general] OFED 1.2 beta status
Message-ID: <45ED8974.60100@mellanox.co.il>

Hi,

There are several blocker/critical bugs for the beta release (if you are in the "To" section, it means you are the owner of one of these bugs).

I updated the bugzilla with their priority - owners please fix or update bug status (you can also change the priority if you think that it's not blocker/critical).
According to this status the target date for the beta is moved to Thursday, March 8.

This is the list of bugs that must be fixed for the beta release:

bug_id  bug_severity  assigned_to                    short_short_desc
370     blocker       pasha at mellanox.co.il        OFED 1.2 alpha1 MVAPICH Intel compiler support broken
379     blocker       vlad at mellanox.co.il         can't compile OFED 1.2 on RHEL4/SLES10 ppc64
381     blocker       rowland at cse.ohio-state.edu  OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler
409     blocker       vlad at mellanox.co.il         Cannot uninstall OFED on RHEL5 beta 2
397     critical      jsquyres at cisco.com          OFED 1.2 alpha1 Open MPI "InfiniBand retry count" errors
400     critical      sean.hefty at intel.com        OFED 1.2 alpha1 IPoIB HA failover gets QP warnings, IPoIB CM stops working
402     critical      mst at mellanox.co.il          On stress kernel: BUG: soft lockup detected on CPU#0!
411     critical      pasha at mellanox.co.il        Some Intel test suite test stuck

I also attach the full list of bugs (although anyone can get it from a simple query in bugzilla).

All - please work on the bugs opened to you.

Tziporet
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bugs-2007-03-06.csv
URL:

From kliteyn at dev.mellanox.co.il Tue Mar 6 08:03:16 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Tue, 06 Mar 2007 18:03:16 +0200
Subject: [ofa-general] [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
Message-ID: <45ED90C4.60204@dev.mellanox.co.il>

Hi Hal.

Converting the C++ code to C.
Please apply both to trunk and to 1.2

Thanks.
Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-) diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c index 0afa43c..8c9172e 100644 --- a/osm/opensm/osm_ucast_lash.c +++ b/osm/opensm/osm_ucast_lash.c @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t static void shortest_path(lash_t *p_lash, int ir) { switch_t **switches = p_lash->switches, *sw, *swi; - int i; + uint16_t i; cl_list_t bfsq; cl_list_construct(&bfsq); @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) int output_link2, i_next_switch2; int cycle_found2 = 0; int status = IB_SUCCESS; + int * switch_bitmap = NULL; OSM_LOG_ENTER(p_log, lash_core); - //Bitmap to check if we have processed this pair - int switch_bitmap[num_switches][num_switches]; + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); + if (!switch_bitmap) + { + osm_log(p_log, OSM_LOG_ERROR, + "lash_core: ERR 4D04: " + "Failed allocating switch_bitmap - out of memory\n"); + goto Exit; + } for(i=0; iused_channels = 0; switches[j]->q_state = UNQUEUED; @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) for(i=0; ivirtual_location[i][dest_switch][v_lane] = 1; p_lash->virtual_location[dest_switch][i][v_lane] = 1; - switch_bitmap[i][dest_switch] = 1; - switch_bitmap[dest_switch][i] = 1; + switch_bitmap[i * num_switches + dest_switch] = 1; + switch_bitmap[dest_switch * num_switches + i] = 1; } for(j=0; jvl_min, lanes_needed); Exit: + if (switch_bitmap) + free(switch_bitmap); OSM_LOG_EXIT(p_log); return status; } -- 1.4.4.1.GIT From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 08:18:28 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 08:18:28 -0800 (PST) Subject: [ofa-general] [Bug 316] Daily build: libibcm build fails on x86-64 if 32-bit sysfsutils is present In-Reply-To: Message-ID: 
<20070306161828.4E920E607F1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=316 weiny2 at llnl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |weiny2 at llnl.gov ------- Comment #1 from weiny2 at llnl.gov 2007-03-06 08:18 ------- I am getting this same bug with the OFED-1.2-20070302-0600 and OFED-1.2-20070305-0722 nightly builds. I know this was fixed some time between when this bug was submitted and these builds because I was building from a couple of weeks ago just fine. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From swise at opengridcomputing.com Tue Mar 6 08:24:55 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Mar 2007 10:24:55 -0600 Subject: [ofa-general] [PATCH ofed_1_2] perftest: asprintf usage error in rdma_bw.c Message-ID: <1173198295.14011.21.camel@stevo-desktop> asprintf usage error in rdma_bw.c Signed-off-by: Steve Wise --- rdma_bw.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/rdma_bw.c b/rdma_bw.c index e55b82d..28cee43 100644 --- a/rdma_bw.c +++ b/rdma_bw.c @@ -327,7 +327,7 @@ static struct pingpong_context *pp_serve struct rdma_cm_id *child_cm_id; struct rdma_conn_param conn_param; - if (asprintf(&service, "%d", data->port)) + if (asprintf(&service, "%d", data->port) < 0) goto err5; if ( (n = getaddrinfo(NULL, service, &hints, &res)) < 0 ) { From troy at scl.ameslab.gov Tue Mar 6 08:27:09 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Tue, 6 Mar 2007 10:27:09 -0600 Subject: [ofa-general] ibv_reg_mr errno return codes? Message-ID: <96039460-1ACA-44F5-AEA9-F091B1CD67BE@scl.ameslab.gov> I made the following change to infiniband/hw/ehca/ehca_mrmw.c to have ibv_reg_mr return -ENOMEM instead of -EINVAL.. 
shouldn't we define and document what errno codes ibv_reg_mr is expected to return so that applications have some idea if there is a permanent failure and they need to exit, or go back and unregister some locked memory? diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/ infiniband/hw/ehca/ehca_ index cfb362a..58073f0 100644 --- a/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -2047,13 +2047,14 @@ int ehca_mrmw_map_hrc_alloc(const u64 hipz_rc) return 0; case H_ADAPTER_PARM: /* invalid adapter handle */ case H_RT_PARM: /* invalid resource type */ - case H_NOT_ENOUGH_RESOURCES: /* insufficient resources */ case H_MLENGTH_PARM: /* invalid memory length */ case H_MEM_ACCESS_PARM: /* invalid access controls */ case H_CONSTRAINED: /* resource constraint */ return -EINVAL; case H_BUSY: /* long busy */ return -EBUSY; + case H_NOT_ENOUGH_RESOURCES: /* insufficient resources */ + return -ENOMEM; default: return -EINVAL; } -------------- next part -------------- An HTML attachment was scrubbed... URL: From monisonlists at gmail.com Tue Mar 6 08:35:40 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 06 Mar 2007 18:35:40 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070305151411.GF5311@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304175915.GG17950@mellanox.co.il> <45EC290C.9000702@gmail.com> <20070305151411.GF5311@mellanox.co.il> Message-ID: <45ED985C.8000506@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB >> >> Michael S. Tsirkin wrote: >>>> Quoting Moni Shoua : >>>> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB >>>> >>>> This version of the patch tracks the allocs and releases of ipoib_neigh and >>>> keeps a list of them. 
Before IPoIB net device unregisters the list is passed >>>> to destroy ipoib_neighs that ride on on a bond neighbour. >>>> >>>> This is a replacement to the method of scanning the arp and ndisc >>>> tables. >>> Why does the list need to be global? >>> We already have a per-device list of paths, and each of these in turn >>> has a list of neighbours. Can't this be used? >>> >> OK, It's a good point but coming to think of it now I have a question >> >> When a device unregisters ipoib_stop() is called and all ipoib_neighs are destroyed. >> Isn't it enough to ensure that ipoib_neigh_destructor will not try to >> "touch" one of the ib devs? or in other words: Isn't it that the work to >> clean ipoib_neighs is unnecessary? >> Michael, Do you agree that destroying the ipoib_neighs through (a call trace that starts with) ipoib_stop() is enough for safety and that there is no need to do that just before calling to unregister_netdev() ? >> BTW: I guess that idea of global list was influenced from the ipoib_8111... patch. >> Why was it used there? > > AFAIK, the point is to check whether (in pre-2.6.17 kernels) some neighbour > shares the same ops pointer. Only after no such neighbours are left is it safe > to set the destructor to NULL. > > This backport is not raceless BTW - some neighbour not related to IPoIB could > be running the destructor. But I think it's the best I could come up with > for these old kernels. > From monisonlists at gmail.com Tue Mar 6 08:40:40 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 06 Mar 2007 18:40:40 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070305175825.GC4264@mellanox.co.il> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304182048.GG19828@mellanox.co.il> <45EC2AC6.2090805@gmail.com> <20070305175825.GC4264@mellanox.co.il> Message-ID: <45ED9988.4080608@gmail.com> Michael S. 
Tsirkin wrote: >>>> + if (ipoib_at_exit) >>>> + nn->neighbour->parms->neigh_destructor = NULL; >>> Is it safe to do this without locking? >>> Could the destructor be in progress when we do this? >> I think you're right. Maybe I need to attack the issue in a different way. >> I need to do some rethinking. > > The basic problem seems to be that bonding code is taking a pointer > into the module (neighbour setup) without taking reference on > the module. > I tend to agree but what is the way to do that? How can I prevent ib_ipoib from unloading if bonding is loaded? I still have to be able to unregister net_devices (for hotplug) and I don't want to do something that will force other network devices (e.g. Ethernet)to change. Is there a way to take reference count of a module in 2.6 kernels? From jgunthorpe at obsidianresearch.com Tue Mar 6 09:51:14 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 6 Mar 2007 10:51:14 -0700 Subject: [ofa-general] [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <45ED90C4.60204@dev.mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> Message-ID: <20070306175114.GJ11411@obsidianresearch.com> On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote: > Hi Hal. > > Converting the the C++ code to C. This is actually valid C99 code. This is the method that ISO standardized in C99 to do dynamic stack allocations (alloca is not an ISO C function). Since it is now 2007 is there really still a desire to not use C99 features? 
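For reference, the two styles under discussion can be put side by side in a minimal sketch. The function names are hypothetical and are not taken from osm_ucast_lash.c; the first uses a C99 variable-length array on the stack, the second a single heap block indexed as i * n + j, which older compilers also accept. Both mark every unordered pair (i, j) symmetrically, much as the routing code marks processed switch pairs.

```c
#include <stddef.h>
#include <stdlib.h>

/* C99 style: n x n scratch matrix as a variable-length array on the
 * stack. n must be greater than zero, and a large n can overflow the
 * stack because there is no way to detect allocation failure. */
static int mark_pairs_vla(size_t n)
{
    int bitmap[n][n];
    int marked = 0;
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            bitmap[i][j] = 0;

    /* Mark each pair in both directions. */
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++) {
            bitmap[i][j] = 1;
            bitmap[j][i] = 1;
        }

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            marked += bitmap[i][j];
    return marked;
}

/* Portable style: one zeroed heap block in row-major order.
 * Returns -1 if the allocation fails. */
static int mark_pairs_heap(size_t n)
{
    int *bitmap = calloc(n * n, sizeof(*bitmap));
    int marked = 0;
    size_t i, j;

    if (!bitmap)
        return -1;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++) {
            bitmap[i * n + j] = 1;
            bitmap[j * n + i] = 1;
        }

    for (i = 0; i < n * n; i++)
        marked += bitmap[i];

    free(bitmap);
    return marked;
}
```

The two functions produce the same count for any n > 0. The heap version trades the implicit stack allocation for an explicit failure path, which is the extra code the patch adds.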
Jason From mshefty at ichips.intel.com Tue Mar 6 10:05:07 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Mar 2007 10:05:07 -0800 Subject: [ofa-general] Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts In-Reply-To: <45EC84E1.1080709@dev.mellanox.co.il> References: <002301c73355$c220e180$8698070a@amr.corp.intel.com> <45EC84E1.1080709@dev.mellanox.co.il> Message-ID: <45EDAD53.1080604@ichips.intel.com> > +#define DRV_NAME "ib_cm" > +#define PFX DRV_NAME ": " Just define PFX. > + > +/* > + * Limit CM msg timeouts to something reasonable. > + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min. > + */ > +#define IB_CM_MAX_TIMEOUT 21 Thinking out loud... maybe we should make this a module parameter. > + > static void cm_add_one(struct ib_device *device); > static void cm_remove_one(struct ib_device *device); > > @@ -887,13 +896,26 @@ static void cm_format_req(struct cm_req_ > cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); > cm_req_set_resp_res(req_msg, param->responder_resources); > cm_req_set_init_depth(req_msg, param->initiator_depth); > - cm_req_set_remote_resp_timeout(req_msg, > - param->remote_cm_response_timeout); > + if ((u8) IB_CM_MAX_TIMEOUT >= param->remote_cm_response_timeout) This is a nit, but I find reading the reverse of the above check easier to understand and verify for correctness: if (remote_cm_response_timeout > IB_CM_MAX_TIMEOUT) ... if (remote_cm_response_timeout <= IB_CM_MAX_TIMEOUT) ... That is, we're comparing the user's timeout to the max, not the other way around. > + if ((u8) IB_CM_MAX_TIMEOUT >= param->local_cm_response_timeout) Same nit. 
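The "8 seconds, 15 retries, 2 min" figures in the quoted comment can be checked with a small user-space sketch. It assumes the usual IB encoding in which a CM timeout value t stands for 4.096 us * 2^t, and it writes the clamp in the direction suggested above (user value compared against the maximum). The helper names and MAX_TIMEOUT_EXP are hypothetical stand-ins, not ib_cm symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for IB_CM_MAX_TIMEOUT from the patch under review. */
#define MAX_TIMEOUT_EXP 21u

/* Timeout in nanoseconds for an encoded value:
 * 4.096 us * 2^exp == 4096 ns << exp. The field is 5 bits wide. */
static uint64_t timeout_ns(unsigned int exp)
{
    assert(exp < 32);
    return (uint64_t)4096 << exp;
}

/* Clamp the caller's encoded timeout to the maximum. */
static unsigned int clamp_timeout_exp(unsigned int requested)
{
    if (requested > MAX_TIMEOUT_EXP)
        return MAX_TIMEOUT_EXP;
    return requested;
}

/* Worst-case time spent on one message across all retries. */
static uint64_t total_timeout_ns(unsigned int exp, unsigned int retries)
{
    return timeout_ns(clamp_timeout_exp(exp)) * retries;
}
```

With an encoded value of 21 a single timeout is 2^33 ns, about 8.6 seconds, and 15 retries add up to roughly 129 seconds, which matches the "2 min" in the comment.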
> @@ -2707,7 +2745,13 @@ int ib_send_cm_sidr_req(struct ib_cm_id > > cm_id->service_id = param->service_id; > cm_id->service_mask = __constant_cpu_to_be64(~0ULL); > - cm_id_priv->timeout_ms = param->timeout_ms; > + if (cm_id_priv->timeout_ms > cm_convert_to_ms(IB_CM_MAX_TIMEOUT)) > + cm_id_priv->timeout_ms = param->timeout_ms; > + else { > + cm_id_priv->timeout_ms = cm_convert_to_ms(IB_CM_MAX_TIMEOUT); > + printk(KERN_WARNING PFX "Given timeout to ib_send_cm_sidr_req " > + "is too long, setting it to default value"); > + } This is keeping the larger of the two values. I would just remove the line after the if() and the else statement. Thanks for updating the patch. - Sean From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 10:11:37 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 10:11:37 -0800 (PST) Subject: [ofa-general] [Bug 316] Daily build: libibcm build fails on x86-64 if 32-bit sysfsutils is present In-Reply-To: Message-ID: <20070306181137.D5270E60803@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=316 vlad at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sean.hefty at intel.com Tue Mar 6 10:14:51 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Mar 2007 10:14:51 -0800 Subject: [ofa-general] OFA web page needs updating Message-ID: <000001c7601b$55204470$8698070a@amr.corp.intel.com> Can we get the ofa web page updated?
Specifically: Development Tools - link to git rather than svn Contact - update the mailing list information, including link to archives Downloads - link to developer public_html download areas - Sean From halr at voltaire.com Tue Mar 6 10:31:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Mar 2007 13:31:20 -0500 Subject: [ofa-general] [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <20070306175114.GJ11411@obsidianresearch.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306175114.GJ11411@obsidianresearch.com> Message-ID: <1173205862.4546.363189.camel@hal.voltaire.com> On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote: > On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote: > > Hi Hal. > > > > Converting the the C++ code to C. > > This is actually valid C99 code. This is the method that ISO > standardized in C99 to do dynamic stack allocations (alloca is not > an ISO C function). > > Since it is now 2007 is there really still a desire to not use C99 > features? My guess is that the Windows compiler didn't like this though :-( -- Hal > Jason From mst at mellanox.co.il Tue Mar 6 10:39:50 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 20:39:50 +0200 Subject: [ofa-general] Re: [PATCH ofed_1_2] perftest: asprintf usage error in rdma_bw.c In-Reply-To: <1173198295.14011.21.camel@stevo-desktop> References: <1173198295.14011.21.camel@stevo-desktop> Message-ID: <20070306183950.GA16562@mellanox.co.il> > Quoting Steve Wise : > Subject: [PATCH ofed_1_2] perftest: asprintf usage error in rdma_bw.c > > > asprintf usage error in rdma_bw.c > > Signed-off-by: Steve Wise Applied to both ofed_1_2 and master in perftests. Vlad, pls pull. -- MST From mst at mellanox.co.il Tue Mar 6 10:43:00 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 6 Mar 2007 20:43:00 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45ED985C.8000506@gmail.com> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304175915.GG17950@mellanox.co.il> <45EC290C.9000702@gmail.com> <20070305151411.GF5311@mellanox.co.il> <45ED985C.8000506@gmail.com> Message-ID: <20070306184300.GB16562@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >> Quoting Moni Shoua : > >> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > >> > >> Michael S. Tsirkin wrote: > >>>> Quoting Moni Shoua : > >>>> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > >>>> > >>>> This version of the patch tracks the allocs and releases of ipoib_neigh and > >>>> keeps a list of them. Before IPoIB net device unregisters the list is passed > >>>> to destroy ipoib_neighs that ride on on a bond neighbour. > >>>> > >>>> This is a replacement to the method of scanning the arp and ndisc > >>>> tables. > >>> Why does the list need to be global? > >>> We already have a per-device list of paths, and each of these in turn > >>> has a list of neighbours. Can't this be used? > >>> > >> OK, It's a good point but coming to think of it now I have a question > >> > >> When a device unregisters ipoib_stop() is called and all ipoib_neighs are destroyed. > >> Isn't it enough to ensure that ipoib_neigh_destructor will not try to > >> "touch" one of the ib devs? or in other words: Isn't it that the work to > >> clean ipoib_neighs is unnecessary? > >> > Michael, Do you agree that destroying the ipoib_neighs > through (a call trace that starts with) ipoib_stop() is enough for safety > and that there is no need to do that just before calling to unregister_netdev() ? 
Well, you have to make sure no one is caching either the device pointer or the destructor/setup pointers. -- MST From rdreier at cisco.com Tue Mar 6 10:51:38 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 10:51:38 -0800 Subject: [ofa-general] IPoIB-CM/RC - NAPI patch. In-Reply-To: (Erez Strauss's message of "Tue, 6 Mar 2007 00:07:41 +0200") References: Message-ID: See git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git ipoib-napi for the NAPI work. From mst at mellanox.co.il Tue Mar 6 10:53:07 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 20:53:07 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45ED9988.4080608@gmail.com> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304182048.GG19828@mellanox.co.il> <45EC2AC6.2090805@gmail.com> <20070305175825.GC4264@mellanox.co.il> <45ED9988.4080608@gmail.com> Message-ID: <20070306185307.GC16562@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >>>> + if (ipoib_at_exit) > >>>> + nn->neighbour->parms->neigh_destructor = NULL; > >>> Is it safe to do this without locking? > >>> Could the destructor be in progress when we do this? > >> I think you're right. Maybe I need to attack the issue in a different way. > >> I need to do some rethinking. > > > > The basic problem seems to be that bonding code is taking a pointer > > into the module (neighbour setup) without taking reference on > > the module. > > > I tend to agree but what is the way to do that? > How can I prevent ib_ipoib from unloading if bonding is loaded? > > I still have to be able to unregister net_devices (for hotplug) > and I don't want to do something that will force other network devices (e.g. Ethernet)to change. > > Is there a way to take reference count of a module in 2.6 kernels? 
Look in linux/module.h But taking out device reference might be better. -- MST From mst at mellanox.co.il Tue Mar 6 10:56:13 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 20:56:13 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <45ED90C4.60204@dev.mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> Message-ID: <20070306185613.GD16562@mellanox.co.il> > Hi Hal. > > Converting the the C++ code to C. > > Please apply both to trunk and to 1.2 > > Thanks. > > Signed-off-by: Yevgeny Kliteynik NAK. 1. I don't see any C++ here. 2. Why do we need this on ofed branch? Only bugfixes should go there. What bug does it fix? > --- > osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- > 1 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c > index 0afa43c..8c9172e 100644 > --- a/osm/opensm/osm_ucast_lash.c > +++ b/osm/opensm/osm_ucast_lash.c > @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t > static void shortest_path(lash_t *p_lash, int ir) > { > switch_t **switches = p_lash->switches, *sw, *swi; > - int i; > + uint16_t i; > cl_list_t bfsq; > > cl_list_construct(&bfsq); > @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) > int output_link2, i_next_switch2; > int cycle_found2 = 0; > int status = IB_SUCCESS; > + int * switch_bitmap = NULL; > > OSM_LOG_ENTER(p_log, lash_core); > > - //Bitmap to check if we have processed this pair > - int switch_bitmap[num_switches][num_switches]; > + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); > + if (!switch_bitmap) > + { > + osm_log(p_log, OSM_LOG_ERROR, > + "lash_core: ERR 4D04: " > + "Failed allocating switch_bitmap - out of memory\n"); > + goto Exit; > + } > > for(i=0; i > @@ -1006,7 +1013,7 @@ static int lash_core(lash_t *p_lash) > > for(j=0; j for(k=0; k - switch_bitmap[j][k] = 0; > + switch_bitmap[j * 
num_switches + k] = 0; > } > switches[j]->used_channels = 0; > switches[j]->q_state = UNQUEUED; > @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) > > for(i=0; i for(dest_switch=0; dest_switch - if(dest_switch != i && switch_bitmap[i][dest_switch] == 0) { > + if(dest_switch != i && switch_bitmap[i * num_switches + dest_switch] == 0) { > v_lane = 0; > stop = 0; > while(v_lane < lanes_needed && stop == 0) { > @@ -1078,8 +1085,8 @@ static int lash_core(lash_t *p_lash) > p_lash->virtual_location[i][dest_switch][v_lane] = 1; > p_lash->virtual_location[dest_switch][i][v_lane] = 1; > > - switch_bitmap[i][dest_switch] = 1; > - switch_bitmap[dest_switch][i] = 1; > + switch_bitmap[i * num_switches + dest_switch] = 1; > + switch_bitmap[dest_switch * num_switches + i] = 1; > } > > for(j=0; j @@ -1115,6 +1122,8 @@ static int lash_core(lash_t *p_lash) > "Lane requirements (%d) exceed available lanes (%d)\n", > p_lash->vl_min, lanes_needed); > Exit: > + if (switch_bitmap) > + free(switch_bitmap); > OSM_LOG_EXIT(p_log); > return status; > } -- MST From mst at mellanox.co.il Tue Mar 6 10:59:38 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 20:59:38 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <1173205862.4546.363189.camel@hal.voltaire.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306175114.GJ11411@obsidianresearch.com> <1173205862.4546.363189.camel@hal.voltaire.com> Message-ID: <20070306185938.GE16562@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c > > On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote: > > On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote: > > > Hi Hal. > > > > > > Converting the the C++ code to C. > > > > This is actually valid C99 code. 
This is the method that ISO > > standardized in C99 to do dynamic stack allocations (alloca is not > > an ISO C function). > > > > Since it is now 2007 is there really still a desire to not use C99 > > features? > > My guess is that the Windows compiler didn't like this though :-( Usually it's just a question of using the proper compiler flags. Please check. In any case 1. Should say so in the log 2. Please don't apply to ofed branch, only bugfixes should go there -- MST From mst at mellanox.co.il Tue Mar 6 11:00:51 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 21:00:51 +0200 Subject: [ofa-general] Re: OFA web page needs updating In-Reply-To: <000001c7601b$55204470$8698070a@amr.corp.intel.com> References: <000001c7601b$55204470$8698070a@amr.corp.intel.com> Message-ID: <20070306190051.GF16562@mellanox.co.il> > Quoting Sean Hefty : > Subject: OFA web page needs updating > > Can we get the ofa web page updated? Specifically: > > Development Tools - link to git rather than svn > Contact - update the mailing list information, including link to archives > Downloads - link to developer public_html download areas Best of all, can www.openfabrics.org please finally migrate to the new server? Then we will be able to fix things ourselves. -- MST From afriedle at open-mpi.org Tue Mar 6 10:57:07 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Tue, 06 Mar 2007 13:57:07 -0500 Subject: [ofa-general] build failure on nightly tarball -- bonding In-Reply-To: <45ED1D78.5090602@gmail.com> References: <45E846F6.7070705@open-mpi.org> <45EA9A08.1090500@gmail.com> <45EC463A.6000805@open-mpi.org> <45ED1D78.5090602@gmail.com> Message-ID: <45EDB983.5050304@open-mpi.org> Moni Shoua wrote: > Andrew Friedley wrote: >> Moni Shoua wrote: >>> Andrew Friedley wrote: >>>> The chelsio build errors from yesterday appear to be gone, though now >>>> I'm seeing errors building the IB bonding code with the 3/2 alpha >>>> tarball -- error below. 
I'm wondering, is there a way to selectively >>>> avoid building things like this that seem to be optional, as a tarball >>>> user? >>>> >>>> Andrew >>> For the error messages.... It seems to me that the problem is one that >>> I have already fixed. >>> The corrected source RPM is in my home dir. >> Is there a reason the fix is not in the nightly alpha tarballs? Where >> do I find your home directory? >> >> Andrew >> > I'm not sure what happened. The fix should have been included in the nightly build. > My home is at ~monis/ofed_1_2/ Thanks for the pointer. Using last night's tarball with ib-bonding-0.9.0-2.src.rpm from your home directory still fails with the exact same error message. Furthermore, I ran md5sums of both your SRPM and the one included in last night's alpha tarball -- they are exactly the same. As expected, a straight install of last night's tarball still fails in the same manner. Andrew From jsquyres at cisco.com Tue Mar 6 11:31:35 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 6 Mar 2007 14:31:35 -0500 Subject: [ofa-general] Re: OFA web page needs updating In-Reply-To: <20070306190051.GF16562@mellanox.co.il> References: <000001c7601b$55204470$8698070a@amr.corp.intel.com> <20070306190051.GF16562@mellanox.co.il> Message-ID: www.openfabrics.org has been on the new server for quite a long time. Be aware that the promoters group runs the main content on www.openfabrics.org and they're just about to launch a new version of the site (I heard 2nd hand -- I'm not part of the promoters group). On Mar 6, 2007, at 2:00 PM, Michael S. Tsirkin wrote: >> Quoting Sean Hefty : >> Subject: OFA web page needs updating >> >> Can we get the ofa web page updated? Specifically: >> >> Development Tools - link to git rather than svn >> Contact - update the mailing list information, including link to >> archives >> Downloads - link to developer public_html download areas > > Best of all, can www.openfabrics.org please finally migrate to the > new server?
> Then we will be able to fix things ourselves. > > -- > MST > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Tue Mar 6 11:39:11 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 21:39:11 +0200 Subject: [ofa-general] Re: OFA web page needs updating In-Reply-To: References: <000001c7601b$55204470$8698070a@amr.corp.intel.com> <20070306190051.GF16562@mellanox.co.il> Message-ID: <20070306193911.GP16562@mellanox.co.il> Well, for now, I can't imagine fixing a couple of typos will be much of a problem. Quoting Jeff Squyres : Subject: Re: [ofa-general] Re: OFA web page needs updating www.openfabrics.org has been on the new server for quite a long time. Be aware that the promoters group runs the main content on www.openfabrics.org and they're just about to launch a new version of the site (I heard 2nd hand -- I'm not part of the promoters group). On Mar 6, 2007, at 2:00 PM, Michael S. Tsirkin wrote: >> Quoting Sean Hefty : >> Subject: OFA web page needs updating >> >> Can we get the ofa web page updated? Specifically: >> >> Development Tools - link to git rather than svn >> Contact - update the mailing list information, including link to >> archives >> Downloads - link to developer public_html download areas > > Best of all, can www.openfabrics.org please finally migrate to the > new server? > Then we will be able to fix things ourselves. 
> > -- > MST > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems -- MST From mst at mellanox.co.il Tue Mar 6 11:06:10 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 21:06:10 +0200 Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45ED9988.4080608@gmail.com> References: <45E313D2.70909@voltaire.com> <45EAA02F.4000108@gmail.com> <20070304182048.GG19828@mellanox.co.il> <45EC2AC6.2090805@gmail.com> <20070305175825.GC4264@mellanox.co.il> <45ED9988.4080608@gmail.com> Message-ID: <20070306190610.GH16562@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >>>> + if (ipoib_at_exit) > >>>> + nn->neighbour->parms->neigh_destructor = NULL; > >>> Is it safe to do this without locking? > >>> Could the destructor be in progress when we do this? > >> I think you're right. Maybe I need to attack the issue in a different way. > >> I need to do some rethinking. > > > > The basic problem seems to be that bonding code is taking a pointer > > into the module (neighbour setup) without taking reference on > > the module. > > > I tend to agree but what is the way to do that? > How can I prevent ib_ipoib from unloading if bonding is loaded? > > I still have to be able to unregister net_devices (for hotplug) > and I don't want to do something that will force other network devices (e.g. Ethernet)to change. > > Is there a way to take reference count of a module in 2.6 kernels? Thinking aloud, maybe unregister_netdev should be made aware of bonding, and flush out all neighbours skbs etc for all devices in the bonding group. 
-- MST From rdreier at cisco.com Tue Mar 6 11:47:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 11:47:11 -0800 Subject: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Start ep timer on a MPA reject. In-Reply-To: <1173137566.14159.95.camel@stevo-desktop> (Steve Wise's message of "Mon, 05 Mar 2007 17:32:46 -0600") References: <1173137566.14159.95.camel@stevo-desktop> Message-ID: Thanks, applied. From rdreier at cisco.com Tue Mar 6 11:48:27 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 11:48:27 -0800 Subject: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Don't use mm after its freed in iwch_mmap(). In-Reply-To: <1172873196.21241.62.camel@stevo-desktop> (Steve Wise's message of "Fri, 02 Mar 2007 16:06:36 -0600") References: <1172873196.21241.62.camel@stevo-desktop> Message-ID: Thanks, applied. From rdreier at cisco.com Tue Mar 6 11:51:03 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 11:51:03 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: <20070303215150.GB17950@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 3 Mar 2007 23:51:50 +0200") References: <20070302111658.GE27542@mellanox.co.il> <20070303215150.GB17950@mellanox.co.il> Message-ID: > With current code, when we destroy a QP, we remove it from table first, > and move QP to reset. This is clearly wrong, and this patch fixes this. I guess so, but it still leaves some other obvious races. First, the QP is removed from the table before its CQEs are cleaned -- to fix this, we should move the mthca_cq_clean() call inside the CQ locks (and change mthca_cq_clean to require the CQ lock be held). Second, I think we do need to synchronize against the asynchronous event interrupt when freeing a QP. I don't think the modify QP to reset code needs to change, since the QP is not being removed from the table there. 
> To fix the issue completely, the simplest approach is to use the same > EQ for completion and async events and for command interface. > I plan to send such a patch next week. I don't get it -- how do you handle having multiple completion EQs with this idea? - R. From jsquyres at cisco.com Tue Mar 6 11:51:06 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 6 Mar 2007 14:51:06 -0500 Subject: [ofa-general] Re: OFA web page needs updating In-Reply-To: <20070306193911.GP16562@mellanox.co.il> References: <000001c7601b$55204470$8698070a@amr.corp.intel.com> <20070306190051.GF16562@mellanox.co.il> <20070306193911.GP16562@mellanox.co.il> Message-ID: <12BBC742-9691-4F5D-9CA8-F7BD3553E9B2@cisco.com> All I'm saying is that the promoters group is the "owner" of the web site. Before you touch it, you should coordinate with them. On Mar 6, 2007, at 2:39 PM, Michael S. Tsirkin wrote: > Well, for now, I can't imagine fixing a couple of typos > will be much of a problem. > > Quoting Jeff Squyres : > Subject: Re: [ofa-general] Re: OFA web page needs updating > > www.openfabrics.org has been on the new server for quite a long time. > > Be aware that the promoters group runs the main content on > www.openfabrics.org and they're just about to launch a new version of > the site (I heard 2nd hand -- I'm not part of the promoters group). > > > On Mar 6, 2007, at 2:00 PM, Michael S. Tsirkin wrote: > >>> Quoting Sean Hefty : >>> Subject: OFA web page needs updating >>> >>> Can we get the ofa web page updated? Specifically: >>> >>> Development Tools - link to git rather than svn >>> Contact - update the mailing list information, including link to >>> archives >>> Downloads - link to developer public_html download areas >> >> Best of all, can www.openfabrics.org please finally migrate to the >> new server? >> Then we will be able to fix things ourselves. 
>> >> -- >> MST >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/ >> openib-general > > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > -- > MST -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Tue Mar 6 11:58:58 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 21:58:58 +0200 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: References: <20070303215150.GB17950@mellanox.co.il> Message-ID: <20070306195858.GB21620@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy > > > With current code, when we destroy a QP, we remove it from table first, > > and move QP to reset. This is clearly wrong, and this patch fixes this. > > I guess so, but it still leaves some other obvious races. First, the > QP is removed from the table before its CQEs are cleaned -- to fix > this, we should move the mthca_cq_clean() call inside the CQ locks > (and change mthca_cq_clean to require the CQ lock be held). Second, I > think we do need to synchronize against the asynchronous event > interrupt when freeing a QP. > > I don't think the modify QP to reset code needs to change, since the > QP is not being removed from the table there. Yes but here we also must make sure completion events and async events are flushed out: once QP is in reset no events should be generated. > > To fix the issue completely, the simplest approach is to use the same > > EQ for completion and async events and for command interface. > > I plan to send such a patch next week. > > I don't get it -- how do you handle having multiple completion EQs > with this idea? You are right here. 
But the point is that multiple completion EQs are not 2.6.21 material. OK? -- MST From mst at mellanox.co.il Tue Mar 6 11:59:50 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Mar 2007 21:59:50 +0200 Subject: [ofa-general] Re: Re: OFA web page needs updating In-Reply-To: <12BBC742-9691-4F5D-9CA8-F7BD3553E9B2@cisco.com> References: <000001c7601b$55204470$8698070a@amr.corp.intel.com> <20070306190051.GF16562@mellanox.co.il> <20070306193911.GP16562@mellanox.co.il> <12BBC742-9691-4F5D-9CA8-F7BD3553E9B2@cisco.com> Message-ID: <20070306195950.GC21620@mellanox.co.il> Is there a list? They don't seem to be reading this one ... Quoting Jeff Squyres : Subject: Re: Re: OFA web page needs updating All I'm saying is that the promoters group is the "owner" of the web site. Before you touch it, you should coordinate with them. On Mar 6, 2007, at 2:39 PM, Michael S. Tsirkin wrote: >Well, for now, I can't imagine fixing a couple of typos >will be much of a problem. > >Quoting Jeff Squyres : >Subject: Re: [ofa-general] Re: OFA web page needs updating > >www.openfabrics.org has been on the new server for quite a long time. > >Be aware that the promoters group runs the main content on >www.openfabrics.org and they're just about to launch a new version of >the site (I heard 2nd hand -- I'm not part of the promoters group). > > >On Mar 6, 2007, at 2:00 PM, Michael S. Tsirkin wrote: > >>>Quoting Sean Hefty : >>>Subject: OFA web page needs updating >>> >>>Can we get the ofa web page updated? Specifically: >>> >>>Development Tools - link to git rather than svn >>>Contact - update the mailing list information, including link to >>>archives >>>Downloads - link to developer public_html download areas >> >>Best of all, can www.openfabrics.org please finally migrate to the >>new server? >>Then we will be able to fix things ourselves. 
>> >>-- >>MST >>_______________________________________________ >>general mailing list >>general at lists.openfabrics.org >>http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >>To unsubscribe, please visit http://openib.org/mailman/listinfo/ >>openib-general > > >-- >Jeff Squyres >Server Virtualization Business Unit >Cisco Systems > >-- >MST -- Jeff Squyres Server Virtualization Business Unit Cisco Systems _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From rdreier at cisco.com Tue Mar 6 12:04:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 12:04:31 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: <20070306195858.GB21620@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 6 Mar 2007 21:58:58 +0200") References: <20070303215150.GB17950@mellanox.co.il> <20070306195858.GB21620@mellanox.co.il> Message-ID: > Yes but here we also must make sure completion events and async events are > flushed out: once QP is in reset no events should be generated. Completion events are fine -- at worst the consumer gets a spurious event but doesn't find any CQEs. I see the point about async events. It does make sense to merge the async event and command EQs to get good ordering -- splitting them doesn't make much performance difference I guess. > > I don't get it -- how do you handle having multiple completion EQs > > with this idea? > > You are right here. > But the point is that multiple completion EQs are not 2.6.21 material. > OK? I don't want to put a fix in that we have to back out later. Let's figure out how to fix this properly in the first place. But I think just merging async events and command completions into one EQ suffices. - R. 
From chas.williams at cmf.nrl.navy.mil Tue Mar 6 12:07:59 2007 From: chas.williams at cmf.nrl.navy.mil (chas williams - CONTRACTOR) Date: Tue, 06 Mar 2007 15:07:59 -0500 Subject: [ofa-general] ipoib performance (and xplot) Message-ID: <200703062008.l26K80EK019312@cmf.nrl.navy.mil> while looking at some ipoib performance i had a chance to graph the tcp flow in xplot (see http://www.xplot.org/). the graph appears very strange and is attached to this message. the lower solid line represents acks coming back from the tcp server, the upper line represents the window size (i disabled window scaling btw, this doesn't affect performance). the up/down arrows (they look like diamonds due to scale) represent packets. this is a view from the tcp client. the initial part of the graph is tcp slow-start/congestion. what is curious is how the returning acks (after slow start is finished) seem to get quiet periodically. then the next ack that returns acks the entire window. this seems to be leading to very bursty behavior. i would normally expect to see two data packets followed by an ack as can be seen between the 'burst' regions. i see about 800Mb/s (goodput) between the hosts which i understand to be typical for ipoib. there don't appear to be any link errors either. so why the long pauses? ftp://ftp.cmf.nrl.navy.mil/pub/chas/ipoib.jpg From xma at us.ibm.com Tue Mar 6 12:07:28 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 12:07:28 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070303223702.GN25553@obsidianresearch.com> Message-ID: Jason Gunthorpe wrote on 03/03/2007 02:37:02 PM: > On Thu, Mar 01, 2007 at 05:04:43PM -0800, Shirley Ma wrote: > > > IPv6 ND doesn't prevent the duplicate IPv6 link-local address in > the network. > > It only saves a warning in /var/log/messages to indicate that this address is > > duplicated in the network. ND can detect this when sending packets.
> > That isn't quite accurate, if IPv6 DAD detects a duplicate then the > address never leaves the tentative state and won't be usable. Without a successful multicast join, the IPv6 address can't work; once the multicast join succeeds later, DAD will make the address usable. So the solution here doesn't break IPv6. > There is also a similar problem here with the timing of IPv6 router > solicitation. > > Maybe the solution here is that IPv6 core should be waiting for the > multicast groups it requests to register before doing DAD/RS? > > Jason No, it doesn't work. For example IPv6 sendonly doesn't need to join any IP multicast group, and IP multicast join doesn't return an error. Thanks Shirley Ma From xma at us.ibm.com Tue Mar 6 12:16:03 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 12:16:03 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: I believe that even with IPv6 over ethernet, the interface will be UP and RUNNING even if there is a duplicated IPv6 address, so IPv4 can work. I don't know why we do things differently here. Thanks Shirley Ma From rdreier at cisco.com Tue Mar 6 12:20:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 12:20:55 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: (Shirley Ma's message of "Tue, 6 Mar 2007 12:16:03 -0800") References: Message-ID: > I believe even IPv6 with ethernet, the interface will be UP and RUNNING > even they have a duplicated IPv6 address so IPv4 can work. I don't know why > we do thing differently here. That's not the point.
The point is that if we bring the interface up before the multicast groups are joined, then IPv6 DAD has a chance of incorrectly not detecting duplicate addresses (think about it -- if IPv6 autoconf/DAD starts before all the multicast groups are ready, then the IPv6 stack will generate ND and router solicitation messages, but they will just be queued up pending multicast join completion, and if that doesn't happen before the timeouts occur, then the IPv6 stack will incorrectly conclude that a duplicate address doesn't exist, even if it does) The real question is what tradeoff do we want to make for broken fabrics where some multicast groups can never be joined. Is it worth the (small) risk of breaking IPv6 autoconf on good fabrics in order to behave more gracefully on broken fabrics? - R. From sean.hefty at intel.com Tue Mar 6 12:24:01 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Mar 2007 12:24:01 -0800 Subject: [ofa-general] [GIT PULL] 2.6.21-rc3 please pull rdma-dev.git In-Reply-To: <000001c75f65$00935660$a437170a@amr.corp.intel.com> Message-ID: <000601c7602d$60f45a90$8698070a@amr.corp.intel.com> Roland, I added another minor fix to my git tree: git://git.openfabrics.org/~shefty/rdma-dev.git for-roland rdma_ucm: avoid sending reject if backlog is full I don't have any other pending changes for 2.6.21, but we're continuing scalability testing. This fix is not critical, but does help us scale on larger clusters (256 nodes). - Sean --- rdma_ucm: avoid sending reject if backlog is full Change the returned error code to ENOMEM if the connection event backlog is full. This prevents the ib_cm from issuing a reject on the connection, which can allow retries to succeed. 
Signed-off-by: Sean Hefty diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index b516b93..c859134 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -266,7 +266,7 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id, mutex_lock(&ctx->file->mut); if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) { if (!ctx->backlog) { - ret = -EDQUOT; + ret = -ENOMEM; kfree(uevent); goto out; } From rdreier at cisco.com Tue Mar 6 12:38:58 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 12:38:58 -0800 Subject: [ofa-general] [GIT PULL] 2.6.21-rc3 please pull rdma-dev.git In-Reply-To: <000601c7602d$60f45a90$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 6 Mar 2007 12:24:01 -0800") References: <000601c7602d$60f45a90$8698070a@amr.corp.intel.com> Message-ID: > Roland, I added another minor fix to my git tree: Really? git ls-remote git://git.openfabrics.org/~shefty/rdma-dev.git shows the head of for-roland is 53ecd8bead5d6d9a28f26447f309b636aa361c82 refs/heads/for-roland and git show --pretty=short -M 53ecd8b commit 53ecd8bead5d6d9a28f26447f309b636aa361c82 Author: Sean Hefty rdma_cm: initialize rdma_bind_list in cma_alloc_any_port so I don't see the rdma_ucm commit there at all. From rdreier at cisco.com Tue Mar 6 12:42:03 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 12:42:03 -0800 Subject: [ofa-general] Re: [GIT PULL] 2.6.21-rc3 please pull rdma-dev.git In-Reply-To: <000001c75f65$00935660$a437170a@amr.corp.intel.com> (Sean Hefty's message of "Mon, 5 Mar 2007 12:29:40 -0800") References: <000001c75f65$00935660$a437170a@amr.corp.intel.com> Message-ID: rdma_cm: initialize rdma_bind_list in cma_alloc_any_port I applied this one at least... 
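The rdma_ucm change discussed above (returning -ENOMEM instead of -EDQUOT when the connection event backlog is full) can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions: struct listener, enum event_type and handle_event are hypothetical stand-ins, not the kernel's ucma_context or ucma_event_handler, and the model says nothing about how ib_cm reacts to the error beyond what the commit message states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-listener context (not struct ucma_context). */
struct listener {
    int backlog;   /* remaining slots for connect requests not yet reported */
};

/* Hypothetical event types; only the connect request consumes backlog. */
enum event_type { EVENT_CONNECT_REQUEST, EVENT_OTHER };

/*
 * Returns 0 if the event is accepted, or a negative errno.
 * When the backlog is exhausted the request is refused with -ENOMEM,
 * mirroring the error code chosen in the patch above.
 */
static int handle_event(struct listener *l, enum event_type ev)
{
    if (ev == EVENT_CONNECT_REQUEST) {
        if (l->backlog == 0)
            return -ENOMEM;
        l->backlog--;
    }
    return 0;
}
```

The only point of the sketch is the choice of error code at the backlog check; according to the commit message, that choice is what keeps ib_cm from issuing a reject, so that a retry can succeed.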
From xma at us.ibm.com Tue Mar 6 12:52:27 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 12:52:27 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Roland Dreier wrote on 03/06/2007 12:20:55 PM: > > I believe even IPv6 with ethernet, the interface will be UP and RUNNING > > even they have a duplicated IPv6 address so IPv4 can work. I don't know why > > we do thing differently here. > > That's not the point. The point is that if we bring the interface up > before the multicast groups are joined, then IPv6 DAD has a chance of > incorrectly not detecting duplicate addresses (think about it -- if > IPv6 autoconf/DAD starts before all the multicast groups are ready, > then the IPv6 stack will generate ND and router solicitation messages, > but they will just be queued up pending multicast join completion, and > if that doesn't happen before the timeouts occur, then the IPv6 stack > will incorrectly conclude that a duplicate address doesn't exist, even > if it does) The IPv6 stack will generate ND and router solicitation messages when sending packet. The duplicated address can be detected anytime. Am I right? So if the multicast join completion later, the duplication address will be detect later. I thought none of the packets can be sent out before the interface is up and running. The link carrier is ON, doesn't mean packets can be sent, it only means the carrier is ready to send data. I looked at the code again, and found that the interface is UP in the beginning of the ipoib code. We could move the UP in the end of operation. > The real question is what tradeoff do we want to make for broken > fabrics where some multicast groups can never be joined. Is it worth > the (small) risk of breaking IPv6 autoconf on good fabrics in order to > behave more gracefully on broken fabrics? > > - R. My point is IPoIB should behave as other networking device drivers. 
Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgunthorpe at obsidianresearch.com Tue Mar 6 14:17:40 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 6 Mar 2007 15:17:40 -0700 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: References: Message-ID: <20070306221740.GL11411@obsidianresearch.com> On Tue, Mar 06, 2007 at 12:52:27PM -0800, Shirley Ma wrote: > The IPv6 stack will generate ND and router solicitation messages > when sending packet. The duplicated address can be detected > anytime. Am I right? So if the multicast join completion later, the > duplication address will be detect later. No, this isn't right. When a new address is assigned to the interface it starts out in the tentative state. This triggers DAD which will move the address out of the tentative state and makes it usable. The DAD procedure is on a timer since it is probing with ND packets to find a duplicate. I don't think DAD will activate again once the initial probing is done. RS is in two parts. First when the interface first comes up the kernel sends out RS packets to get a router to respond faster. This is also on a timer. Second, the routers send out RS packets on their own from time to time and the kernel will act on them. All of this needs the right multicast groups to be active before the sequences are started. DAD needs full membership in the solicited node multicast group and RS needs memberships in the all-hosts and all-routers groups. > My point is IPoIB should behave as other networking device drivers. IPoIB is not like other networking fabrics. IB has multicast that requires synchronization with the SM. Ethernet has no such thing. 
Jason From sean.hefty at intel.com Tue Mar 6 14:38:33 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Mar 2007 14:38:33 -0800 Subject: [ofa-general] [GIT PULL] 2.6.21-rc3 please pull rdma-dev.git In-Reply-To: Message-ID: <000701c76040$2bf21f90$8698070a@amr.corp.intel.com> >Really? No - not really. I skipped the git push step... It should be there now. - Sean From sweitzen at cisco.com Tue Mar 6 15:52:31 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 6 Mar 2007 15:52:31 -0800 Subject: [ofa-general] OFED 1.2 beta blocking bugs Message-ID: I've been testing OFED-1.2-20070306-0807 today, and here's a current list of bugs I'd like fixed for OFED 1.2 beta. bug_id assigned_to short_desc 397 jsquyres at cisco.com OFED 1.2 alpha1 Open MPI "InfiniBand retry count" errors 381 rowland at cse.ohio-state.edu OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler 370 pasha at mellanox.co.il OFED 1.2 alpha1 MVAPICH Intel compiler support broken 379 vlad at mellanox.co.il can't compile OFED 1.2 on RHEL4/SLES10 ppc64 416 vlad at mellanox.co.il OFED-1.2-20070306-0807 won't compile on SLES10 x86_64 375 jsquyres at cisco.com Open MPI PGI C++ failure at runtime 400 sean.hefty at intel.com OFED 1.2 alpha1 IPoIB HA failover gets QP warnings, IPoIB CM stops working 395 vlad at mellanox.co.il uDAPL fails (with Intel MPI or HP MPI) on SLES 10 i686 Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 16:14:05 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 16:14:05 -0800 (PST) Subject: [ofa-general] [Bug 417] New: can't unload drivers on SLES10 x86_64 Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=417 Summary: can't unload drivers on SLES10 x86_64 Product: OpenFabrics Linux Version: 1.2 Platform: X86-64 OS/Version: SLES 10 Status: NEW Severity: major Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com CC: tziporet at mellanox.co.il OFED-1.2-20070304-0600 Can't unload drivers. Not sure who should look into it. svbu-qa1850-4:~ # /etc/init.d/openibd restart Unloading ib_cm [FAILED] ERROR: Module ib_cm is in use by ib_ipoib Can't unload ib_ipoib manually. svbu-qa1850-4:~ # lsmod | grep ipoib ib_ipoib 83672 0 ib_cm 50344 1 ib_ipoib ib_sa 40672 2 ib_ipoib,ib_cm ib_core 76304 6 ib_ipoib,ib_umad,ib_mthca,ib_cm,ib_sa,ib_mad ipv6 329728 29 ib_ipoib svbu-qa1850-4:~ # rmmod ib_ipoib svbu-qa1850-4:~ # lsmod | grep ipoib ib_ipoib 83672 0 ib_cm 50344 1 ib_ipoib ib_sa 40672 2 ib_ipoib,ib_cm ib_core 76304 6 ib_ipoib,ib_umad,ib_mthca,ib_cm,ib_sa,ib_mad ipv6 329728 29 ib_ipoib -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 16:14:59 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 16:14:59 -0800 (PST) Subject: [ofa-general] [Bug 417] can't unload OFED 1.2 drivers on SLES10 x86_64 In-Reply-To: Message-ID: <20070307001459.449ADE60815@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=417 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|can't unload drivers on |can't unload OFED 1.2 |SLES10 x86_64 |drivers on SLES10 x86_64 -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From xma at us.ibm.com Tue Mar 6 16:22:45 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 16:22:45 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070306221740.GL11411@obsidianresearch.com> Message-ID: Jason Gunthorpe wrote on 03/06/2007 02:17:40 PM: > On Tue, Mar 06, 2007 at 12:52:27PM -0800, Shirley Ma wrote: > > > The IPv6 stack will generate ND and router solicitation messages > > when sending packet. The duplicated address can be detected > > anytime. Am I right? So if the multicast join completion later, the > > duplication address will be detect later. > > No, this isn't right. > > When a new address is assigned to the interface it starts out in the > tentative state. This triggers DAD which will move the address out of > the tentative state and makes it usable. The DAD procedure is on a > timer since it is probing with ND packets to find a duplicate. I don't > think DAD will activate again once the initial probing is done. > > RS is in two parts. First when the interface first comes up the kernel > sends out RS packets to get a router to respond faster. 
This is also > on a timer. Second, the routers send out RS packets on their own from > time to time and the kernel will act on them. > > All of this needs the right multicast groups to be active before the > sequences are started. DAD needs full membership in the solicited node > multicast group and RS needs memberships in the all-hosts and > all-routers groups. > > My point is IPoIB should behave as other networking device drivers. > > IPoIB is not like other networking fabrics. IB has multicast that > requires synchronization with the SM. Ethernet has no such thing. > > Jason So we could have same IPv6 addresses even without IPoIB if the NS doesn't respond on time for any reason, right? Then DAD is broken in current code already. In IPoIB case, we allow duplicated IPv6 addresses after the interface up and running but not before if any reason the multicast join successfully much later, probably after several tries? That doesn't make any sense to me. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xma at us.ibm.com Tue Mar 6 16:29:49 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 16:29:49 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070306221740.GL11411@obsidianresearch.com> Message-ID: BTW, I have tested IPv4 and IPv6 DAD, duplicate address doesn't prevent the interface from UP and RUNNING for ethernet. But this is not the recent kernel. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jgunthorpe at obsidianresearch.com Tue Mar 6 17:02:08 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 6 Mar 2007 18:02:08 -0700 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: References: <20070306221740.GL11411@obsidianresearch.com> Message-ID: <20070307010208.GO11411@obsidianresearch.com> On Tue, Mar 06, 2007 at 04:22:45PM -0800, Shirley Ma wrote: > So we could have same IPv6 addresses even without IPoIB if the NS doesn't > respond on time for any reason, right? Right. An example would be if you connect two ethernet networks together that had duplicate addresses. The startup DAD mechanism does not protect from that. > Then DAD is broken in current code already. In IPoIB case, we allow > duplicated IPv6 addresses after the interface up and running but not > before if any reason the multicast join successfully much later, > probably after several tries? That doesn't make any sense to me. IPoIB has no control over DAD, it is all done in the core IPv6. It also has no impact on the up/running state: Ex: $ ping6 -I eth0 fe80::c2 PING fe80::c2(fe80::c2) from fe80::20e:cff:fe71:2858 eth0: 56 data bytes 64 bytes from fe80::c2: icmp_seq=1 ttl=64 time=1.28 ms $ ip addr add fe80::c2/64 dev eth0; ip addr 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:0e:0c:71:28:58 brd ff:ff:ff:ff:ff:ff inet6 fe80::c2/64 scope link tentative valid_lft forever preferred_lft forever inet6 fe80::20e:cff:fe71:2858/64 scope link valid_lft forever preferred_lft forever $ dmesg eth0: duplicate address detected! $ ip addr 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 inet6 fe80::c2/64 scope link tentative valid_lft forever preferred_lft forever Note that the *address* still has the tentative flag (this disables the address) and the interface is still up/running. 
If you look on tcpdump you see that the instant I did the 'ip addr add' the kernel spat out: 17:49:01.034403 fe80::20e:cff:fe71:2858 > ff02::16: HBH icmp6: type-#143 [hlim 1] 0x0000: 6000 0000 0038 0001 fe80 0000 0000 0000 `....8.......... 0x0010: 020e 0cff fe71 2858 ff02 0000 0000 0000 .....q(X........ 0x0020: 0000 0000 0000 0016 3a00 0502 0000 0100 ........:....... 0x0030: 8f00 0f8d 0000 0002 0400 0000 ff02 0000 ................ 0x0040: 0000 0000 0000 0001 ff00 00c2 0400 0000 ................ 0x0050: ff02 .. 17:49:01.642183 :: > ff02::1:ff00:c2: icmp6: neighbor sol: who has fe80::c2 0x0000: 6000 0000 0018 3aff 0000 0000 0000 0000 `.....:......... 0x0010: 0000 0000 0000 0000 ff02 0000 0000 0000 ................ 0x0020: 0000 0001 ff00 00c2 8700 7aa3 0000 0000 ..........z..... 0x0030: fe80 0000 0000 0000 0000 0000 0000 00c2 ................ The first is IGMPv6 to join the new solicited node multicast group and the last is the DAD probe. So, yes, currently, IPoIB is broken in that DAD for new addresses is not synchronized to the SM join. But, DAD for startup addresses is OK due to the trick that is played with carrier. Your patch breaks both equally :> Regards, Jason From xma at us.ibm.com Tue Mar 6 17:35:54 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 17:35:54 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070307010208.GO11411@obsidianresearch.com> Message-ID: Jason, >So, yes, currently, IPoIB is broken in that DAD for new addresses is >not synchronized to the SM join. But, DAD for startup addresses is >OK due to the trick that is played with carrier. Your patch breaks >both equally :> My patch doesn't break DAD; that's what I am trying to explain. The patch means the IPoIB link is ready to send data. That's how the net carrier is defined. We shouldn't prevent the interface from being UP and RUNNING because there is an extremely small probability of a duplicated link local address for IPv6.
Especially when the IB spec allows a switch to support anywhere from 1 to 16K MLIDs and IPv6 is a module enabled in the kernel by default. For example, if the switch only supports 100 MLIDs, then we limit the fabric to a 25-node cluster with 4 links on each node. The IPoIB DAD should be fixed, for example by returning something back to IPv6 and not setting the flag to permanent. It has nothing to do with my patch. This patch just opens a window that exposes this problem. For IPv4 and IPv6, the carrier being ON has nothing to do with the IP address. Thanks Shirley Ma From xma at us.ibm.com Tue Mar 6 17:44:11 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 17:44:11 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: Message-ID: Jason, The whole purpose of this patch is to address network accessibility when the number of MLIDs is limited in the fabric. We have a customer who hit this problem in a large cluster. Basically IPv4 doesn't work at all (since the interface can't be up and running because the IB multicast join for the IPv6 solicited-node address fails) when the number of MLIDs exceeds a certain number because of the switch limitation, with the IPv6 module preloaded. Thanks Shirley Ma From rdreier at cisco.com Tue Mar 6 20:03:46 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 06 Mar 2007 20:03:46 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: (Roland Dreier's message of "Tue, 06 Mar 2007 12:04:31 -0800") References: <20070303215150.GB17950@mellanox.co.il> <20070306195858.GB21620@mellanox.co.il> Message-ID: What do you think of something like this, plus merging the async event and command interface EQs?
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index efd79ef..6e247e2 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -279,15 +279,13 @@ static inline int is_recv_cqe(struct mthca_cqe *cqe) return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, - struct mthca_srq *srq) +void __mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq) { struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&cq->lock); - /* * First we need to find the current producer index, so we * know where to start cleaning from. It doesn't matter if HW @@ -325,8 +323,14 @@ void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, cq->cons_index += nfreed; update_cons_index(dev, cq, nfreed); } +} - spin_unlock_irq(&cq->lock); +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq) +{ + spin_lock_irq(cq->lock); + __mthca_cq_clean(dev, cq, qpn, srq); + spin_unlock_irq(cq->lock); } void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index b7e42ef..a786f56 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -505,6 +505,8 @@ void mthca_free_cq(struct mthca_dev *dev, void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); +void __mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq); void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 1c6b63a..560b99a 100644 --- 
a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1390,6 +1390,10 @@ void mthca_free_qp(struct mthca_dev *dev, struct mthca_cq *send_cq; struct mthca_cq *recv_cq; + if (qp->state != IB_QPS_RESET) + mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, + NULL, 0, &status); + send_cq = to_mcq(qp->ibqp.send_cq); recv_cq = to_mcq(qp->ibqp.recv_cq); @@ -1403,28 +1407,27 @@ void mthca_free_qp(struct mthca_dev *dev, mthca_array_clear(&dev->qp_table.qp, qp->qpn & (dev->limits.num_qps - 1)); --qp->refcount; + + if (!qp->ibqp.uobject) { + __mthca_cq_clean(dev, send_cq, qp->qpn, + qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); + if (send_cq != recv_cq) + __mthca_cq_clean(dev, recv_cq, qp->qpn, + qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); + } + spin_unlock(&dev->qp_table.lock); mthca_unlock_cqs(send_cq, recv_cq); wait_event(qp->wait, !get_qp_refcount(dev, qp)); - if (qp->state != IB_QPS_RESET) - mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, - NULL, 0, &status); - /* * If this is a userspace QP, the buffers, MR, CQs and so on * will be cleaned up in userspace, so all we have to do is * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, - qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); - if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, - qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); - mthca_free_memfree(dev, qp); mthca_free_wqe_buf(dev, qp); } From xma at us.ibm.com Tue Mar 6 20:00:23 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 6 Mar 2007 20:00:23 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070307010208.GO11411@obsidianresearch.com> Message-ID: >So, yes, currently, IPoIB is broken in that DAD for new addresses is >not synchronized to the SM join. 
But, DAD for startup addresses is >OK due to the trick that is played with carrier. Your patch breaks >both equally :> As you described, DAD is not a mechanism that prevents duplicated addresses. The default DAD timer is 1HZ. Even if the solicited node IB multicast join succeeds before the carrier is on, that doesn't mean we won't get any potential duplicated link local addresses at all. For example, the NA could come back after 1HZ. Weighing IPoIB accessibility (letting IPv4 work) against playing a trick with carrier on to avoid a small possibility of an IPv6 link local DAD failure, this patch is a better choice for switches with limited MGC resources today in a large cluster. Then IPoIB will have the same behavior as IPv4 and IPv6 over ethernet: the interface will be up and running no matter whether there is any duplicated address or not. Thanks Shirley Ma From jgunthorpe at obsidianresearch.com Tue Mar 6 20:26:53 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 6 Mar 2007 21:26:53 -0700 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: References: <20070307010208.GO11411@obsidianresearch.com> Message-ID: <20070307042653.GA17273@obsidianresearch.com> On Tue, Mar 06, 2007 at 08:00:23PM -0800, Shirley Ma wrote: > Comparing to the IPoIB accessibility(let IPv4 working) with playing trick to > carrier on to avoid IPv6 link local DAD in a small possibility, this patch is a > better choice for switches with limited MGCs resouce today in a large cluster. > Then IPoIB will have the same behavior as IPv4 and IPv6 over ethernet: > interface will be up and running no matter whether there is any duplicated > address or not. Don't get me wrong, I think your patch is ultimately the right way to go, but it needs another part to address the problem IPv6 has - or at least a plan on how to address it.
I don't think ignoring the synchronizing problem is the way to go. Also, in my view, the problem you are seeing with MLID exhaustion is purely a SM problem and has nothing to do with IPoIB and switch limits. SMs need to treat MLIDs as a precious resource and share them aggressively. Especially IPv6 solicited node multicast addresses. How about this as a nice simple alternative.. If IPoIB is asked to send to a multicast address that is not yet joined (join is pending to the SA, or whatever) then it uses the broadcast MLID for the packet. This is similar to how ethernet works and would let DAD and RS work correctly in all cases. Jason From mst at mellanox.co.il Tue Mar 6 20:42:36 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Mar 2007 06:42:36 +0200 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: References: <20070303215150.GB17950@mellanox.co.il> <20070306195858.GB21620@mellanox.co.il> Message-ID: <20070307044159.GB22053@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy > > > Yes but here we also must make sure completion events and async events are > > flushed out: once QP is in reset no events should be generated. > > Completion events are fine -- at worst the consumer gets a spurious > event but doesn't find any CQEs. This would be a spec violation - once no active QPs are attached, the consumer should be able to assume he will get no new events. But I agree we can regard this as a separate issue, since it does not affect IPoIB/CM. So for now I'm going for your proposed change: merge all QP associated events and the CMD IFC events to a single EQ. -- MST From sweitzen at cisco.com Tue Mar 6 22:10:40 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 6 Mar 2007 22:10:40 -0800 Subject: [ofa-general] What is all the new stuff in srptools?
Message-ID: Ishai, I see several new files in /usr/local/ofed/sbin in OFED 1.2 pre-beta, what are they for? /usr/local/ofed/sbin/execute_multipath_or_kpartx.sh /usr/local/ofed/sbin/srp_dm_multipath_daemon /usr/local/ofed/sbin/srp_dnotify /usr/local/ofed/sbin/srp_post_multipath /usr/local/ofed/sbin/start_srp_dnotify.sh Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From kliteyn at dev.mellanox.co.il Tue Mar 6 23:33:06 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 09:33:06 +0200 Subject: [ofa-general] [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <1173205862.4546.363189.camel@hal.voltaire.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306175114.GJ11411@obsidianresearch.com> <1173205862.4546.363189.camel@hal.voltaire.com> Message-ID: <45EE6AB2.5020801@dev.mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote: >> On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote: >>> Hi Hal. >>> >>> Converting the the C++ code to C. >> This is actually valid C99 code. This is the method that ISO >> standardized in C99 to do dynamic stack allocations (alloca is not >> an ISO C function). >> >> Since it is now 2007 is there really still a desire to not use C99 >> features? > > My guess is that the Windows compiler didn't like this though :-( Correct :( BTW, there's still a C++ element there - switch_bitmap is declared after using OSM_LOG_ENTER. 
--Yevgeny > -- Hal > >> Jason > From bugzilla-daemon at lists.openfabrics.org Tue Mar 6 23:58:39 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 6 Mar 2007 23:58:39 -0800 (PST) Subject: [ofa-general] [Bug 420] New: PKey table reordering caused by SM failover stops ipoib traffic Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=420 Summary: PKey table reordering caused by SM failover stops ipoib traffic Product: OpenFabrics Linux Version: 1.2alpha1 Platform: All OS/Version: All Status: NEW Severity: blocker Priority: P3 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: monil at voltaire.com CC: mst at mellanox.co.il SM failover/handover can cause a specific pkey to be placed in a different pkey index. IPoIB traffic can't be resumed on that specific PKey as the ipoib driver is not aware of the index change. The resolution of the problem is under discussion. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From monil at voltaire.com Wed Mar 7 00:00:25 2007 From: monil at voltaire.com (Moni Levy) Date: Wed, 7 Mar 2007 10:00:25 +0200 Subject: [ofa-general] Re: [PATCHv3] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <6a122cc00703060541q44066e02u751eedf1b6b8d392@mail.gmail.com> References: <45E6E7A0.7070902@voltaire.com> <20070301145644.GL14282@mellanox.co.il> <45EAB607.4010904@gmail.com> <6a122cc00703060541q44066e02u751eedf1b6b8d392@mail.gmail.com> Message-ID: <6a122cc00703070000t336bc4e7je9581f13b81bafee@mail.gmail.com> On 3/6/07, Moni Levy wrote: > Roland > On 3/4/07, Moni Levy wrote: > > On 3/1/07, Michael S. Tsirkin wrote: > > > > SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table.
The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence > > > > does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. > > > > > > Please limit line length to 80 chars or so. > > > > Do you want me to change anything else? We really need that for OFED 1.2. > > Here is an updated patch. > > Did you have a chance to look at the last version of the patch ? I > think that it's now acceptable for Michael as I got no additional > remarks, again, we really need a resolution to that issue. Bug 420 was opened to track that issue. -- Moni > > Thanks, > Moni > > > > > This issue was found during partitioning & SM fail over testing. The fix was > > tested over the weekend with pkey reshuffling, removal and addition every few > > seconds concurrent with OFED restart. > > > > Changes from v1: > > * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike > > * fixed a bug in device extraction from the work struct > > * removed some warnings in case they are caused due to missing PKEY as > > this seems like a valid flow now. > > > > Changes from v2: > > * less/fixed debug prints > > * flush_restart_qp stuff renamed to just restart_qp > > * the patch now depends on Roland's "IPoIB: Only handle async > > events for one port" > > > > SM reconfiguration or failover possibly causes a shuffling of the values in > > the port pkey table. The current implementation only queries for the index of > > the pkey once, when it creates the device QP and after that moves it into > > working state, and hence does not address this scenario. Fix this by using the > > PKEY_CHANGE event as a trigger to reconfigure the device QP. 
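The failure mode in the patch description above can be sketched in plain user-space C: the P_Key value stays the same while its slot in the port table moves, so an index looked up once and cached goes stale. This is only an illustration under stated assumptions. The helper names `find_pkey_index` and `cached_index_is_stale` and the table contents are hypothetical, the comparison uses the full 16-bit value for simplicity, and none of this is the kernel's `ib_find_cached_pkey()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return the index of `pkey`
 * in `table`, or -1 if the value is not present.
 */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < len; i++) {
        if (table[i] == pkey)
            return (int)i;
    }
    return -1;
}

/*
 * Return 1 when `cached_index` no longer matches where `pkey` lives in
 * `table` (moved or removed), 0 when the cached index is still valid.
 */
static int cached_index_is_stale(const uint16_t *table, size_t len,
                                 uint16_t pkey, int cached_index)
{
    return find_pkey_index(table, len, pkey) != cached_index;
}
```

A caller in this sketch would re-run the lookup whenever it is told the table changed, which mirrors the idea of using the PKEY_CHANGE event as the trigger rather than trusting the index obtained at QP creation.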
> > > > Signed-off-by: Moni Levy > > --- > > drivers/infiniband/ulp/ipoib/ipoib.h | 4 + > > drivers/infiniband/ulp/ipoib/ipoib_ib.c | 51 ++++++++++++++++++++----- > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- > > drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++--- > > drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- > > 5 files changed, 59 insertions(+), 19 deletions(-) > > > > Index: b/drivers/infiniband/ulp/ipoib/ipoib.h > > =================================================================== > > --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:11:43.698307017 +0200 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:43:04.624119588 +0200 > > @@ -205,6 +205,7 @@ struct ipoib_dev_priv { > > struct delayed_work pkey_task; > > struct delayed_work mcast_task; > > struct work_struct flush_task; > > + struct work_struct restart_qp_task; > > struct work_struct restart_task; > > struct delayed_work ah_reap_task; > > > > @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( > > > > int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); > > void ipoib_ib_dev_flush(struct work_struct *work); > > +void ipoib_ib_dev_restart_qp(struct work_struct *work); > > void ipoib_ib_dev_cleanup(struct net_device *dev); > > > > int ipoib_ib_dev_open(struct net_device *dev); > > int ipoib_ib_dev_up(struct net_device *dev); > > int ipoib_ib_dev_down(struct net_device *dev, int flush); > > -int ipoib_ib_dev_stop(struct net_device *dev); > > +int ipoib_ib_dev_stop(struct net_device *dev, int flush); > > > > int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); > > void ipoib_dev_cleanup(struct net_device *dev); > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > > =================================================================== > > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.713304355 +0200 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 16:14:17.003881103 +0200 > > 
@@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device > > > > ret = ipoib_init_qp(dev); > > if (ret) { > > - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > > + if (ret != -ENOENT) > > + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); > > return -1; > > } > > > > ret = ipoib_ib_post_receives(dev); > > if (ret) { > > ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); > > - ipoib_ib_dev_stop(dev); > > + ipoib_ib_dev_stop(dev, 1); > > return -1; > > } > > > > ret = ipoib_cm_dev_open(dev); > > if (ret) { > > ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); > > - ipoib_ib_dev_stop(dev); > > + ipoib_ib_dev_stop(dev, 1); > > return -1; > > } > > > > @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi > > return pending; > > } > > > > -int ipoib_ib_dev_stop(struct net_device *dev) > > +int ipoib_ib_dev_stop(struct net_device *dev, int flush) > > { > > struct ipoib_dev_priv *priv = netdev_priv(dev); > > struct ib_qp_attr qp_attr; > > @@ -581,7 +582,8 @@ timeout: > > /* Wait for all AHs to be reaped */ > > set_bit(IPOIB_STOP_REAPER, &priv->flags); > > cancel_delayed_work(&priv->ah_reap_task); > > - flush_workqueue(ipoib_workqueue); > > + if (flush) > > + flush_workqueue(ipoib_workqueue); > > > > begin = jiffies; > > > > @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device > > return 0; > > } > > > > -void ipoib_ib_dev_flush(struct work_struct *work) > > +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) > > { > > - struct ipoib_dev_priv *cpriv, *priv = > > - container_of(work, struct ipoib_dev_priv, flush_task); > > + struct ipoib_dev_priv *cpriv; > > struct net_device *dev = priv->dev; > > > > - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { > > + /* > > + * ipoib_ib_dev_stop() below may not find the PKey and leave the > > + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp > > + * flag on is Ok. 
> > + */ > > + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { > > ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); > > return; > > } > > @@ -642,6 +648,13 @@ void ipoib_ib_dev_flush(struct work_stru > > > > ipoib_ib_dev_down(dev, 0); > > > > + if (restart_qp) { > > + ipoib_dbg(priv, "restarting the device QP\n"); > > + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) > > + ipoib_ib_dev_stop(dev, 0); > > + ipoib_ib_dev_open(dev); > > + } > > + > > /* > > * The device could have been brought down between the start and when > > * we get here, don't bring it back up if it's not configured up > > @@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru > > > > /* Flush any child interfaces too */ > > list_for_each_entry(cpriv, &priv->child_intfs, list) > > - ipoib_ib_dev_flush(&cpriv->flush_task); > > + __ipoib_ib_dev_flush(cpriv, restart_qp); > > > > mutex_unlock(&priv->vlan_mutex); > > } > > > > +void ipoib_ib_dev_flush(struct work_struct *work) > > +{ > > + struct ipoib_dev_priv *priv = > > + container_of(work, struct ipoib_dev_priv, flush_task); > > + /* We only restart the QP in case of pkey change event */ > > + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); > > + __ipoib_ib_dev_flush(priv, 0); > > +} > > + > > +void ipoib_ib_dev_restart_qp(struct work_struct *work) > > +{ > > + struct ipoib_dev_priv *priv = > > + container_of(work, struct ipoib_dev_priv, restart_qp_task); > > + /* We only restart the QP in case of pkey change event */ > > + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); > > + __ipoib_ib_dev_flush(priv, 1); > > +} > > + > > void ipoib_ib_dev_cleanup(struct net_device *dev) > > { > > struct ipoib_dev_priv *priv = netdev_priv(dev); > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > =================================================================== > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:11:43.729301517 +0200 > > +++ 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:43:04.666112093 +0200 > > @@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev) > > return -EINVAL; > > > > if (ipoib_ib_dev_up(dev)) { > > - ipoib_ib_dev_stop(dev); > > + ipoib_ib_dev_stop(dev, 1); > > return -EINVAL; > > } > > > > @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device > > flush_workqueue(ipoib_workqueue); > > > > ipoib_ib_dev_down(dev, 1); > > - ipoib_ib_dev_stop(dev); > > + ipoib_ib_dev_stop(dev, 1); > > > > if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { > > struct ipoib_dev_priv *cpriv; > > @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic > > INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); > > INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); > > INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); > > + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); > > INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); > > INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); > > } > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > > =================================================================== > > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.743299033 +0200 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 16:21:43.128181147 +0200 > > @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc > > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > > &mcast->mcmember.mgid); > > if (ret < 0) { > > - ipoib_warn(priv, "couldn't attach QP to multicast group " > > - IPOIB_GID_FMT "\n", > > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > > + if (ret != -ENXIO) /* No pkey found */ > > + ipoib_warn(priv, "couldn't attach QP to multicast group " > > + IPOIB_GID_FMT "\n", > > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > > > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); > > return ret; > > @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s > > status = 
ipoib_mcast_join_finish(mcast, &multicast->rec); > > > > if (status) { > > - if (mcast->logcount++ < 20) > > + if (mcast->logcount++ < 20 && status != -ENXIO) > > ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " > > IPOIB_GID_FMT ", status %d\n", > > IPOIB_GID_ARG(mcast->mcmember.mgid), status); > > @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int > > ", status %d\n", > > IPOIB_GID_ARG(mcast->mcmember.mgid), > > status); > > - } else { > > + } else if (status != -ENXIO) { > > ipoib_warn(priv, "multicast join failed for " > > IPOIB_GID_FMT ", status %d\n", > > IPOIB_GID_ARG(mcast->mcmember.mgid), > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c > > =================================================================== > > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 14:39:46.712444790 +0200 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 16:12:55.069541201 +0200 > > @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device > > if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { > > clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > > ret = -ENXIO; > > + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); > > goto out; > > } > > + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); > > set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > > > > /* set correct QKey for QP */ > > @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler > > container_of(handler, struct ipoib_dev_priv, event_handler); > > > > if ((record->event == IB_EVENT_PORT_ERR || > > - record->event == IB_EVENT_PKEY_CHANGE || > > record->event == IB_EVENT_PORT_ACTIVE || > > record->event == IB_EVENT_LID_CHANGE || > > record->event == IB_EVENT_SM_CHANGE || > > @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler > > record->element.port_num == priv->port) { > > ipoib_dbg(priv, "Port state change event\n"); > > queue_work(ipoib_workqueue, &priv->flush_task); > > + } else if (record->event == 
IB_EVENT_PKEY_CHANGE && > > + record->element.port_num == priv->port) { > > + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); > > + queue_work(ipoib_workqueue, &priv->restart_qp_task); > > } > > } > > > > > From bugzilla-daemon at lists.openfabrics.org Wed Mar 7 00:00:45 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 7 Mar 2007 00:00:45 -0800 (PST) Subject: [ofa-general] [Bug 420] PKey table reordering caused by SM failover stops ipoib traffic In-Reply-To: Message-ID: <20070307080045.1AD67E60828@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=420 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|blocker |critical ------- Comment #1 from mst at mellanox.co.il 2007-03-07 00:00 ------- I don't think its a beta blocker - restarting ipoib is a simple workaround. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Wed Mar 7 00:02:55 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 7 Mar 2007 00:02:55 -0800 (PST) Subject: [ofa-general] [Bug 420] PKey table reordering caused by SM failover stops ipoib traffic In-Reply-To: Message-ID: <20070307080256.07876E607F1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=420 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |monil at voltaire.com ------- Comment #2 from mst at mellanox.co.il 2007-03-07 00:02 ------- I reassign to Moni now - he's driving this. When all that's left is integrate a patch, reassign to me or vlad. 
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kliteyn at dev.mellanox.co.il Wed Mar 7 00:22:01 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 10:22:01 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <20070306185613.GD16562@mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> Message-ID: <45EE7629.4070706@dev.mellanox.co.il> Michael S. Tsirkin wrote: >> Hi Hal. >> >> Converting the the C++ code to C. >> >> Please apply both to trunk and to 1.2 >> >> Thanks. >> >> Signed-off-by: Yevgeny Kliteynik > > NAK. > 1. I don't see any C++ here. > > 2. Why do we need this on ofed branch? > Only bugfixes should go there. What bug does it fix? There are 3 things in this patch: 1. int i -> uint16_t i 2. Moving variable declaration (switch_bitmap) to the beginning of the function (currently, it is declared after OSM_LOG_ENTER) 3. Changing C99 dynamically allocated array to the old style. First two can be categorized as bugs. The third one is for compiler on windows. Each of these elements breaks OSM compilation on Windows. If we don't include either of these, then OFED 1.2 OpenSM compilation on windows will be broken.
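Items 2 and 3 above can be illustrated with a small standalone sketch: a C99 variable-length `int m[n][n]` is replaced by one heap block indexed as `row * n + col`, with all declarations kept at the top of each block for older compilers. The helper names (`bitmap_alloc`, `bitmap_at`, `bitmap_mark_pair`) are hypothetical and do not appear in the patch. Using `calloc` to get a zeroed block is an assumption of this sketch, whereas the patch uses `malloc` followed by an explicit clearing loop.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a zero-filled n x n matrix of int as a
 * single flat block. Returns NULL on allocation failure or if n * n
 * would overflow size_t. The caller releases it with free().
 */
static int *bitmap_alloc(size_t n)
{
    assert(n > 0);
    if (n > (size_t)-1 / n)
        return NULL;
    return (int *)calloc(n * n, sizeof(int));
}

/* Address of element (row, col) in the flat n x n matrix. */
static int *bitmap_at(int *bitmap, size_t n, size_t row, size_t col)
{
    assert(bitmap != NULL);
    assert(row < n && col < n);
    return &bitmap[row * n + col];
}

/* Mark the pair (a, b) as processed in both directions. */
static void bitmap_mark_pair(int *bitmap, size_t n, size_t a, size_t b)
{
    *bitmap_at(bitmap, n, a, b) = 1;
    *bitmap_at(bitmap, n, b, a) = 1;
}
```

One trade-off of this style compared with the C99 array is that every exit path has to free the block, which is why a single cleanup label is the usual companion to it.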
-- Yevgeny >> --- >> osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- >> 1 files changed, 16 insertions(+), 7 deletions(-) >> >> diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c >> index 0afa43c..8c9172e 100644 >> --- a/osm/opensm/osm_ucast_lash.c >> +++ b/osm/opensm/osm_ucast_lash.c >> @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t >> static void shortest_path(lash_t *p_lash, int ir) >> { >> switch_t **switches = p_lash->switches, *sw, *swi; >> - int i; >> + uint16_t i; >> cl_list_t bfsq; >> >> cl_list_construct(&bfsq); >> @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) >> int output_link2, i_next_switch2; >> int cycle_found2 = 0; >> int status = IB_SUCCESS; >> + int * switch_bitmap = NULL; >> >> OSM_LOG_ENTER(p_log, lash_core); >> >> - //Bitmap to check if we have processed this pair >> - int switch_bitmap[num_switches][num_switches]; >> + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); >> + if (!switch_bitmap) >> + { >> + osm_log(p_log, OSM_LOG_ERROR, >> + "lash_core: ERR 4D04: " >> + "Failed allocating switch_bitmap - out of memory\n"); >> + goto Exit; >> + } >> >> for(i=0; i> >> @@ -1006,7 +1013,7 @@ static int lash_core(lash_t *p_lash) >> >> for(j=0; j> for(k=0; k> - switch_bitmap[j][k] = 0; >> + switch_bitmap[j * num_switches + k] = 0; >> } >> switches[j]->used_channels = 0; >> switches[j]->q_state = UNQUEUED; >> @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) >> >> for(i=0; i> for(dest_switch=0; dest_switch> - if(dest_switch != i && switch_bitmap[i][dest_switch] == 0) { >> + if(dest_switch != i && switch_bitmap[i * num_switches + dest_switch] == 0) { >> v_lane = 0; >> stop = 0; >> while(v_lane < lanes_needed && stop == 0) { >> @@ -1078,8 +1085,8 @@ static int lash_core(lash_t *p_lash) >> p_lash->virtual_location[i][dest_switch][v_lane] = 1; >> p_lash->virtual_location[dest_switch][i][v_lane] = 1; >> >> - switch_bitmap[i][dest_switch] = 1; >> - 
switch_bitmap[dest_switch][i] = 1; >> + switch_bitmap[i * num_switches + dest_switch] = 1; >> + switch_bitmap[dest_switch * num_switches + i] = 1; >> } >> >> for(j=0; j> @@ -1115,6 +1122,8 @@ static int lash_core(lash_t *p_lash) >> "Lane requirements (%d) exceed available lanes (%d)\n", >> p_lash->vl_min, lanes_needed); >> Exit: >> + if (switch_bitmap) >> + free(switch_bitmap); >> OSM_LOG_EXIT(p_log); >> return status; >> } > From mst at mellanox.co.il Wed Mar 7 00:40:58 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Mar 2007 10:40:58 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <45EE7629.4070706@dev.mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> Message-ID: <20070307084058.GH31276@mellanox.co.il> > Quoting Yevgeny Kliteynik : > Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c > > > Michael S. Tsirkin wrote: > >> Hi Hal. > >> > >> Converting the the C++ code to C. > >> > >> Please apply both to trunk and to 1.2 > >> > >> Thanks. > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > NAK. > > 1. I don't see any C++ here. > > > > 2. Why do we need this on ofed branch? > > Only bugfixes should go there. What bug does it fix? > > There are 3 things in this patch: > 1. int i -> uint16_t i > 2. Moving variable declaration (switch_bitmap) to the beginning > of the function (currently, it is declared after OSM_LOG_ENTER) > 3. Changing C99 dynamically allocated array to the old style. > > First two can be categorized as bugs. > > The third one is for compiler on windows. > > Each of these elements breaks OSM compilation on Windows. > > If we don't include either of these, then OFED 1.2 OpenSM compilation > on windows will be broken. Ultimately, whether to merge this and where is up to the maintainer. But I note that OFED 1.2 goals do not include windows builds.
Why aren't you using the master branch on windows? Change 3 seems fairly big, and since it's not a bugfix, I'd be inclined not to put it on ofed_1_2 branch. -- MST From sashak at voltaire.com Wed Mar 7 01:56:04 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 07 Mar 2007 11:56:04 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <45EE7629.4070706@dev.mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> Message-ID: <1173261364.3876.21.camel@localhost> On Wed, 2007-03-07 at 10:22 +0200, Yevgeny Kliteynik wrote: > Michael S. Tsirkin wrote: > >> Hi Hal. > >> > >> Converting the the C++ code to C. > >> > >> Please apply both to trunk and to 1.2 > >> > >> Thanks. > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > NAK. > > 1. I don't see any C++ here. > > > > 2. Why do we need this on ofed branch? > > Only bugfixes should go there. What bug does it fix? > > There are 3 things in this patch: > 1. int i -> uint16_t i Why it is needed at all? 'i' is compared with 'unsigned int', if you have signedness warnings you will want to change this to 'unsigned int' and not to 'uint16_t'. > 2. Moving variable declaration (switch_bitmap) to the beginning > of the function (currently, it is declared after OSM_LOG_ENTER) > 3. Changing C99 dynamically allocated array to the old style. > > First two can be categorized as bugs. > The third one is for compiler on windows. > > Each of these elements breaks OSM compilation on Windows. Is this breakage enforced by compilation flags? Sasha > > If we don't include either of these, then OFED 1.2 OpenSM compilation > on windows will be broken. 
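Sasha's objection to `uint16_t i` earlier in this message can be made concrete with a small sketch. When the loop bound is an `unsigned int` larger than 65535, a 16-bit counter wraps to 0 before it ever reaches the bound, so the loop cannot terminate normally. The function names below are hypothetical, the sketch assumes `unsigned int` is at least 32 bits wide, and whether the switch count can get that large in practice is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if a uint16_t counter starting at 0 can reach `bound`,
 * or 0 if it wraps around to 0 first (a plain loop would spin forever).
 */
static int u16_counter_reaches(unsigned int bound)
{
    uint16_t i = 0;

    while (i < bound) {
        i++;
        if (i == 0)     /* wrapped from 65535 back to 0 */
            return 0;
    }
    return 1;
}

/* The same loop with an unsigned int counter: count the iterations. */
static unsigned int uint_counter_steps(unsigned int bound)
{
    unsigned int i;
    unsigned int steps = 0;

    for (i = 0; i < bound; i++)
        steps++;
    return steps;
}
```

Matching the counter's type to the bound's type, as suggested, silences the signedness warning without introducing the narrower range.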
> > -- Yevgeny > > >> --- > >> osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- > >> 1 files changed, 16 insertions(+), 7 deletions(-) > >> > >> diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c > >> index 0afa43c..8c9172e 100644 > >> --- a/osm/opensm/osm_ucast_lash.c > >> +++ b/osm/opensm/osm_ucast_lash.c > >> @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t > >> static void shortest_path(lash_t *p_lash, int ir) > >> { > >> switch_t **switches = p_lash->switches, *sw, *swi; > >> - int i; > >> + uint16_t i; > >> cl_list_t bfsq; > >> > >> cl_list_construct(&bfsq); > >> @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) > >> int output_link2, i_next_switch2; > >> int cycle_found2 = 0; > >> int status = IB_SUCCESS; > >> + int * switch_bitmap = NULL; > >> > >> OSM_LOG_ENTER(p_log, lash_core); > >> > >> - //Bitmap to check if we have processed this pair > >> - int switch_bitmap[num_switches][num_switches]; > >> + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); > >> + if (!switch_bitmap) > >> + { > >> + osm_log(p_log, OSM_LOG_ERROR, > >> + "lash_core: ERR 4D04: " > >> + "Failed allocating switch_bitmap - out of memory\n"); > >> + goto Exit; > >> + } > >> > >> for(i=0; i >> > >> @@ -1006,7 +1013,7 @@ static int lash_core(lash_t *p_lash) > >> > >> for(j=0; j >> for(k=0; k >> - switch_bitmap[j][k] = 0; > >> + switch_bitmap[j * num_switches + k] = 0; > >> } > >> switches[j]->used_channels = 0; > >> switches[j]->q_state = UNQUEUED; > >> @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) > >> > >> for(i=0; i >> for(dest_switch=0; dest_switch >> - if(dest_switch != i && switch_bitmap[i][dest_switch] == 0) { > >> + if(dest_switch != i && switch_bitmap[i * num_switches + dest_switch] == 0) { > >> v_lane = 0; > >> stop = 0; > >> while(v_lane < lanes_needed && stop == 0) { > >> @@ -1078,8 +1085,8 @@ static int lash_core(lash_t *p_lash) > >> p_lash->virtual_location[i][dest_switch][v_lane] = 1; > >> 
p_lash->virtual_location[dest_switch][i][v_lane] = 1; > >> > >> - switch_bitmap[i][dest_switch] = 1; > >> - switch_bitmap[dest_switch][i] = 1; > >> + switch_bitmap[i * num_switches + dest_switch] = 1; > >> + switch_bitmap[dest_switch * num_switches + i] = 1; > >> } > >> > >> for(j=0; j >> @@ -1115,6 +1122,8 @@ static int lash_core(lash_t *p_lash) > >> "Lane requirements (%d) exceed available lanes (%d)\n", > >> p_lash->vl_min, lanes_needed); > >> Exit: > >> + if (switch_bitmap) > >> + free(switch_bitmap); > >> OSM_LOG_EXIT(p_log); > >> return status; > >> } > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bugzilla-daemon at lists.openfabrics.org Wed Mar 7 01:16:06 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 7 Mar 2007 01:16:06 -0800 (PST) Subject: [ofa-general] [Bug 423] New: kernel panic during service openibd start on i686 Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=423 Summary: kernel panic during service openibd start on i686 Product: OpenFabrics Linux Version: 1.2 Platform: X86 OS/Version: RHEL 4 Status: NEW Severity: major Priority: P1 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: yosefe at voltaire.com [regtst4 /tmp/regtest/OFED-1.2-reg-20070305-1230] # service openibd start Loading HCA driver and Access Layer:[FAILED] Please open an issue in the http://openib.org/bugzilla and attach /tmp/ib_debug_info.log ==================================================================== ===== ib_debug_info.log =============== ==================================================================== Hostname: regtst4 OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 3) Kernel \r on an \m Current kernel: 2.6.9-34.ELsmp Architecture: i686 GCC version: gcc (GCC) 
3.4.5 20051201 (Red Hat 3.4.5-2) Copyright (C) 2004 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. CPU: model name : Intel(R) XEON(TM) CPU 1.80GHz MemTotal: 513960 kB Chipset: 00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01) Software ------------------------------------- Build ID: OFED-1.2-reg-20070305-1230 ofa_kernel-1.2: Git: git://git.openfabrics.org/~vlad/ofed_1_2/.git commit 2c106429af5306d2f5fd47f33886925a098406ab ofa_user-1.2: libibverbs: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git commit 94bede79e2852f870b023366fbb3588db5683ac9 libmthca: git://git.kernel.org/pub/scm/libs/infiniband/libmthca.git commit 6dc3f25d9ddbc4e2da95e6bff3af1e91a6263890 libehca: git://git.openfabrics.org/~hnguyen/libehca.git commit 6c930c223e1bedc161340eb3b34acd9bf4a49872 libipathverbs: git://git.openfabrics.org/~bos/libipathverbs.git commit 6f2814ea0b0dde01176d539e391f6769b5ff0e4e tvflash: git://git.openfabrics.org/~rdreier/tvflash.git commit 5049e070afeebedf7916bb04a3363e9af38be6df libibcm: git://git.openfabrics.org/~shefty/libibcm.git commit a7fc7a0f196798a04a4ab14a76d338bc574b246c libsdp: git://git.openfabrics.org/~eitan/libsdp.git commit b384706b4973d56170d2003dd2407df092b9d133 mstflint: git://git.openfabrics.org/~mst/mstflint.git commit 6bfc232f0bf4ede0747fa4e368051205067b5f86 perftest: git://git.openfabrics.org/~mst/perftest.git commit 8cd9ba6dc53a688372807c0479e5b96cb1913df4 srptools: git://git.openfabrics.org/~ishai/srptools.git commit d560e8760ebfa3e0c8626e60e3ebf8cbc9726192 ipoibtools: git://git.openfabrics.org/~vlad/ipoibtools.git commit 19833e577a5ea7913bcab8bbd6ceb74ea85522b7 librdmacm: git://git.openfabrics.org/~shefty/librdmacm.git commit 0fe50cb85ede06461a2f3d0a345c0776bf9df2f6 dapl: git://git.openfabrics.org/~ardavis/dapl.git commit d245664e27148e54469268ad81f41b2a894a131a imgen: 
git://git.openfabrics.org/~mst/imgen.git commit 2b5da49987426a3942fa23fe5bc693e16e370643 management: git://git.openfabrics.org/~halr/management.git commit 632e7979dbbf9d335c9f35948623036898dfb1e4 libcxgb3: git://git.openfabrics.org/~swise/libcxgb3.git commit 42359ad5c2684b8b05d9ffb9f161e4bb9d724de2 qlvnictools: git://git.openfabrics.org/~ramachandrak/qlvnictools.git commit 33fb45f1b85b28343ff6dd9594d154fdaa1c2125 sdpnetstat: git://git.openfabrics.org/~mst/sdpnetstat.git commit cfc08ab244ece514f7c453d27397281129f14264 # MPI mvapich-0.9.9-1015.src.rpm mvapich2-0.9.8-7.src.rpm openmpi-1.2b4ofedr13713-1.src.rpm mpitests-2.0-698.src.rpm ------------------------------------- Device 03:00.0 Info: Firmware: Version: .. Date: // :: ############# LSPCI ############## 00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01) 00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01) 00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01) 00:02.1 Class ff00: Intel Corporation E7500/E7501 Hub Interface B RASUM Controller (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02) 00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02) 00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42) 00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02) 00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02) 00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02) 01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) 03:00.0 
InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) 05:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) 05:04.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 0d) 05:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02) ############# LSPCI -N ############## 00:00.0 Class 0600: 8086:254c (rev 01) 00:00.1 Class ff00: 8086:2541 (rev 01) 00:02.0 Class 0604: 8086:2543 (rev 01) 00:02.1 Class ff00: 8086:2544 (rev 01) 00:1d.0 Class 0c03: 8086:2482 (rev 02) 00:1d.1 Class 0c03: 8086:2484 (rev 02) 00:1d.2 Class 0c03: 8086:2487 (rev 02) 00:1e.0 Class 0604: 8086:244e (rev 42) 00:1f.0 Class 0601: 8086:2480 (rev 02) 00:1f.1 Class 0101: 8086:248b (rev 02) 00:1f.3 Class 0c05: 8086:2483 (rev 02) 01:1c.0 Class 0800: 8086:1461 (rev 04) 01:1d.0 Class 0604: 8086:1460 (rev 04) 01:1e.0 Class 0800: 8086:1461 (rev 04) 01:1f.0 Class 0604: 8086:1460 (rev 04) 02:01.0 Class 0604: 15b3:5a46 (rev a1) 03:00.0 Class 0c06: 15b3:5a44 (rev a1) 05:03.0 Class 0300: 1002:4752 (rev 27) 05:04.0 Class 0200: 8086:1229 (rev 0d) 05:05.0 Class 0200: 8086:100e (rev 02) ############# LSMOD ############## Module Size Used by ib_mthca 147616 0 ib_sdp 53424 0 rdma_cm 36540 1 ib_sdp iw_cm 13572 1 rdma_cm ib_addr 12548 1 rdma_cm ib_local_sa 16012 1 rdma_cm ib_umad 21424 0 ib_ucm 22276 0 ib_uverbs 45992 1 ib_ucm ib_cm 40440 2 rdma_cm,ib_ucm ib_sa 25792 3 rdma_cm,ib_local_sa,ib_cm ib_mad 39960 5 ib_mthca,ib_local_sa,ib_umad,ib_cm,ib_sa ib_core 51072 11 ib_mthca,ib_sdp,rdma_cm,iw_cm,ib_local_sa,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad parport_pc 27905 1 lp 15405 0 parport 37641 2 parport_pc,lp autofs4 22597 0 i2c_dev 14273 0 i2c_core 25921 1 i2c_dev nfs 209705 2 lockd 65257 2 nfs nfs_acl 7745 1 nfs sunrpc 142757 6 nfs,lockd,nfs_acl dm_mirror 28449 0 dm_multipath 22217 0 dm_mod 59973 2 dm_mirror,dm_multipath button 10449 0 battery 12869 0 ac 8773 0 joydev 14209 0 uhci_hcd 32729 0 hw_random 9557 0 e1000 99757 0 md5 8001 1 ipv6 
240225 18 e100 38209 0 mii 9153 1 e100 floppy 58065 0 ext3 118729 2 jbd 59481 1 ext3 ############# DMESG ############## ib_mthca: Unknown parameter `db_commands' ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:03:00.0 ACPI: PCI interrupt 0000:03:00.0[A] -> GSI 24 (level, low) -> IRQ 201 ib_ipoib: disagrees about version of symbol ib_cm_listen ib_ipoib: Unknown symbol ib_cm_listen ib_ipoib: disagrees about version of symbol ib_destroy_cm_id ib_ipoib: Unknown symbol ib_destroy_cm_id ib_ipoib: disagrees about version of symbol ib_sa_path_rec_get ib_ipoib: Unknown symbol ib_sa_path_rec_get ib_ipoib: disagrees about version of symbol ib_create_cm_id ib_ipoib: Unknown symbol ib_create_cm_id ib_ipoib: disagrees about version of symbol ib_send_cm_rep ib_ipoib: Unknown symbol ib_send_cm_rep ib_ipoib: disagrees about version of symbol ib_cm_init_qp_attr ib_ipoib: Unknown symbol ib_cm_init_qp_attr ib_ipoib: disagrees about version of symbol ib_send_cm_drep ib_ipoib: Unknown symbol ib_send_cm_drep ib_ipoib: disagrees about version of symbol ib_send_cm_rtu ib_ipoib: Unknown symbol ib_send_cm_rtu ib_ipoib: disagrees about version of symbol ib_send_cm_req ib_ipoib: Unknown symbol ib_send_cm_req ib_ipoib: disagrees about version of symbol ib_send_cm_rej ib_ipoib: Unknown symbol ib_send_cm_rej eip: e0826a77 ------------[ cut here ]------------ kernel BUG at include/asm/spinlock.h:146! 
invalid operand: 0000 [#1] SMP Modules linked in: ib_mthca(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_sa(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac joydev uhci_hcd hw_random e1000 md5 ipv6 e100 mii floppy ext3 jbd CPU: 1 EIP: 0060:[] Not tainted VLI EFLAGS: 00010046 (2.6.9-34.ELsmp) EIP is at _spin_lock_irqsave+0x20/0x45 eax: e0826a77 ebx: 00000046 ecx: c02e4ca6 edx: c02e4ca6 esi: d9da4690 edi: 00000282 ebp: dca32000 esp: c42ade7c ds: 007b es: 007b ss: 0068 Process ib_mad1 (pid: 18317, threadinfo=c42ad000 task=d43fe2b0) Stack: d9da468c d9da468c e0826a77 00000282 d9da468c c42adeb4 00000282 dca32000 e089d3de c633074c 00000000 c633070c e0a12d50 00000001 dca32000 dca32001 00000011 00000001 00000001 c633070c 00000001 dca32000 e0a1303c 01020246 Call Trace: [] mcast_groups_lost+0xf/0x71 [ib_sa] [] ib_dispatch_event+0x2e/0x55 [ib_core] [] smp_snoop+0x93/0xc0 [ib_mthca] [] mthca_process_mad+0x1a0/0x1d6 [ib_mthca] [] ib_mad_recv_done_handler+0x197/0x237 [ib_mad] [] ib_mad_completion_handler+0x45/0x78 [ib_mad] [] nr_processes+0x44/0x69 [] worker_thread+0x168/0x1d5 [] ib_mad_completion_handler+0x0/0x78 [ib_mad] [] default_wake_function+0x0/0xc [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x1d5 [] kthread+0x73/0x9b [] kthread+0x0/0x9b [] kernel_thread_helper+0x5/0xb Code: 81 00 00 00 00 01 c3 f0 ff 00 c3 56 89 c6 53 9c 5b fa 81 78 04 ad 4e ad de 74 18 ff 74 24 08 68 a6 4c 2e c0 e8 ac 13 e5 ff 59 58 <0f> 0b 92 00 60 3d 2e c0 f0 fe 0e 79 13 f7 c3 00 02 00 00 74 01 <0>Fatal exception: panic in 5 seconds ############# Messages ############## Mar 5 13:40:02 regtst4 crond(pam_unix)[20762]: session closed for user root Mar 5 13:45:01 regtst4 crond(pam_unix)[8493]: session opened for user root by (uid=0) Mar 5 13:45:02 regtst4 crond(pam_unix)[8493]: session closed for user root Mar 5 13:50:01 regtst4 
crond(pam_unix)[6875]: session opened for user root by (uid=0) Mar 5 13:50:01 regtst4 crond(pam_unix)[6876]: session opened for user root by (uid=0) Mar 5 13:50:01 regtst4 crond(pam_unix)[6876]: session closed for user root Mar 5 13:50:02 regtst4 crond(pam_unix)[6875]: session closed for user root Mar 5 13:55:01 regtst4 crond(pam_unix)[18154]: session opened for user root by (uid=0) Mar 5 13:55:01 regtst4 crond(pam_unix)[18154]: session closed for user root Mar 5 13:55:35 regtst4 kernel: ib_mthca: Unknown parameter `db_commands' Mar 5 13:55:35 regtst4 kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Mar 5 13:55:35 regtst4 kernel: ib_mthca: Initializing 0000:03:00.0 Mar 5 13:55:35 regtst4 kernel: ACPI: PCI interrupt 0000:03:00.0[A] -> GSI 24 (level, low) -> IRQ 201 Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_cm_listen Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_cm_listen Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_destroy_cm_id Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_destroy_cm_id Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_sa_path_rec_get Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_sa_path_rec_get Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_create_cm_id Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_create_cm_id Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_send_cm_rep Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_send_cm_rep Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_cm_init_qp_attr Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_cm_init_qp_attr Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_send_cm_drep Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_send_cm_drep Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees 
about version of symbol ib_send_cm_rtu Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_send_cm_rtu Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_send_cm_req Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_send_cm_req Mar 5 13:55:36 regtst4 kernel: ib_ipoib: disagrees about version of symbol ib_send_cm_rej Mar 5 13:55:36 regtst4 kernel: ib_ipoib: Unknown symbol ib_send_cm_rej Mar 5 13:55:36 regtst4 kernel: eip: e0826a77 Mar 5 13:55:36 regtst4 kernel: ------------[ cut here ]------------ Mar 5 13:55:36 regtst4 kernel: kernel BUG at include/asm/spinlock.h:146! Mar 5 13:55:36 regtst4 kernel: invalid operand: 0000 [#1] Mar 5 13:55:36 regtst4 kernel: SMP Mar 5 13:55:36 regtst4 kernel: Modules linked in: ib_mthca(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_sa(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac joydev uhci_hcd hw_random e1000 md5 ipv6 e100 mii floppy ext3 jbd Mar 5 13:55:36 regtst4 kernel: CPU: 1 Mar 5 13:55:36 regtst4 kernel: EIP: 0060:[] Not tainted VLI Mar 5 13:55:36 regtst4 kernel: EFLAGS: 00010046 (2.6.9-34.ELsmp) Mar 5 13:55:36 regtst4 kernel: EIP is at _spin_lock_irqsave+0x20/0x45 Mar 5 13:55:36 regtst4 kernel: eax: e0826a77 ebx: 00000046 ecx: c02e4ca6 edx: c02e4ca6 Mar 5 13:55:36 regtst4 kernel: esi: d9da4690 edi: 00000282 ebp: dca32000 esp: c42ade7c Mar 5 13:55:36 regtst4 kernel: ds: 007b es: 007b ss: 0068 Mar 5 13:55:36 regtst4 kernel: Process ib_mad1 (pid: 18317, threadinfo=c42ad000 task=d43fe2b0) Mar 5 13:55:36 regtst4 kernel: Stack: d9da468c d9da468c e0826a77 00000282 d9da468c c42adeb4 00000282 dca32000 Mar 5 13:55:36 regtst4 kernel: e089d3de c633074c 00000000 c633070c e0a12d50 00000001 dca32000 dca32001 Mar 5 13:55:36 regtst4 kernel: 00000011 00000001 00000001 c633070c 00000001 dca32000 e0a1303c 01020246 ############# Running Processes 
############## UID PID PPID C STIME TTY TIME CMD root 1 0 0 Mar04 ? 00:00:01 init [3] root 2 1 0 Mar04 ? 00:00:01 [migration/0] root 3 1 0 Mar04 ? 00:00:00 [ksoftirqd/0] root 4 1 0 Mar04 ? 00:00:01 [migration/1] root 5 1 0 Mar04 ? 00:00:00 [ksoftirqd/1] root 6 1 0 Mar04 ? 00:00:01 [migration/2] root 7 1 0 Mar04 ? 00:00:00 [ksoftirqd/2] root 8 1 0 Mar04 ? 00:00:00 [migration/3] root 9 1 0 Mar04 ? 00:00:00 [ksoftirqd/3] root 10 1 0 Mar04 ? 00:00:00 [events/0] root 11 1 0 Mar04 ? 00:00:00 [events/1] root 12 1 0 Mar04 ? 00:00:00 [events/2] root 13 1 0 Mar04 ? 00:00:00 [events/3] root 14 10 0 Mar04 ? 00:00:00 [khelper] root 15 10 0 Mar04 ? 00:00:00 [kacpid] root 42 10 0 Mar04 ? 00:00:00 [kblockd/0] root 43 10 0 Mar04 ? 00:00:00 [kblockd/1] root 44 10 0 Mar04 ? 00:00:00 [kblockd/2] root 45 10 0 Mar04 ? 00:00:00 [kblockd/3] root 46 1 0 Mar04 ? 00:00:00 [khubd] root 58 10 0 Mar04 ? 00:00:00 [aio/0] root 59 10 0 Mar04 ? 00:00:00 [aio/1] root 60 10 0 Mar04 ? 00:00:00 [aio/2] root 61 10 0 Mar04 ? 00:00:00 [aio/3] root 57 1 0 Mar04 ? 00:00:02 [kswapd0] root 134 1 0 Mar04 ? 00:00:00 [kseriod] root 215 1 0 Mar04 ? 00:00:08 [kjournald] root 1106 1 0 Mar04 ? 00:00:00 udevd root 1992 10 0 Mar04 ? 00:00:00 [kauditd] root 2156 10 0 Mar04 ? 00:00:00 [kmpathd/0] root 2157 10 0 Mar04 ? 00:00:00 [kmpathd/1] root 2158 10 0 Mar04 ? 00:00:00 [kmpathd/2] root 2159 10 0 Mar04 ? 00:00:00 [kmpathd/3] root 2167 10 0 Mar04 ? 00:00:00 [kmirrord] root 2168 10 0 Mar04 ? 00:00:00 [kmir_mon] root 2189 1 0 Mar04 ? 00:00:00 [kjournald] root 2905 1 0 Mar04 ? 00:00:00 syslogd -m 0 root 2910 1 0 Mar04 ? 00:00:00 klogd -x root 2921 1 0 Mar04 ? 00:00:03 irqbalance rpc 2940 1 0 Mar04 ? 00:00:02 portmap rpcuser 2960 1 0 Mar04 ? 00:00:00 rpc.statd root 3574 1 0 Mar04 ? 00:00:00 rpc.idmapd root 3613 1 0 Mar04 ? 00:00:00 [rpciod] root 3614 1 0 Mar04 ? 00:00:00 [lockd] root 3804 1 0 Mar04 ? 00:00:00 /usr/sbin/acpid root 3835 1 0 Mar04 ? 00:00:01 cupsd root 3889 1 0 Mar04 ? 
00:00:01 /usr/sbin/sshd root 3904 1 0 Mar04 ? 00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid ntp 3920 1 0 Mar04 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid root 3930 1 0 Mar04 ? 00:00:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf root 3940 1 0 Mar04 ? 00:00:00 gpm -m /dev/input/mice -t imps2 htt 3979 1 0 Mar04 ? 00:00:00 /usr/sbin/htt -retryonerror 0 htt 3980 3979 0 Mar04 ? 00:00:00 htt_server -nodaemon canna 3992 1 0 Mar04 ? 00:00:00 /usr/sbin/cannaserver -syslog -u canna root 4004 1 0 Mar04 ? 00:00:00 crond xfs 4042 1 0 Mar04 ? 00:00:00 xfs -droppriv -daemon root 4061 1 0 Mar04 ? 00:00:00 /usr/sbin/atd dbus 4077 1 0 Mar04 ? 00:00:00 dbus-daemon-1 --system root 4088 1 0 Mar04 ? 00:00:00 cups-config-daemon root 4099 1 0 Mar04 ? 00:00:01 hald root 4108 1 0 Mar04 ? 00:00:00 login -- root root 4109 1 0 Mar04 tty2 00:00:00 /sbin/mingetty tty2 root 4110 1 0 Mar04 tty3 00:00:00 /sbin/mingetty tty3 root 4111 1 0 Mar04 tty4 00:00:00 /sbin/mingetty tty4 root 4112 1 0 Mar04 tty5 00:00:00 /sbin/mingetty tty5 root 4113 1 0 Mar04 tty6 00:00:00 /sbin/mingetty tty6 root 7050 4108 0 Mar04 tty1 00:00:00 -bash root 19482 11 0 Mar04 ? 00:00:00 [ib_mcast] root 19487 11 0 Mar04 ? 00:00:00 [ib_cm/0] root 19488 11 0 Mar04 ? 00:00:00 [ib_cm/1] root 19489 11 0 Mar04 ? 00:00:00 [ib_cm/2] root 19490 11 0 Mar04 ? 00:00:00 [ib_cm/3] root 19511 11 0 Mar04 ? 00:00:00 [local_sa] root 19516 10 0 Mar04 ? 00:00:00 [ib_addr_wq] root 19521 11 0 Mar04 ? 00:00:00 [iw_cm_wq] root 19526 11 0 Mar04 ? 00:00:00 [rdma_cm_wq] root 19531 11 0 Mar04 ? 00:00:00 [sdp] root 23638 3889 0 13:01 ? 00:00:00 sshd: root at notty root 23640 23638 0 13:01 ? 00:00:00 bash -c cd /regtest/./setup.d/install-ofed; ./dispatch 2>> /tmp/regtest/regtest-install-ofed-1173092151.log root 23661 23640 0 13:01 ? 00:00:00 /usr/bin/perl ./dispatch root 26272 12 0 13:03 ? 00:00:00 [pdflush] root 18162 13 0 13:07 ? 00:00:01 [pdflush] root 18280 23661 0 13:55 ? 
00:00:00 sh -c service openibd start 1>>/tmp/regtest/regtest-install-ofed-1173092151.log 2>&1
root 18281 18280 0 13:55 ? 00:00:00 /bin/sh /sbin/service openibd start
root 18284 18281 1 13:55 ? 00:00:00 /bin/bash /etc/init.d/openibd start
root 18314 11 0 13:55 ? 00:00:00 [mthcacatas]
root 18315 14 0 13:55 ? 00:00:00 /bin/sh /sbin/hotplug infiniband
root 18316 18315 0 13:55 ? 00:00:00 /etc/hotplug.d/default/05-wait_for_sysfs.hotplug infiniband
root 18317 11 0 13:55 ? 00:00:00 [ib_mad1]
root 18318 11 0 13:55 ? 00:00:00 [ib_mad2]
root 18320 14 0 13:55 ? 00:00:00 /bin/sh /sbin/hotplug infiniband_cm
root 18324 18320 0 13:55 ? 00:00:00 /etc/hotplug.d/default/05-wait_for_sysfs.hotplug infiniband_cm
root 18447 18284 0 13:55 ? 00:00:00 /bin/ps -ef
##############################################

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Wed Mar 7 01:23:13 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 7 Mar 2007 01:23:13 -0800 (PST)
Subject: [ofa-general] [Bug 423] kernel panic during service openibd start on i686
In-Reply-To:
Message-ID: <20070307092313.93F77E60386@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=423

mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from mst at mellanox.co.il 2007-03-07 01:23 -------
You had an old version of modules loaded (that's why there are version
conflicts). Do a full reboot and the problem will go away.

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From kliteyn at dev.mellanox.co.il Wed Mar 7 01:54:12 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 11:54:12 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <1173261364.3876.21.camel@localhost> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <1173261364.3876.21.camel@localhost> Message-ID: <45EE8BC4.6080406@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On Wed, 2007-03-07 at 10:22 +0200, Yevgeny Kliteynik wrote: >> Michael S. Tsirkin wrote: >>>> Hi Hal. >>>> >>>> Converting the the C++ code to C. >>>> >>>> Please apply both to trunk and to 1.2 >>>> >>>> Thanks. >>>> >>>> Signed-off-by: Yevgeny Kliteynik >>> NAK. >>> 1. I don't see any C++ here. >>> >>> 2. Why do we need this on ofed branch? >>> Only bugfixes should go there. What bug does it fix? >> There are 3 things in this patch: >> 1. int i -> uint16_t i > > Why it is needed at all? 'i' is compared with 'unsigned int', if you > have signedness warnings you will want to change this to 'unsigned int' > and not to 'uint16_t'. Agree, "unsigned int" it is. >> 2. Moving variable declaration (switch_bitmap) to the beginning >> of the function (currently, it is declared after OSM_LOG_ENTER) >> 3. Changing C99 dynamically allocated array to the old style. >> >> First two can be categorized as bugs. >> The third one is for compiler on windows. >> >> Each of these elements breaks OSM compilation on Windows. > > Is this breakage enforced by compilation flags? No, by compiler that doesn't support all the C99 syntax yet. Anyway, on second thought - no need to check it into the branch. I'll resend the V2 of the patch. -- Yevgeny > Sasha > >> If we don't include either of these, then OFED 1.2 OpenSM compilation >> on windows will be broken. 
>> >> -- Yevgeny >> >>>> --- >>>> osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- >>>> 1 files changed, 16 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c >>>> index 0afa43c..8c9172e 100644 >>>> --- a/osm/opensm/osm_ucast_lash.c >>>> +++ b/osm/opensm/osm_ucast_lash.c >>>> @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t >>>> static void shortest_path(lash_t *p_lash, int ir) >>>> { >>>> switch_t **switches = p_lash->switches, *sw, *swi; >>>> - int i; >>>> + uint16_t i; >>>> cl_list_t bfsq; >>>> >>>> cl_list_construct(&bfsq); >>>> @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) >>>> int output_link2, i_next_switch2; >>>> int cycle_found2 = 0; >>>> int status = IB_SUCCESS; >>>> + int * switch_bitmap = NULL; >>>> >>>> OSM_LOG_ENTER(p_log, lash_core); >>>> >>>> - //Bitmap to check if we have processed this pair >>>> - int switch_bitmap[num_switches][num_switches]; >>>> + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); >>>> + if (!switch_bitmap) >>>> + { >>>> + osm_log(p_log, OSM_LOG_ERROR, >>>> + "lash_core: ERR 4D04: " >>>> + "Failed allocating switch_bitmap - out of memory\n"); >>>> + goto Exit; >>>> + } >>>> >>>> for(i=0; i>>> >>>> @@ -1006,7 +1013,7 @@ static int lash_core(lash_t *p_lash) >>>> >>>> for(j=0; j>>> for(k=0; k>>> - switch_bitmap[j][k] = 0; >>>> + switch_bitmap[j * num_switches + k] = 0; >>>> } >>>> switches[j]->used_channels = 0; >>>> switches[j]->q_state = UNQUEUED; >>>> @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) >>>> >>>> for(i=0; i>>> for(dest_switch=0; dest_switch>>> - if(dest_switch != i && switch_bitmap[i][dest_switch] == 0) { >>>> + if(dest_switch != i && switch_bitmap[i * num_switches + dest_switch] == 0) { >>>> v_lane = 0; >>>> stop = 0; >>>> while(v_lane < lanes_needed && stop == 0) { >>>> @@ -1078,8 +1085,8 @@ static int lash_core(lash_t *p_lash) >>>> p_lash->virtual_location[i][dest_switch][v_lane] = 1; >>>> 
p_lash->virtual_location[dest_switch][i][v_lane] = 1; >>>> >>>> - switch_bitmap[i][dest_switch] = 1; >>>> - switch_bitmap[dest_switch][i] = 1; >>>> + switch_bitmap[i * num_switches + dest_switch] = 1; >>>> + switch_bitmap[dest_switch * num_switches + i] = 1; >>>> } >>>> >>>> for(j=0; j>>> @@ -1115,6 +1122,8 @@ static int lash_core(lash_t *p_lash) >>>> "Lane requirements (%d) exceed available lanes (%d)\n", >>>> p_lash->vl_min, lanes_needed); >>>> Exit: >>>> + if (switch_bitmap) >>>> + free(switch_bitmap); >>>> OSM_LOG_EXIT(p_log); >>>> return status; >>>> } >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From kliteyn at dev.mellanox.co.il Wed Mar 7 02:01:28 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 12:01:28 +0200 Subject: [ofa-general] [PATCHv2] osm: Converting the the C99 code to C in osm_ucast_lash.c Message-ID: <45EE8D78.50702@dev.mellanox.co.il> Hi Hal V2 of this patch - trivial fix of data type and converting C99 code for compilation on windows. Please apply to trunk only. 
Thanks Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_ucast_lash.c | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-) diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c index 0afa43c..8c9172e 100644 --- a/osm/opensm/osm_ucast_lash.c +++ b/osm/opensm/osm_ucast_lash.c @@ -406,7 +406,7 @@ static int get_phys_connection(switch_t static void shortest_path(lash_t *p_lash, int ir) { switch_t **switches = p_lash->switches, *sw, *swi; - int i; + unsigned int i; cl_list_t bfsq; cl_list_construct(&bfsq); @@ -986,11 +986,18 @@ static int lash_core(lash_t *p_lash) int output_link2, i_next_switch2; int cycle_found2 = 0; int status = IB_SUCCESS; + int * switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ OSM_LOG_ENTER(p_log, lash_core); - //Bitmap to check if we have processed this pair - int switch_bitmap[num_switches][num_switches]; + switch_bitmap = (int *)malloc(num_switches * num_switches * sizeof(int)); + if (!switch_bitmap) + { + osm_log(p_log, OSM_LOG_ERROR, + "lash_core: ERR 4D04: " + "Failed allocating switch_bitmap - out of memory\n"); + goto Exit; + } for(i=0; iused_channels = 0; switches[j]->q_state = UNQUEUED; @@ -1015,7 +1022,7 @@ static int lash_core(lash_t *p_lash) for(i=0; ivirtual_location[i][dest_switch][v_lane] = 1; p_lash->virtual_location[dest_switch][i][v_lane] = 1; - switch_bitmap[i][dest_switch] = 1; - switch_bitmap[dest_switch][i] = 1; + switch_bitmap[i * num_switches + dest_switch] = 1; + switch_bitmap[dest_switch * num_switches + i] = 1; } for(j=0; jvl_min, lanes_needed); Exit: + if (switch_bitmap) + free(switch_bitmap); OSM_LOG_EXIT(p_log); return status; } -- 1.4.4.1.GIT From vlad at lists.openfabrics.org Wed Mar 7 02:15:32 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 7 Mar 2007 02:15:32 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070307-0200 daily build status Message-ID: <20070307101532.D04ECE607F1@openfabrics.org> This email 
was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From kliteyn at dev.mellanox.co.il Wed Mar 7 02:14:00 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 12:14:00 +0200 Subject: [ofa-general] 
[PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
In-Reply-To: <45EE6AB2.5020801@dev.mellanox.co.il>
References: <45ED90C4.60204@dev.mellanox.co.il> <20070306175114.GJ11411@obsidianresearch.com> <1173205862.4546.363189.camel@hal.voltaire.com> <45EE6AB2.5020801@dev.mellanox.co.il>
Message-ID: <45EE9068.2050601@dev.mellanox.co.il>

Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
>> On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote:
>>> On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote:
>>>> Hi Hal.
>>>>
>>>> Converting the the C++ code to C.
>>> This is actually valid C99 code. This is the method that ISO
>>> standardized in C99 to do dynamic stack allocations (alloca is not
>>> an ISO C function).
>>>
>>> Since it is now 2007 is there really still a desire to not use C99
>>> features?
>> My guess is that the Windows compiler didn't like this though :-(
>
> Correct :(
>
> BTW, there's still a C++ element there - switch_bitmap is declared
> after using OSM_LOG_ENTER.

I take it back (the C++ part) - in C99, a declaration can appear at any
place within a block. Didn't know it.

-- Yevgeny

> --Yevgeny
>
>> -- Hal
>>
>>> Jason
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From halr at voltaire.com Wed Mar 7 03:46:41 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Mar 2007 06:46:41 -0500
Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
In-Reply-To: <20070307084058.GH31276@mellanox.co.il>
References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il>
Message-ID: <1173268000.4546.426357.camel@hal.voltaire.com>

On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote:
> > Quoting Yevgeny Kliteynik :
> > Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
> >
> >
> > Michael S. Tsirkin wrote:
> > >> Hi Hal.
> > >>
> > >> Converting the the C++ code to C.
> > >>
> > >> Please apply both to trunk and to 1.2
> > >>
> > >> Thanks.
> > >>
> > >> Signed-off-by: Yevgeny Kliteynik
> > >
> > > NAK.
> > > 1. I don't see any C++ here.
> > >
> > > 2. Why do we need this on ofed branch?
> > > Only bugfixes should go there. What bug does it fix?
> >
> > There are 3 things in this patch:
> > 1. int i -> uint16_t i
> > 2. Moving variable declaration (switch_bitmap) to the beginning
> > of the function (currently, it is declared after OSM_LOG_ENTER)
> > 3. Changing C99 dynamically allocated array to the old style.
> >
> > First two can be categorized as bugs.
> >
> > The third one is for compiler on windows.
> >
> > Each of these elements breaks OSM compilation on Windows.
> >
> > If we don't include either of these, then OFED 1.2 OpenSM compilation
> > on windows will be broken.
>
> Ultimately, whether to merge this this and where is up to the maintainer. But I
> note that OFED 1.2 goals do not include windows builds.

While not a formal OFED 1.2 goal, doesn't this depend on whether there
is intended to be a Windows equivalent to the OFED 1.2 OpenSM ?

-- Hal

> Why aren't you using the
> master branch on windows?
>
> Change 3 seems fairly big, and since it's not a bugfix, I'd be inclined
> not to put it on ofed_1_2 branch.

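For readers following the osm_ucast_lash.c thread: the change under discussion replaces a C99 variable-length array, int switch_bitmap[num_switches][num_switches], with a single heap block indexed as row * num_switches + col, because the Windows compiler reportedly does not accept the C99 form. The sketch below illustrates only that indexing technique. It is not the OpenSM code: the pair_* names are hypothetical, and it uses calloc for zero-initialisation where the posted patch uses malloc followed by explicit clearing loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; none of these exist in OpenSM. */

/* Allocate an n x n table of ints as one flat, zero-filled block.
 * Returns NULL on allocation failure or if n * n would overflow size_t.
 * For n == 0 the result is whatever calloc(0, ...) returns. */
int *pair_table_alloc(size_t n)
{
	if (n != 0 && n > (size_t)-1 / n)
		return NULL;
	return calloc(n * n, sizeof(int));
}

/* Row-major offset of element (row, col) in an n x n flat table.
 * This is the same arithmetic as switch_bitmap[i * num_switches + k]. */
size_t pair_index(size_t n, size_t row, size_t col)
{
	assert(row < n && col < n);
	return row * n + col;
}

/* Record that the pair (a, b) has been processed, in both directions. */
void pair_mark(int *table, size_t n, size_t a, size_t b)
{
	table[pair_index(n, a, b)] = 1;
	table[pair_index(n, b, a)] = 1;
}

/* Non-zero if the pair (a, b) was already processed. */
int pair_is_marked(const int *table, size_t n, size_t a, size_t b)
{
	return table[pair_index(n, a, b)] != 0;
}
```

A caller would allocate once with pair_table_alloc(num_switches), check the result for NULL, call pair_mark() where the patch sets both entries to 1, and free() the table on every exit path, which is the role of the free(switch_bitmap) added under the Exit: label in the patch.
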
From monil at voltaire.com Wed Mar 7 04:02:19 2007
From: monil at voltaire.com (Moni Levy)
Date: Wed, 7 Mar 2007 14:02:19 +0200
Subject: [ofa-general] RE: [Bug 420] PKey table reordering caused by SM failover stops ipoib traffic
In-Reply-To: <20070307080045.248CFE603C4@openfabrics.org>
Message-ID: <3857BB049D83424D9DB82753D37CEA552C9328@taurus.voltaire.com>

On Wednesday, March 07, 2007 10:01 AM, bugzilla-daemon at lists.openfabrics.org wrote:
> https://bugs.openfabrics.org/show_bug.cgi?id=420
>
>
> mst at mellanox.co.il changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Severity|blocker                     |critical
>
>
>
>
> ------- Comment #1 from mst at mellanox.co.il 2007-03-07 00:00 -------
> I don't think its a beta blocker - restarting ipoib is a simple
> workaround.

You can't restart IPoIB if the SM failover is not an intentional one but
some kind of a real problem. You just can't be prepared for that.
What do you think ?

--Moni

From xma at us.ibm.com Wed Mar 7 04:07:04 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 7 Mar 2007 04:07:04 -0800
Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish
In-Reply-To: <20070307042653.GA17273@obsidianresearch.com>
Message-ID:

Jason,

The bottom line is that DAD behavior with this patch is the same as
without it. Let's see how DAD behaves with and without this patch:

1. If the multicast join fails forever, then the IPv6 address is useless
in both cases; IPv4 will work in fabrics with this patch, but doesn't
work without it.
2. If the multicast join succeeds right away, then most likely we will
receive the NA on time, so there is no DAD issue in either case.
3. If the multicast join succeeds within the DAD timer, most likely we
will receive the NA on time in both cases.
4. If the multicast join succeeds after the DAD timer, then we have a
DAD issue in both cases.

There is only a tiny timing difference (a few lines of code) with and
without the patch. That's why I don't understand why you think this
patch impacts the DAD behavior; I think this patch does turn the
carrier on at the right time. Look at section 5 of the IPoIB RFC:

    Thus, the IPoIB link is formed by the IPoIB nodes joining the
    broadcast group. There is no physical demarcation of the IPoIB link
    other than that determined by the broadcast group membership.

>If IPoIB is asked to send to a multicast address that is not yet
>joined (join is pending to the SA, or whatever) then it uses the
>broadcast MLID for the packet. This is similar to how ethernet works
>and would let DAD and RS work correctly in all cases.

Yes, this is another patch I am working on. For IPv4 we use the broadcast
MLID; for IPv6 we use the all-host multicast MLID for the packet. Here is
the RFC text regarding not finding the multicast entry, but it only
mentions the remote subnet, not the local subnet. We do need a patch for
IB multicast join for different subnets in the fabrics. It is missing in
IPoIB right now.

    10. Sending and Receiving IP Multicast Packets

    Multicast in InfiniBand differs in a number of ways from multicast
    in ethernet. This adds some complexity to an IPoIB implementation
    when supporting IP multicast over IB.

    A) An IB multicast group must be explicitly created through the SA
    before it can be used. This implies that in order to send a packet
    destined for an IP multicast address, the IPoIB implementation must
    check with the SA on the outbound link first for a "MCMemberRecord"
    that matches the MGID. If one does exist, the Multicast Local
    Identifier (MLID) associated with the multicast group is used as the
    Destination Local Identifier (DLID) for the packet. Otherwise, it
    implies no member exists on the local link. If the scope of the IP
    multicast group is beyond link-local, the packet must be sent to the
    on-link routers through the use of the all-router multicast group or
    the broadcast group.
    This is to allow local routers to forward the packet to multicast
    listeners on remote networks. The all-router multicast group is
    preferred over the broadcast group for better efficiency. If the
    all-router multicast group does not exist, the sender can assume
    that there are no routers on the local link; hence the packet can be
    safely dropped.

I had started another thread discussing the second patch before.

Thanks
Shirley Ma

From mst at mellanox.co.il Wed Mar 7 04:14:42 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Mar 2007 14:14:42 +0200
Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
In-Reply-To: <1173268000.4546.426357.camel@hal.voltaire.com>
References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com>
Message-ID: <20070307121442.GB30927@mellanox.co.il>

> > > If we don't include either of these, then OFED 1.2 OpenSM compilation
> > > on windows will be broken.
> >
> > Ultimately, whether to merge this this and where is up to the maintainer. But I
> > note that OFED 1.2 goals do not include windows builds.
>
> While not a formal OFED 1.2 goal, doesn't this depend on whether there
> is intended to be a Windows equivalent to the OFED 1.2 OpenSM ?

I guess. And is there?

-- MST From kliteyn at dev.mellanox.co.il Wed Mar 7 04:12:35 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 14:12:35 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <1173268000.4546.426357.camel@hal.voltaire.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com> Message-ID: <45EEAC33.2080200@dev.mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote: >>> Quoting Yevgeny Kliteynik : >>> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c >>> >>> >>> Michael S. Tsirkin wrote: >>>>> Hi Hal. >>>>> >>>>> Converting the the C++ code to C. >>>>> >>>>> Please apply both to trunk and to 1.2 >>>>> >>>>> Thanks. >>>>> >>>>> Signed-off-by: Yevgeny Kliteynik >>>> NAK. >>>> 1. I don't see any C++ here. >>>> >>>> 2. Why do we need this on ofed branch? >>>> Only bugfixes should go there. What bug does it fix? >>> There are 3 things in this patch: >>> 1. int i -> uint16_t i >>> 2. Moving variable declaration (switch_bitmap) to the beginning >>> of the function (currently, it is declared after OSM_LOG_ENTER) >>> 3. Changing C99 dynamically allocated array to the old style. >>> >>> First two can be categorized as bugs. >>> >>> The third one is for compiler on windows. >>> >>> Each of these elements breaks OSM compilation on Windows. >>> >>> If we don't include either of these, then OFED 1.2 OpenSM compilation >>> on windows will be broken. >> Ultimately, whether to merge this this and where is up to the maintainer. But I >> note that OFED 1.2 goals do not include windows builds. > > While not a formal OFED 1.2 goal, doesn't this depend on whether there > is intended to be a Windows equivalent to the OFED 1.2 OpenSM ? 
I'm not aware of any plans for a Windows equivalent to the OFED 1.2 OpenSM,
so I'm dropping my efforts to keep it "windows compilable".
In the V2 of this patch I've mentioned that it should be applied
to trunk only.

Thanks.

-- Yevgeny

> -- Hal
>
>> Why aren't you using the
>> master branch on windows?
>>
>> Change 3 seems fairly big, and since it's not a bugfix, I'd be inclined
>> not to put it on ofed_1_2 branch.
>

From mst at mellanox.co.il Wed Mar 7 04:17:51 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Mar 2007 14:17:51 +0200
Subject: [ofa-general] Re: [Bug 420] PKey table reordering caused by SM failover stops ipoib traffic
In-Reply-To: <3857BB049D83424D9DB82753D37CEA552C9328@taurus.voltaire.com>
References: <20070307080045.248CFE603C4@openfabrics.org> <3857BB049D83424D9DB82753D37CEA552C9328@taurus.voltaire.com>
Message-ID: <20070307121751.GC30927@mellanox.co.il>

> > ------- Comment #1 from mst at mellanox.co.il 2007-03-07 00:00 -------
> > I don't think its a beta blocker - restarting ipoib is a simple
> > workaround.
>
> You can't restart IPoIB if the SM failover is not an intentional one but
> some kind of a real problem. You just can't be prepared for that.
> What do you think ?

I think we'll fix it; it just does not have to block the beta.

-- MST From halr at voltaire.com Wed Mar 7 05:12:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Mar 2007 08:12:48 -0500 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c In-Reply-To: <45EEAC33.2080200@dev.mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com> <45EEAC33.2080200@dev.mellanox.co.il> Message-ID: <1173273156.4546.431372.camel@hal.voltaire.com> On Wed, 2007-03-07 at 07:12, Yevgeny Kliteynik wrote: > Hal Rosenstock wrote: > > On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote: > >>> Quoting Yevgeny Kliteynik : > >>> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c > >>> > >>> > >>> Michael S. Tsirkin wrote: > >>>>> Hi Hal. > >>>>> > >>>>> Converting the the C++ code to C. > >>>>> > >>>>> Please apply both to trunk and to 1.2 > >>>>> > >>>>> Thanks. > >>>>> > >>>>> Signed-off-by: Yevgeny Kliteynik > >>>> NAK. > >>>> 1. I don't see any C++ here. > >>>> > >>>> 2. Why do we need this on ofed branch? > >>>> Only bugfixes should go there. What bug does it fix? > >>> There are 3 things in this patch: > >>> 1. int i -> uint16_t i > >>> 2. Moving variable declaration (switch_bitmap) to the beginning > >>> of the function (currently, it is declared after OSM_LOG_ENTER) > >>> 3. Changing C99 dynamically allocated array to the old style. > >>> > >>> First two can be categorized as bugs. > >>> > >>> The third one is for compiler on windows. > >>> > >>> Each of these elements breaks OSM compilation on Windows. > >>> > >>> If we don't include either of these, then OFED 1.2 OpenSM compilation > >>> on windows will be broken. > >> Ultimately, whether to merge this this and where is up to the maintainer. But I > >> note that OFED 1.2 goals do not include windows builds. 
> > > > While not a formal OFED 1.2 goal, doesn't this depend on whether there > > is intended to be a Windows equivalent to the OFED 1.2 OpenSM ? > > I'm not aware of any plans for windows equivalent to the OFED 1.2 OpenSM, Should there be ? master may be less stable and certainly is likely to be less tested than OFED 1.2 at any point in time. -- Hal > so I'm dropping my efforts to keep it "windows compilable". > In the V2 of this patch I've mentioned that it should be applied > to trunk only. > > Thanks. > > -- Yevgeny > > > -- Hal > > > >> Why aren't you using the > >> master branch on windows? > >> > >> Change 3 seems fairly big, and since it's not a bugfix, I'd be inclined > >> not to put it on ofed_1_2 branch. > > From mst at mellanox.co.il Wed Mar 7 05:18:46 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Mar 2007 15:18:46 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c In-Reply-To: <1173273156.4546.431372.camel@hal.voltaire.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com> <45EEAC33.2080200@dev.mellanox.co.il> <1173273156.4546.431372.camel@hal.voltaire.com> Message-ID: <20070307131846.GF30927@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c > > On Wed, 2007-03-07 at 07:12, Yevgeny Kliteynik wrote: > > Hal Rosenstock wrote: > > > On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote: > > >>> Quoting Yevgeny Kliteynik : > > >>> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c > > >>> > > >>> > > >>> Michael S. Tsirkin wrote: > > >>>>> Hi Hal. > > >>>>> > > >>>>> Converting the the C++ code to C. > > >>>>> > > >>>>> Please apply both to trunk and to 1.2 > > >>>>> > > >>>>> Thanks. 
> > >>>>> > > >>>>> Signed-off-by: Yevgeny Kliteynik > > >>>> NAK. > > >>>> 1. I don't see any C++ here. > > >>>> > > >>>> 2. Why do we need this on ofed branch? > > >>>> Only bugfixes should go there. What bug does it fix? > > >>> There are 3 things in this patch: > > >>> 1. int i -> uint16_t i > > >>> 2. Moving variable declaration (switch_bitmap) to the beginning > > >>> of the function (currently, it is declared after OSM_LOG_ENTER) > > >>> 3. Changing C99 dynamically allocated array to the old style. > > >>> > > >>> First two can be categorized as bugs. > > >>> > > >>> The third one is for compiler on windows. > > >>> > > >>> Each of these elements breaks OSM compilation on Windows. > > >>> > > >>> If we don't include either of these, then OFED 1.2 OpenSM compilation > > >>> on windows will be broken. > > >> Ultimately, whether to merge this this and where is up to the maintainer. But I > > >> note that OFED 1.2 goals do not include windows builds. > > > > > > While not a formal OFED 1.2 goal, doesn't this depend on whether there > > > is intended to be a Windows equivalent to the OFED 1.2 OpenSM ? > > > > I'm not aware of any plans for windows equivalent to the OFED 1.2 OpenSM, > > Should there be ? Isn't that what we need to know to decide whether to merge this patch? > master may be less stable and certainly is likely to > be less tested than OFED 1.2 at any point in time. I guess openib-windows guys will be able to branch off from ofed 1.2 branch if they like. But even if you fix compilation issues on ofed 1.2 now, it's unlikely a windows release won't include other changes as compared to the linux one. So why bother? 
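For readers following the thread, the three changes listed in the quoted patch description can be sketched in isolation as below. This is only an illustration under stated assumptions, not the code from osm_ucast_lash.c: the function `count_marked_switches`, its `num_switches` parameter and the marking logic are hypothetical, and only the variable name `switch_bitmap` is borrowed from the thread. It also assumes a `uint16_t` definition is available; `<stdint.h>` is itself a C99 header, so a strictly C89 build would presumably need its own typedef.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from osm_ucast_lash.c. */
static int count_marked_switches(uint16_t num_switches)
{
	/* (2) every declaration sits at the top of the block,
	 *     before the first statement */
	uint16_t i;	/* (1) index type matches the uint16_t bound */
	int count = 0;
	unsigned char *switch_bitmap;

	/* (3) heap allocation instead of the C99 variable-length array
	 *     "unsigned char switch_bitmap[num_switches];" */
	switch_bitmap = calloc(num_switches ? num_switches : 1,
			       sizeof(*switch_bitmap));
	if (!switch_bitmap)
		return -1;

	/* mark every even-numbered switch, purely as sample data */
	for (i = 0; i < num_switches; i++)
		if (i % 2 == 0)
			switch_bitmap[i] = 1;

	for (i = 0; i < num_switches; i++)
		count += switch_bitmap[i];

	free(switch_bitmap);
	return count;
}
```

Calling `count_marked_switches(5)` marks switches 0, 2 and 4 and returns 3. The heap allocation costs an explicit `free()` and an error path, which may be part of why change 3 was described as fairly big.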
-- MST From halr at voltaire.com Wed Mar 7 05:27:26 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Mar 2007 08:27:26 -0500 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c In-Reply-To: <20070307131846.GF30927@mellanox.co.il> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com> <45EEAC33.2080200@dev.mellanox.co.il> <1173273156.4546.431372.camel@hal.voltaire.com> <20070307131846.GF30927@mellanox.co.il> Message-ID: <1173274038.4546.432186.camel@hal.voltaire.com> On Wed, 2007-03-07 at 08:18, Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > Subject: Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c > > > > On Wed, 2007-03-07 at 07:12, Yevgeny Kliteynik wrote: > > > Hal Rosenstock wrote: > > > > On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote: > > > >>> Quoting Yevgeny Kliteynik : > > > >>> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c > > > >>> > > > >>> > > > >>> Michael S. Tsirkin wrote: > > > >>>>> Hi Hal. > > > >>>>> > > > >>>>> Converting the the C++ code to C. > > > >>>>> > > > >>>>> Please apply both to trunk and to 1.2 > > > >>>>> > > > >>>>> Thanks. > > > >>>>> > > > >>>>> Signed-off-by: Yevgeny Kliteynik > > > >>>> NAK. > > > >>>> 1. I don't see any C++ here. > > > >>>> > > > >>>> 2. Why do we need this on ofed branch? > > > >>>> Only bugfixes should go there. What bug does it fix? > > > >>> There are 3 things in this patch: > > > >>> 1. int i -> uint16_t i > > > >>> 2. Moving variable declaration (switch_bitmap) to the beginning > > > >>> of the function (currently, it is declared after OSM_LOG_ENTER) > > > >>> 3. Changing C99 dynamically allocated array to the old style. > > > >>> > > > >>> First two can be categorized as bugs. 
> > > >>> > > > >>> The third one is for compiler on windows. > > > >>> > > > >>> Each of these elements breaks OSM compilation on Windows. > > > >>> > > > >>> If we don't include either of these, then OFED 1.2 OpenSM compilation > > > >>> on windows will be broken. > > > >> Ultimately, whether to merge this this and where is up to the maintainer. But I > > > >> note that OFED 1.2 goals do not include windows builds. > > > > > > > > While not a formal OFED 1.2 goal, doesn't this depend on whether there > > > > is intended to be a Windows equivalent to the OFED 1.2 OpenSM ? > > > > > > I'm not aware of any plans for windows equivalent to the OFED 1.2 OpenSM, > > > > Should there be ? > > Isn't that what we need to know to decide whether to merge this patch? to OFED 1.2, yes. > > master may be less stable and certainly is likely to > > be less tested than OFED 1.2 at any point in time. > > I guess openib-windows guys will be able to branch off from ofed 1.2 branch > if they like. But even if you fix compilation issues on ofed 1.2 now, it's unlikely > a windows release won't include other changes as compared to the linux one. I'm not sure what changes you are referring to but I would think the more tested the base is, the easier this is and fewer changes are involved. > So why bother? You're right that it's probably not worth the effort. 
-- Hal From kliteyn at dev.mellanox.co.il Wed Mar 7 05:41:52 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 07 Mar 2007 15:41:52 +0200 Subject: [ofa-general] Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c In-Reply-To: <1173274038.4546.432186.camel@hal.voltaire.com> References: <45ED90C4.60204@dev.mellanox.co.il> <20070306185613.GD16562@mellanox.co.il> <45EE7629.4070706@dev.mellanox.co.il> <20070307084058.GH31276@mellanox.co.il> <1173268000.4546.426357.camel@hal.voltaire.com> <45EEAC33.2080200@dev.mellanox.co.il> <1173273156.4546.431372.camel@hal.voltaire.com> <20070307131846.GF30927@mellanox.co.il> <1173274038.4546.432186.camel@hal.voltaire.com> Message-ID: <45EEC120.2020109@dev.mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2007-03-07 at 08:18, Michael S. Tsirkin wrote: >>> Quoting Hal Rosenstock : >>> Subject: Re: [PATCH] osm: Converting the the C++ code to C inosm_ucast_lash.c >>> >>> On Wed, 2007-03-07 at 07:12, Yevgeny Kliteynik wrote: >>>> Hal Rosenstock wrote: >>>>> On Wed, 2007-03-07 at 03:40, Michael S. Tsirkin wrote: >>>>>>> Quoting Yevgeny Kliteynik : >>>>>>> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c >>>>>>> >>>>>>> >>>>>>> Michael S. Tsirkin wrote: >>>>>>>>> Hi Hal. >>>>>>>>> >>>>>>>>> Converting the the C++ code to C. >>>>>>>>> >>>>>>>>> Please apply both to trunk and to 1.2 >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Signed-off-by: Yevgeny Kliteynik >>>>>>>> NAK. >>>>>>>> 1. I don't see any C++ here. >>>>>>>> >>>>>>>> 2. Why do we need this on ofed branch? >>>>>>>> Only bugfixes should go there. What bug does it fix? >>>>>>> There are 3 things in this patch: >>>>>>> 1. int i -> uint16_t i >>>>>>> 2. Moving variable declaration (switch_bitmap) to the beginning >>>>>>> of the function (currently, it is declared after OSM_LOG_ENTER) >>>>>>> 3. Changing C99 dynamically allocated array to the old style. >>>>>>> >>>>>>> First two can be categorized as bugs. 
>>>>>>> >>>>>>> The third one is for compiler on windows. >>>>>>> >>>>>>> Each of these elements breaks OSM compilation on Windows. >>>>>>> >>>>>>> If we don't include either of these, then OFED 1.2 OpenSM compilation >>>>>>> on windows will be broken. >>>>>> Ultimately, whether to merge this this and where is up to the maintainer. But I >>>>>> note that OFED 1.2 goals do not include windows builds. >>>>> While not a formal OFED 1.2 goal, doesn't this depend on whether there >>>>> is intended to be a Windows equivalent to the OFED 1.2 OpenSM ? >>>> I'm not aware of any plans for windows equivalent to the OFED 1.2 OpenSM, >>> Should there be ? >> Isn't that what we need to know to decide whether to merge this patch? > > to OFED 1.2, yes. > >>> master may be less stable and certainly is likely to >>> be less tested than OFED 1.2 at any point in time. >> I guess openib-windows guys will be able to branch off from ofed 1.2 branch >> if they like. But even if you fix compilation issues on ofed 1.2 now, it's unlikely >> a windows release won't include other changes as compared to the linux one. > > I'm not sure what changes you are referring to but I would think the > more tested the base is, the easier this is and fewer changes are > involved. > >> So why bother? > > You're right that it's probably not worth the effort. I checked it with the Windows team - they don't have any intention to use OFED 1.2 OpenSM. -- Yevgeny > -- Hal > From Koen.SEGERS at VRT.BE Wed Mar 7 06:59:01 2007 From: Koen.SEGERS at VRT.BE (SEGERS Koen) Date: Wed, 7 Mar 2007 15:59:01 +0100 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/or VERBS Message-ID: Hi all! We are trying to bond two ports on 1 HCA so that we are able aggregate the throughput. We are also interested in bonding ports of different HCA's. Is this possible with the OFED driver? If so, can you give the command? We know TopSpin has support for this feature. 
Sadly, Topspin has no driver that runs on our system (SLES 10). We currently have installed OFED-1.2 of 20070306 and the stable OFED-1.1 driver, but we can't figure out how this bonding is started in either version. It is important that we offload the bonding. We don't want to use the standard Linux bonding. That is why we think that bonding over different HCAs is not going to work. Is this assumption correct? Is bonding possible when running SDP? And VERBS? Greetings Koen *** Disclaimer *** Vlaamse Radio- en Televisieomroep Auguste Reyerslaan 52, 1043 Brussel nv van publiek recht BTW BE 0244.142.664 RPR Brussel http://www.vrt.be/disclaimer From halr at voltaire.com Wed Mar 7 07:50:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Mar 2007 10:50:55 -0500 Subject: [ofa-general] Re: [PATCHv2] osm: Converting the the C99 code to C in osm_ucast_lash.c In-Reply-To: <45EE8D78.50702@dev.mellanox.co.il> References: <45EE8D78.50702@dev.mellanox.co.il> Message-ID: <1173282652.4546.438793.camel@hal.voltaire.com> On Wed, 2007-03-07 at 05:01, Yevgeny Kliteynik wrote: > Hi Hal > > V2 of this patch - trivial fix of data type and converting C99 > code for compilation on windows. > > Please apply to trunk only. > > Thanks > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied (to trunk only). -- Hal From dotanb at dev.mellanox.co.il Wed Mar 7 07:57:53 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 07 Mar 2007 17:57:53 +0200 Subject: [ofa-general] [mstflint] fix warnings that were reported by sparse Message-ID: <1173283074.9453.1.camel@mtldesk014.lab.mtl.com> Fixed warnings that were reported by sparse.
Signed-off-by: Dotan Barak --- Index: gen2_devel_user/src/userspace/mstflint/mtcr.h =================================================================== --- gen2_devel_user.orig/src/userspace/mstflint/mtcr.h 2007-02-18 16:41:19.000000000 +0200 +++ gen2_devel_user/src/userspace/mstflint/mtcr.h 2007-02-19 10:37:37.000000000 +0200 @@ -335,7 +335,7 @@ mfile *mopen(const char *name) char file_name[]="/proc/bus/pci/0000:00/00.0"; mf=(mfile*)malloc(sizeof(mfile)); - if (!mf) return 0; + if (!mf) return NULL; //If device name starts with /proc/bus/pci we'll use configuration cycles if (!strncmp(name,"/proc/bus/pci/",strlen("/proc/bus/pci/"))) @@ -449,7 +449,7 @@ open_failed: #if CONFIG_ENABLE_MMAP find_failed: #endif - return 0; + return NULL; } #endif From bugzilla-daemon at lists.openfabrics.org Wed Mar 7 08:36:40 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 7 Mar 2007 08:36:40 -0800 (PST) Subject: [ofa-general] [Bug 417] can't unload OFED 1.2 drivers on SLES10 x86_64 In-Reply-To: Message-ID: <20070307163640.69E3DE6038C@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=417 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |tziporet at mellanox.co.il ------- Comment #1 from sweitzen at cisco.com 2007-03-07 08:36 ------- Tziporet, can you please assign to someone? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From jlentini at netapp.com Wed Mar 7 08:45:57 2007 From: jlentini at netapp.com (James Lentini) Date: Wed, 7 Mar 2007 11:45:57 -0500 (EST) Subject: [ofa-general] IB switches: managed or not? 
In-Reply-To: <20070305144129.657516a3.weiny2@llnl.gov> References: <20070305144129.657516a3.weiny2@llnl.gov> Message-ID: On Mon, 5 Mar 2007, Ira Weiny wrote: > > On Tue, 6 Mar 2007 08:47:00 +1100 > "Andrew Robbie (GMail)" wrote: > > > My other query is about diagnostic software. With an ethernet switch it is > > pretty easy to fire up Ethereal (sorry Wireshark, but it is such a silly > > name) or Etherape and get a look at what is going on. If I buy a Cisco or > > Voltaire etc do they come with tools that let me get accurate > > representations of what is going on? Or are their tools really for large IB > > networks? > > There are some issues with diags. However, I don't know of any product from > any vendor which captures on-the-wire packets like Wireshark. (But I could be > wrong and would love to know about it if it was out there.) CATC, now part of LeCroy, used to sell 1X and 4X InfiniBand network analyzers. They stopped selling these a few years ago, but they are now advertising them again on the LeCroy website: http://www.lecroy.com/tm/products/ProtocolAnalyzers/infiniband.asp?menuid=62 They may be selling these again. From mst at mellanox.co.il Wed Mar 7 09:57:24 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Mar 2007 19:57:24 +0200 Subject: [ofa-general] Re: infiniband bonding/merging/aggregation with SDP and/orVERBS In-Reply-To: References: Message-ID: <20070307175724.GC8303@mellanox.co.il> > Is bonding possible when running SDP? And VERBS? SDP does not seem to support bonding without protocol extensions. Verbs is an API, not a protocol, so you can implement bonding on top of it if you wish. -- MST From mst at mellanox.co.il Wed Mar 7 10:10:41 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 7 Mar 2007 20:10:41 +0200 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: References: Message-ID: <20070307181041.GD8303@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy > > What do you think of something like this, plus merging the async event > and command interface EQs? Looks good, except that spin_lock/spin_unlock need a lock pointer. Here's a (compile tested only) patch for merging async/command queues. commit a0bc6fd18d8a1918a251576537350a369fb902ed Author: Michael S. Tsirkin Date: Wed Mar 7 17:47:30 2007 +0200 Merge CMD and ASYNC EQs Signed-off-by: Michael S. Tsirkin diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index b7e42ef..a87903f 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -93,9 +93,8 @@ enum { }; enum { - MTHCA_EQ_CMD, - MTHCA_EQ_ASYNC, MTHCA_EQ_COMP, + MTHCA_EQ_ASYNC, MTHCA_NUM_EQ }; diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 8ec9fa1..f7a41b8 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -110,11 +110,11 @@ enum { (1ULL << MTHCA_EVENT_TYPE_WQ_ACCESS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_LOCAL_CATAS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_PORT_CHANGE) | \ - (1ULL << MTHCA_EVENT_TYPE_ECC_DETECT)) + (1ULL << MTHCA_EVENT_TYPE_ECC_DETECT)) | \ + (1ULL << MTHCA_EVENT_TYPE_CMD) #define MTHCA_SRQ_EVENT_MASK ((1ULL << MTHCA_EVENT_TYPE_SRQ_CATAS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_SRQ_QP_LAST_WQE) | \ (1ULL << MTHCA_EVENT_TYPE_SRQ_LIMIT)) -#define MTHCA_CMD_EVENT_MASK (1ULL << MTHCA_EVENT_TYPE_CMD) #define MTHCA_EQ_DB_INC_CI (1 << 24) #define MTHCA_EQ_DB_REQ_NOT (2 << 24) @@ -863,23 +863,17 @@ int mthca_init_eq_table(struct mthca_dev *dev) if (err) goto err_out_unmap; - err = mthca_create_eq(dev, MTHCA_NUM_ASYNC_EQE + 
MTHCA_NUM_SPARE_EQE, + err = mthca_create_eq(dev, MTHCA_NUM_CMD_EQE + MTHCA_NUM_ASYNC_EQE + + MTHCA_NUM_SPARE_EQE, (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 129 : intr, &dev->eq_table.eq[MTHCA_EQ_ASYNC]); if (err) goto err_out_comp; - err = mthca_create_eq(dev, MTHCA_NUM_CMD_EQE + MTHCA_NUM_SPARE_EQE, - (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 130 : intr, - &dev->eq_table.eq[MTHCA_EQ_CMD]); - if (err) - goto err_out_async; - if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { static const char *eq_name[] = { [MTHCA_EQ_COMP] = DRV_NAME " (comp)", [MTHCA_EQ_ASYNC] = DRV_NAME " (async)", - [MTHCA_EQ_CMD] = DRV_NAME " (cmd)" }; for (i = 0; i < MTHCA_NUM_EQ; ++i) { @@ -889,7 +883,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_tavor_msi_x_interrupt, 0, eq_name[i], dev->eq_table.eq + i); if (err) - goto err_out_cmd; + goto err_out_async; dev->eq_table.eq[i].have_irq = 1; } } else { @@ -899,7 +893,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_tavor_interrupt, IRQF_SHARED, DRV_NAME, dev); if (err) - goto err_out_cmd; + goto err_out_async; dev->eq_table.have_irq = 1; } @@ -912,15 +906,6 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_warn(dev, "MAP_EQ for async EQ %d returned status 0x%02x\n", dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn, status); - err = mthca_MAP_EQ(dev, MTHCA_CMD_EVENT_MASK, - 0, dev->eq_table.eq[MTHCA_EQ_CMD].eqn, &status); - if (err) - mthca_warn(dev, "MAP_EQ for cmd EQ %d failed (%d)\n", - dev->eq_table.eq[MTHCA_EQ_CMD].eqn, err); - if (status) - mthca_warn(dev, "MAP_EQ for cmd EQ %d returned status 0x%02x\n", - dev->eq_table.eq[MTHCA_EQ_CMD].eqn, status); - for (i = 0; i < MTHCA_NUM_EQ; ++i) if (mthca_is_memfree(dev)) arbel_eq_req_not(dev, dev->eq_table.eq[i].eqn_mask); @@ -929,11 +914,8 @@ int mthca_init_eq_table(struct mthca_dev *dev) return 0; -err_out_cmd: - mthca_free_irqs(dev); - mthca_free_eq(dev, &dev->eq_table.eq[MTHCA_EQ_CMD]); - err_out_async: + mthca_free_irqs(dev); mthca_free_eq(dev, &dev->eq_table.eq[MTHCA_EQ_ASYNC]); 
err_out_comp: @@ -956,8 +938,6 @@ void mthca_cleanup_eq_table(struct mthca_dev *dev) mthca_MAP_EQ(dev, async_mask(dev), 1, dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn, &status); - mthca_MAP_EQ(dev, MTHCA_CMD_EVENT_MASK, - 1, dev->eq_table.eq[MTHCA_EQ_CMD].eqn, &status); for (i = 0; i < MTHCA_NUM_EQ; ++i) mthca_free_eq(dev, &dev->eq_table.eq[i]); diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index 0d9b7d0..5bfef62 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -835,7 +835,7 @@ static int mthca_setup_hca(struct mthca_dev *dev) if (err || status) { mthca_err(dev, "NOP command failed to generate interrupt (IRQ %d), aborting.\n", dev->mthca_flags & MTHCA_FLAG_MSI_X ? - dev->eq_table.eq[MTHCA_EQ_CMD].msi_x_vector : + dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector : dev->pdev->irq); if (dev->mthca_flags & (MTHCA_FLAG_MSI | MTHCA_FLAG_MSI_X)) mthca_err(dev, "Try again with MSI/MSI-X disabled.\n"); @@ -976,12 +976,11 @@ static void mthca_release_regions(struct pci_dev *pdev, static int mthca_enable_msi_x(struct mthca_dev *mdev) { - struct msix_entry entries[3]; + struct msix_entry entries[2]; int err; entries[0].entry = 0; entries[1].entry = 1; - entries[2].entry = 2; err = pci_enable_msix(mdev->pdev, entries, ARRAY_SIZE(entries)); if (err) { @@ -993,7 +992,6 @@ static int mthca_enable_msi_x(struct mthca_dev *mdev) mdev->eq_table.eq[MTHCA_EQ_COMP ].msi_x_vector = entries[0].vector; mdev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector = entries[1].vector; - mdev->eq_table.eq[MTHCA_EQ_CMD ].msi_x_vector = entries[2].vector; return 0; } -- MST From rdreier at cisco.com Wed Mar 7 10:17:56 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Mar 2007 10:17:56 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: <20070307181041.GD8303@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 7 Mar 2007 20:10:41 +0200") References: <20070307181041.GD8303@mellanox.co.il> Message-ID: > Looks good, except that spin_lock/spin_unlock need a lock pointer. Yep, caught that as soon as I tried to build it... > Here's a (compile tested only) patch for merging async/command queues. OK, can you test that and my patch with the test that started this thread? If it looks good we can merge it for 2.6.21. thanks... - R. From sean.hefty at intel.com Wed Mar 7 10:48:32 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 7 Mar 2007 10:48:32 -0800 Subject: [ofa-general] [PATCH] ib_sa: set src_path_bits correctly in ib_init_ah_from_path Message-ID: <000001c760e9$34155c10$8698070a@amr.corp.intel.com> The src_path_bits needs to mask off the base LID value. Signed-off-by: Sean Hefty --- Here's a first cut at setting the src_path_bits correctly. I don't think that this is important enough to push for 2.6.21, but if it looks okay I can queue it until 2.6.22 starts up. 
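As an illustration of the masking described above, separate from the patch itself: with a LID Mask Control value of `lmc`, a port answers to 2^lmc consecutive LIDs, so the source path bits are the low `lmc` bits of the source LID rather than a fixed low 7 bits. The sketch below is standalone user-space C under stated assumptions, not kernel code; the helper names `src_path_mask` and `src_path_bits` are hypothetical, and the LID is assumed to be already in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `lmc` bits of a LID; LMC is assumed to be 0..7. */
static uint8_t src_path_mask(uint8_t lmc)
{
	return (uint8_t)((1u << lmc) - 1);
}

/* Path bits of a source LID (host byte order) for a given LMC. */
static uint8_t src_path_bits(uint16_t slid, uint8_t lmc)
{
	return (uint8_t)(slid & src_path_mask(lmc));
}
```

For example, with `lmc` equal to 2 and a source LID of 0x0013, the base LID is 0x0010 and `src_path_bits(0x0013, 2)` yields 3, whereas a fixed `& 0x7f` would yield 0x13 and so leak base LID bits into the result. With `lmc` equal to 7 the mask is 0x7f, which is the widest case.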
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 68db633..9a7eaad 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -57,6 +57,7 @@ MODULE_LICENSE("Dual BSD/GPL"); struct ib_sa_sm_ah { struct ib_ah *ah; struct kref ref; + u8 src_path_mask; }; struct ib_sa_port { @@ -380,6 +381,7 @@ static void update_sm_ah(struct work_struct *work) } kref_init(&new_ah->ref); + new_ah->src_path_mask = (1 << port_attr.lmc) - 1; memset(&ah_attr, 0, sizeof ah_attr); ah_attr.dlid = port_attr.sm_lid; @@ -460,6 +462,25 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query) } EXPORT_SYMBOL(ib_sa_cancel_query); +static u8 get_src_path_mask(struct ib_device *device, u8 port_num) +{ + struct ib_sa_device *sa_dev; + struct ib_sa_port *port; + unsigned long flags; + u8 src_path_mask; + + sa_dev = ib_get_client_data(device, &sa_client); + if (!sa_dev) + return 0x7f; + + port = &sa_dev->port[port_num - sa_dev->start_port]; + spin_lock_irqsave(&port->ah_lock, flags); + src_path_mask = port->sm_ah ? port->sm_ah->src_path_mask : 0x7f; + spin_unlock_irqrestore(&port->ah_lock, flags); + + return src_path_mask; +} + int ib_init_ah_from_path(struct ib_device *device, u8 port_num, struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr) { @@ -469,7 +490,8 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, memset(ah_attr, 0, sizeof *ah_attr); ah_attr->dlid = be16_to_cpu(rec->dlid); ah_attr->sl = rec->sl; - ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f; + ah_attr->src_path_bits = be16_to_cpu(rec->slid) & + get_src_path_mask(device, port_num); ah_attr->port_num = port_num; ah_attr->static_rate = rec->rate; From mst at mellanox.co.il Wed Mar 7 11:55:53 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 7 Mar 2007 21:55:53 +0200 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: References: Message-ID: <20070307195553.GB9817@mellanox.co.il> > I've been testing OFED-1.2-20070306-0807 today, and here's a current list of > bugs I'd like fixed for OFED 1.2 beta. Sean there's a critical bug related to multicast module, and a blocker bug related to ucma module assigned to you. Are you looking at these issues? Blocker at least needs to be addressed before beta. -- MST From Thomas.Talpey at netapp.com Wed Mar 7 12:04:04 2007 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 07 Mar 2007 15:04:04 -0500 Subject: [ofa-general] ipoib performance (and xplot) In-Reply-To: <200703062008.l26K80EK019312@cmf.nrl.navy.mil> References: <200703062008.l26K80EK019312@cmf.nrl.navy.mil> Message-ID: Interesting data. What app are you using to generate the TCP flow, and what options are you using on it? Also, what are the scales on the x- and y-axes (seconds and decimal kilobytes)? I have some comments but they are only speculation without knowing this. By "800Mb/s" do you mean megabytes or megabits? For ipoib, 800 megabytes/s (MB/s) seems very high and 800 megabits/s (Mb/s) seems very low. In my experience it gets 200-300 megabytes/s before running out of cpu (checksum calculations mainly). But I haven't looked at it in a while. Tom. At 03:07 PM 3/6/2007, chas williams - CONTRACTOR wrote: >while looking at some ipoib performance i had a chance to graph the >tcp flow in xplot (see http://www.xplot.org/). the graph appears very >strange and is attached to this message. > >the lower solid line represents acks coming back from the tcp server, the >upper line represents the window size (i disabled window scaling btw, this >doesn't affect performance). the up/down arrows (they look like diamonds >due to scale) represent packets. this is a view from the tcp client. > >the initial part of the graph is tcp slow-start/congestion.
what is >curious is how the returning acks (after slow start is finished) seem >to get quiet periodically. the next ack that returns then acks >the entire window. this seems to be leading to a very bursty behavior. > >i would normally expect to see two data packets followed by an >ack as can be seen between the 'burst' regions. > >i see about 800Mb/s (goodput) between the hosts which i understand >to be typical for ipoib. there don't appear to be any link errors >either. > >so why the long pauses? > >ftp://ftp.cmf.nrl.navy.mil/pub/chas/ipoib.jpg From jsquyres at cisco.com Wed Mar 7 12:08:22 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 7 Mar 2007 15:08:22 -0500 Subject: [ofa-general] MPI question for OFED 1.2 Message-ID: <249F3286-C3CC-4604-9729-990438FE9C45@cisco.com> When both the 32 and 64 bit versions of libibverbs are present, which should the MPI build against? What do MVAPICH / MVAPICH2 do? -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mshefty at ichips.intel.com Wed Mar 7 13:35:59 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Mar 2007 13:35:59 -0800 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: <20070307195553.GB9817@mellanox.co.il> References: <20070307195553.GB9817@mellanox.co.il> Message-ID: <45EF303F.5090607@ichips.intel.com> > Sean there's a critical bug related to multicast module, > and a blocker bug related to ucma module assigned to you. Not sure who assigned me the bugs, but I did look at them and added comments. The multicast bug appears to be related to the ipoib HA, rather than the multicast module. I haven't done anything with backport patches, and I'm not sure who has.
- Sean From sweitzen at cisco.com Wed Mar 7 13:51:15 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 7 Mar 2007 13:51:15 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <45EF303F.5090607@ichips.intel.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> Message-ID: If you look at the Bug Activity, you can see who assigned them to you. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, March 07, 2007 1:36 PM > To: Michael S. Tsirkin > Cc: Scott Weitzenkamp (sweitzen); Openfabrics-ewg at openib.org; > vlad at mellanox.co.il; Tziporet Koren; Pavel Shamis (Pasha); > Jeff Squyres (jsquyres); Shaun Rowland; Woodruff, Robert J; OPENIB > Subject: Re: OFED 1.2 beta blocking bugs > > > Sean there's a critical bug related to multicast module, > > and a blocker bug related to ucma module assigned to you. > > Not sure who assigned me the bugs, but I did look at them and > added comments. > > The multicast bug appears to be related to the ipoib HA, > rather than the > multicast module. > > I haven't done anything with backport patches, and I'm not > sure who has. > > - Sean > From arlin.r.davis at intel.com Wed Mar 7 14:29:11 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 7 Mar 2007 14:29:11 -0800 Subject: [ofa-general] RE: [Bug 408] dapltest compilation fails on x86 [PATCH] In-Reply-To: <20070306122340.24C7FE607F6@openfabrics.org> Message-ID: <000001c76108$07829720$4297070a@amr.corp.intel.com> James, please review this fix for dapltest build issue. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h index c05dd30..2903e78 100644 --- a/test/dapltest/mdep/linux/dapl_mdep_user.h +++ b/test/dapltest/mdep/linux/dapl_mdep_user.h @@ -117,7 +117,7 @@ typedef unsigned long long int DT_Mdep_TimeStamp; static _INLINE_ DT_Mdep_TimeStamp DT_Mdep_GetTimeStamp ( void ) { -#if defined(__GNUC__) && defined(__PENTIUM__) +#if defined(__GNUC__) && defined(__i386__) DT_Mdep_TimeStamp x; __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); return x; @@ -143,7 +143,7 @@ DT_Mdep_GetTimeStamp ( void ) asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); return ((unsigned long)__a) | (((unsigned long)__d)<<32); #else -#error "Non-Pentium and Non-PPC Linux - unimplemented" +#error "Linux CPU architecture - unimplemented" #endif #endif #endif From arlin.r.davis at intel.com Wed Mar 7 14:31:48 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 7 Mar 2007 14:31:48 -0800 Subject: [ofa-general] [PATCH] uDAPL specfile cleanup Message-ID: <000101c76108$64e79640$4297070a@amr.corp.intel.com> Updated the libdat.spec.in file to build dapl RPM's correctly. Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/libdat.spec.in b/libdat.spec.in index 4cb1bdc..bcd78ad 100644 --- a/libdat.spec.in +++ b/libdat.spec.in @@ -31,19 +31,20 @@ # # $Id: $ -%define ver 1.2.0 -%define RELEASE 1.2 +%define ver 1.2.1 +%define RELEASE pre %define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE} Summary: Userspace DAT and DAPL API. 
Name: dapl Version: %ver -Release: %rel +Release: %rel%{?dist} + License: Dual GPL/BSD/CPL Group: System Environment/Libraries BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) -Source: http://openib.org/downloads/%{name}-%{version}.tar.gz -Url: http://openib.org/ +Source: http://openfabrics.org/~ardavis/%{name}-%{version}-%{release}.tgz +Url: http://openfabrics.org/ %description Along with the OpenIB kernel drivers, libdat and libdapl provides a userspace @@ -57,10 +58,19 @@ Requires: %{name} = %{version}-%{release} %description devel Static libraries and header files for the libdat and libdapl library. +%package utils +Summary: Test suites for uDAPL library +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description utils +Useful test suites to validate uDAPL library API's. + %prep -%setup -q -n %{name}-%{ver} +%setup -q -n %{name} %build +./autogen.sh %configure make @@ -79,7 +89,7 @@ rm -rf $RPM_BUILD_ROOT %defattr(-,root,root) %{_libdir}/libda*.so.* %{_sysconfdir}/dat.conf -%doc AUTHORS COPYING ChangeLog NEWS README +%doc AUTHORS README %files devel %defattr(-,root,root,-) @@ -96,3 +106,18 @@ rm -rf $RPM_BUILD_ROOT %{_includedir}/dat/udat_redirection.h %{_includedir}/dat/udat_vendor_specific.h %{_sysconfdir}/dat.conf + +%files utils +%defattr(-,root,root,-) +%{_bindir}/* +%{_mandir}/man1/* + +%changelog +* Wed Mar 7 2007 Arlin Davis - 1.2.1.pre +- OFED 1.2-alpha, added dtest and dapltest utilies to package + +* Fri Oct 20 2006 Arlin Davis - 1.2.0 +- OFED 1.1, + +* Wed May 31 2006 Arlin Davis - 1.2.0 +- OFED 1.0
From swise at opengridcomputing.com Wed Mar 7 14:57:38 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Mar 2007 16:57:38 -0600 Subject: [ofa-general] possible mvapich2 problem Message-ID: <1173308258.14835.86.camel@stevo-desktop> Hey Shaun, I have an MPI test program that is detecting buffer corruption when run on mvapich2-0.9.8-5. The same program works on mvapich2-0.9.8-4. The corruption happens over IB as well as iWARP on alpha libs and a recent set of kernel modules from ofa 1.2. At this point in this (complicated) test, all ranks enter into an MPI_Bcast(). The root rank, which is sending the data, checksums a bit of the data buffer before entering MPI_Bcast(), and again afterwards (if there was no error) to validate that the data in the send buffer wasn't corrupted. The buffer checksum differs after the bcast. So somehow the data in the buffer was altered, presumably by the MPI layer (but I don't know that yet). Have y'all seen this problem? Maybe it was fixed in -6? I'm going to try to reduce this to a simple test, but I wanted to see if this is a known mvapich2 problem with the 0.9.8-5 release. Steve. From sweitzen at cisco.com Wed Mar 7 15:01:19 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 7 Mar 2007 15:01:19 -0800 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS In-Reply-To: References: Message-ID: I have not tried the OFED 1.2 IPoIB bonding kernel driver, and can only speak for the userspace IPoIB HA ipoib_ha.pl script. Both Topspin IPoIB and OFED IPoIB have active/passive IPoIB high availability, neither can aggregate IPoIB throughput, and neither has SDP high availability. We will have Topspin SLES10 drivers in beta soon, let me know if you are interested.
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen Sent: Wednesday, March 07, 2007 6:59 AM To: general at lists.openfabrics.org Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS Hi all! We are trying to bond two ports on 1 HCA so that we are able to aggregate the throughput. We are also interested in bonding ports of different HCA's. Is this possible with the OFED driver? If so, can you give the command? We know TopSpin has support for this feature. Sadly, Topspin has no driver that runs on our system (SLES 10). We currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 driver, but we can't figure out how this bonding is started in either version. It is important that we offload the bonding. We don't want to use the standard linux bonding. That is why we think that bonding over different HCA's is not going to work. Is this assumption correct? Is bonding possible when running SDP? And VERBS? Greetings Koen *** Disclaimer *** Vlaamse Radio- en Televisieomroep Auguste Reyerslaan 52, 1043 Brussel nv van publiek recht BTW BE 0244.142.664 RPR Brussel http://www.vrt.be/disclaimer From swise at opengridcomputing.com Wed Mar 7 15:23:19 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 07 Mar 2007 17:23:19 -0600 Subject: [ofa-general] possible mvapich2 problem In-Reply-To: <1173308258.14835.86.camel@stevo-desktop> References: <1173308258.14835.86.camel@stevo-desktop> Message-ID: <1173309799.14835.90.camel@stevo-desktop> Um, Please ignore this email. This is a test case problem... :-\ Steve.
From greg.lindahl at qlogic.com Wed Mar 7 15:41:23 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Wed, 7 Mar 2007 15:41:23 -0800 Subject: [ofa-general] IB switches: managed or not? In-Reply-To: <20070305144129.657516a3.weiny2@llnl.gov> References: <20070305144129.657516a3.weiny2@llnl.gov> Message-ID: <20070307234123.GA6610@localhost.localdomain> On Mon, Mar 05, 2007 at 02:41:29PM -0800, Ira Weiny wrote: > There are some issues with diags. However, I don't know of any product from > any vendor which captures on the wire packets like Wireshark. (But I could be > wrong and would love to know about it if it was out there.) It's trivial to capture InfiniBand packets off the wire by hacking up the InfiniPath driver a little.
Analyzing it (at line rate) is left as an exercise for the reader. ;-) And, of course, that doesn't capture all the things that go outside packets. -- greg From sweitzen at cisco.com Wed Mar 7 16:56:52 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 7 Mar 2007 16:56:52 -0800 Subject: [ofa-general] MPI question for OFED 1.2 In-Reply-To: <249F3286-C3CC-4604-9729-990438FE9C45@cisco.com> References: <249F3286-C3CC-4604-9729-990438FE9C45@cisco.com> Message-ID: MVAPICH and MVAPICH2 use 64-bit, and Open MPI should too. MVAPICH and MVAPICH2 only support 64-bit linking in this environment (they don't build 32-bit MPI libs), at least the way they compile from OFED install.sh. [releng at svbu-qa1850-3 tmp]$ cat /etc/SuSE-release SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 MVAPICH: [releng at svbu-qa1850-3 tmp]$ /usr/local/ofed/mpi/gcc/mvapich-0.9.9/bin/mpicc -o osu_latency.mvapich osu_latency.c [releng at svbu-qa1850-3 tmp]$ ldd osu_latency.mvapich libmpich.so.1.0 => not found libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0a2e0d7000) libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0a2e1e2000) libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0a2e2ed000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b0a2e3f0000) librt.so.1 => /lib64/librt.so.1 (0x00002b0a2e506000) libc.so.6 => /lib64/libc.so.6 (0x00002b0a2e610000) libdl.so.2 => /lib64/libdl.so.2 (0x00002b0a2e840000) /lib64/ld-linux-x86-64.so.2 (0x00002b0a2dfba000) [releng at svbu-qa1850-3 tmp]$ /usr/local/ofed/mpi/gcc/mvapich-0.9.9/bin/mpicc -m32 -o osu_latency.mvapich.32 osu_latency.c /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/local/ofed/mpi/gcc/mvapich-0.9.9/lib/shared/libmpich.so when searching for -lmpich /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/local/ofed/mpi/gcc/mvapich-0.9.9/lib/libmpich.a when searching for -lmpich
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: cannot find -lmpich collect2: ld returned 1 exit status MVAPICH2: [releng at svbu-qa1850-3 tmp]$ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-7/bin/mpicc -o osu_latency.mvapich2 osu_latency.c [releng at svbu-qa1850-3 tmp]$ ldd osu_latency.mvapich2 libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-7/lib/libmpich.so (0x00002abaef60e000) librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002abaef8bb000) libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002abaef9c0000) libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002abaefacb000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00002abaefbd6000) librt.so.1 => /lib64/librt.so.1 (0x00002abaefcec000) libc.so.6 => /lib64/libc.so.6 (0x00002abaefdf5000) libdl.so.2 => /lib64/libdl.so.2 (0x00002abaf0026000) libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002abaf012a000) /lib64/ld-linux-x86-64.so.2 (0x00002abaef4f1000) [releng at svbu-qa1850-3 tmp]$ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-7/bin/mpicc -m32 -o osu_latency.mvapich2.32 osu_latency.c /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-7/lib/libmpich.so when searching for -lmpich /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-7/lib/libmpich.a when searching for -lmpich /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: cannot find -lmpich collect2: ld returned 1 exit status Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Jeff Squyres (jsquyres) > Sent: Wednesday, March 07, 2007 12:08 PM > To: OpenFabrics General > Subject: [ofa-general]
MPI question for OFED 1.2 > > When both the 32 and 64 bit versions of libibverbs are > present, which > should the MPI build against? > > What do MVAPICH / MVAPICH2 do? > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > From jsquyres at cisco.com Wed Mar 7 17:04:59 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 7 Mar 2007 20:04:59 -0500 Subject: [ofa-general] MPI question for OFED 1.2 In-Reply-To: References: <249F3286-C3CC-4604-9729-990438FE9C45@cisco.com> Message-ID: Ok, thanks. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Wed Mar 7 22:39:38 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Mar 2007 08:39:38 +0200 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: References: Message-ID: <20070308063921.GB26304@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy > > > Looks good, except that spin_lock/spin_unlock need a lock pointer. > > Yep, caught that as soon as I tried to build it... > > > Here's a (compile tested only) patch for merging async/command queues. > > OK, can you test that and my patch with the test that started this > thread? If it looks good we can merge it for 2.6.21. OK. And I think we also need the following on top of this. Please comment. commit 1cbe0687501636891ae367ae94b37fce9f73275d Author: Michael S. Tsirkin Date: Thu Mar 8 08:33:23 2007 +0200 IB/mthca: sync IRQs on QP reset and destroy It is common practice to put a pointer/index to a per-QP structure inside the wrid. This data is available after poll_cq returns, when cq lock is not taken. If this pointer is used directly inside the event handler, the ULP that is moving QP to reset has no way to know when it is safe to free the data it points to, unless the verbs provider synchronizes with the IRQ handler before the verb returns. Signed-off-by: Michael S.
Tsirkin --- diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 0cba284..3d6591b 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -37,6 +37,7 @@ #include #include +#include #include @@ -864,6 +865,11 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, if (qp->ibqp.send_cq != qp->ibqp.recv_cq) mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + else + synchronize_irq(dev->pdev->irq); + mthca_wq_reset(&qp->sq); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); @@ -1420,6 +1426,11 @@ void mthca_free_qp(struct mthca_dev *dev, mthca_unlock_cqs(send_cq, recv_cq); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + else + synchronize_irq(dev->pdev->irq); + wait_event(qp->wait, !get_qp_refcount(dev, qp)); /* -- MST From monisonlists at gmail.com Thu Mar 8 00:44:28 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Thu, 08 Mar 2007 10:44:28 +0200 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: References: Message-ID: <45EFCCEC.4010205@gmail.com> SEGERS Koen wrote: Hi, My answers below refer only to the ib-bonding package that comes with OFED-1.2 > > Hi all! > > We are trying to bond two ports on 1 HCA so that we are able aggregate > the throughput. We are also interested in bonding ports of different HCA's. ib-bonding currently supports High Availability but not link aggregation. > > Is this possible with the OFED driver? If so, can you give the command? > We know TopSpin has support for this feature. Sadly, Topspin has no > driver that runs on our system (SLES 10). > > We currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 > driver, but we can't figure out how this bonding is started in either > versions. 
It is important that we offload the bonding. We don't want to > use the standard linux bonding. That is why we think that bonding over > different HCA's is not going to work. Is this assumption correct? ib-bonding is based on standard Linux bonding with some required changes to make it work with IPoIB. > > Is bonding possible when running SDP? And VERBS? ib-bonding only works with IPoIB. > > Greetings > > Koen From Koen.SEGERS at VRT.BE Thu Mar 8 01:03:07 2007 From: Koen.SEGERS at VRT.BE (SEGERS Koen) Date: Thu, 8 Mar 2007 10:03:07 +0100 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS References: Message-ID: Are you talking about a kernel patch when you refer to the "bonding kernel driver"? I can't find a specific bonding command that allows bonding two or more ports. So if I understand it correctly, with SDP you can't have redundancy (active/passive) or aggregation (active/active) with the current OFED-1.2 driver. Renaud Larsen of Cisco told us that bonding is possible in the Topspin driver with the "ipoibcfg merge" command. We are wondering if this also applies to SDP. That is why we are very interested in the beta drivers of Topspin!
We are supposed to get them (from Renaud) within a few days, but if you can send it to me earlier, it is always better :) Greetings, Koen From vlad at lists.openfabrics.org Thu Mar 8 02:14:32 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 8 Mar 2007 02:14:32 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070308-0200 daily build status Message-ID: <20070308101433.0775DE60848@openfabrics.org> This email was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ppc64 with
linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.5-7.244-smp Failed: From jsquyres at cisco.com Thu Mar 8 03:58:28 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Mar 2007 06:58:28 -0500 Subject: [ofa-general] openfabrics.org DNS problems Message-ID: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> openfabrics.org is currently having some DNS problems; we're working on it. Developers: the server IP address is 146.246.248.81. As a temporary workaround, add this IP address in /etc/hosts for *.openfabrics.org and you should be able to continue working. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From jsquyres at cisco.com Thu Mar 8 04:00:47 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Mar 2007 07:00:47 -0500 Subject: [ofa-general] openfabrics.org DNS problems In-Reply-To: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> Message-ID: <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com> On Mar 8, 2007, at 6:58 AM, Jeff Squyres wrote: > Developers: the server IP address is 146.246.248.81. Wrong address, sorry; it should be: 69.55.231.195. 
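For reference, a minimal sketch of that /etc/hosts workaround, assuming the corrected address 69.55.231.195 given above. /etc/hosts does not expand wildcards, so each hostname has to be listed explicitly; the two names below are simply the ones that appear in this archive, and any other openfabrics.org host would need its own entry.

```
# Temporary entries; remove once openfabrics.org DNS is working again.
69.55.231.195   openfabrics.org lists.openfabrics.org
```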
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From sashak at voltaire.com Thu Mar 8 05:45:02 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Mar 2007 15:45:02 +0200 Subject: [ofa-general] [PATCH 0/4] opensm: more routing optimizations Message-ID: <11733615061517-git-send-email-sashak@voltaire.com> This series mostly implements the "switch only" optimization idea and affects the min hop matrix building phase (for both the up/down and default builders). The main trick is to keep min hop tables _ONLY_ for switch base LIDs and not bother with CA LIDs, router LIDs, or any secondary LIDs when LMC > 0 - this saves a lot of the memory and CPU cycles needed to calculate and store the huge whole-fabric matrices. For CA and router ports we will refer to the neighbor switch's min hop vectors. And for the LMC > 0 case we will use the base LID's min hop vectors for any secondary LIDs (for CAs and routers this will be the neighbor switch's base LID). Preliminary testing shows a 3-4x speedup in the min-hop generation phase and yet another 2x for up/down. Sasha From sashak at voltaire.com Thu Mar 8 05:45:03 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Mar 2007 15:45:03 +0200 Subject: [ofa-general] [PATCH 1/4] opensm: use only switches min hop vectors In-Reply-To: <11733615061517-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> Message-ID: <11733615092932-git-send-email-sashak@voltaire.com> Use only switch base LIDs min hop vectors in the best port lookup routines - osm_switch_recommend*_path().
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 11 ++++++-- osm/opensm/osm_mcast_mgr.c | 11 +++---- osm/opensm/osm_switch.c | 53 ++++++++++++++++++++++++++++---------- osm/opensm/osm_ucast_mgr.c | 18 +++++++++--- 4 files changed, 65 insertions(+), 28 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 7edacc4..0bb4ca3 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -935,6 +935,7 @@ osm_switch_get_mft_max_position( uint8_t osm_switch_recommend_path( IN const osm_switch_t* const p_sw, + IN osm_port_t *p_port, IN const uint16_t lid_ho, IN const boolean_t ignore_existing, IN OUT uint64_t *remote_sys_guids, @@ -946,6 +947,10 @@ osm_switch_recommend_path( * p_sw * [in] Pointer to the switch object. * +* p_port +* [in] Pointer to the port object for which to get a path +* advisory. +* * lid_ho * [in] LID value (host order) for which to get a path advisory. * @@ -991,7 +996,7 @@ osm_switch_recommend_path( uint8_t osm_switch_recommend_mcast_path( IN osm_switch_t* const p_sw, - IN const uint16_t lid_ho, + IN osm_port_t *p_port, IN const uint16_t mlid_ho, IN const boolean_t ignore_existing ); /* @@ -999,8 +1004,8 @@ osm_switch_recommend_mcast_path( * p_sw * [in] Pointer to the switch object. * -* lid_ho -* [in] LID value (host order) for of the node for with to get +* p_port +* [in] Pointer to the port object for which to get * the multicast path. 
* * mlid_ho diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index 28c4dcd..4464689 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -531,7 +531,6 @@ __osm_mcast_mgr_subdivide( { uint8_t port_num; uint16_t mlid_ho; - uint16_t lid_ho; boolean_t ignore_existing; osm_mcast_work_obj_t* p_wobj; @@ -553,10 +552,8 @@ __osm_mcast_mgr_subdivide( while( (p_wobj = (osm_mcast_work_obj_t*)cl_qlist_remove_head( p_list )) != (osm_mcast_work_obj_t*)cl_qlist_end( p_list ) ) { - lid_ho = cl_ntoh16( osm_port_get_base_lid( p_wobj->p_port ) ); - port_num = osm_switch_recommend_mcast_path( - p_sw, lid_ho, mlid_ho, ignore_existing ); + p_sw, p_wobj->p_port, mlid_ho, ignore_existing ); if( port_num == OSM_NO_PATH ) { @@ -571,7 +568,8 @@ __osm_mcast_mgr_subdivide( "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n" "\t\t\t\tNo multicast paths from this switch for port " "with LID 0x%X\n", - mlid_ho, node_guid_ho, lid_ho ); + mlid_ho, node_guid_ho, + cl_ntoh16(osm_port_get_base_lid(p_wobj->p_port)) ); __osm_mcast_work_obj_delete( p_wobj ); continue; @@ -585,7 +583,8 @@ __osm_mcast_mgr_subdivide( "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n" "\t\t\t\tNo multicast paths from this switch to port " "with LID 0x%X\n", - mlid_ho, node_guid_ho, lid_ho ); + mlid_ho, node_guid_ho, + cl_ntoh16(osm_port_get_base_lid(p_wobj->p_port)) ); __osm_mcast_work_obj_delete( p_wobj ); diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index 913f34b..1707f9f 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -233,6 +233,7 @@ osm_switch_get_fwd_tbl_block( uint8_t osm_switch_recommend_path( IN const osm_switch_t* const p_sw, + IN osm_port_t *p_port, IN const uint16_t lid_ho, IN const boolean_t ignore_existing, IN OUT uint64_t *remote_sys_guids, @@ -254,6 +255,7 @@ osm_switch_recommend_path( boolean_t routing_for_lmc = remote_sys_guids && remote_node_guids && p_num_used_sys && p_num_used_nodes; boolean_t sys_used, 
node_used; + uint16_t base_lid; uint16_t i; uint8_t hops; uint8_t least_hops; @@ -281,9 +283,24 @@ osm_switch_recommend_path( CL_ASSERT( lid_ho > 0 ); + if (p_port->p_node->sw) { + if (p_port->p_node->sw == p_sw) + return 0; + base_lid = osm_port_get_base_lid(p_port); + } else { + p_physp = osm_port_get_default_phys_ptr(p_port); + if (!p_physp || !p_physp->p_remote_physp || + !p_physp->p_remote_physp->p_node->sw) + return OSM_NO_PATH; + if (p_physp->p_remote_physp->p_node->sw == p_sw) + return p_physp->p_remote_physp->port_num; + base_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0); + } + base_lid = cl_ntoh16(base_lid); + num_ports = p_sw->num_ports; - least_hops = osm_switch_get_least_hops( p_sw, lid_ho ); + least_hops = osm_switch_get_least_hops( p_sw, base_lid ); if ( least_hops == OSM_NO_PATH ) return (OSM_NO_PATH); @@ -312,7 +329,7 @@ osm_switch_recommend_path( osm_physp_is_healthy(p_physp) && osm_physp_get_remote(p_physp) ) { - hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); + hops = osm_switch_get_hop_count( p_sw, base_lid, port_num ); /* If we aren't using pre-defined user routes function, then we need to make sure that the current path is the minimum one. @@ -330,9 +347,6 @@ osm_switch_recommend_path( } } - if ( osm_node_get_base_lid(p_sw->p_node, 0) == cl_hton16(lid_ho) ) - return 0; - /* This algorithm selects a port based on a static load balanced selection across equal hop-count ports. 
@@ -350,7 +364,7 @@ osm_switch_recommend_path( /* port number starts with zero and num_ports is 1 + num phys ports */ for ( port_num = 1; port_num < num_ports; port_num++ ) { - if ( osm_switch_get_hop_count( p_sw, lid_ho, port_num ) == least_hops) + if ( osm_switch_get_hop_count( p_sw, base_lid, port_num ) == least_hops) { /* let us make sure it is not down or unhealthy */ p_physp = osm_node_get_physp_ptr(p_sw->p_node, port_num); @@ -533,18 +547,32 @@ osm_switch_prepare_path_rebuild( uint8_t osm_switch_recommend_mcast_path( IN osm_switch_t* const p_sw, - IN uint16_t const lid_ho, + IN osm_port_t* p_port, IN uint16_t const mlid_ho, IN boolean_t const ignore_existing ) { + uint16_t base_lid; uint8_t hops; uint8_t port_num; uint8_t num_ports; uint8_t least_hops; - CL_ASSERT( lid_ho > 0 ); CL_ASSERT( mlid_ho >= IB_LID_MCAST_START_HO ); + if (p_port->p_node->sw) { + if (p_port->p_node->sw == p_sw) + return 0; + base_lid = osm_port_get_base_lid(p_port); + } else { + osm_physp_t *p_physp = osm_port_get_default_phys_ptr(p_port); + if (!p_physp || !p_physp->p_remote_physp || + !p_physp->p_remote_physp->p_node->sw) + return OSM_NO_PATH; + if (p_physp->p_remote_physp->p_node->sw == p_sw) + return p_physp->p_remote_physp->port_num; + base_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0); + } + base_lid = cl_ntoh16(base_lid); num_ports = p_sw->num_ports; /* @@ -565,7 +593,7 @@ osm_switch_recommend_mcast_path( Don't be too trusting of the current forwarding table! Verify that the LID is reachable through this port. */ - hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); + hops = osm_switch_get_hop_count( p_sw, base_lid, port_num ); if( hops != OSM_NO_PATH ) { return( port_num ); @@ -574,9 +602,6 @@ osm_switch_recommend_mcast_path( } } - if (osm_node_get_base_lid(p_sw->p_node, 0) == cl_hton16(lid_ho)) - return 0; - /* Either no existing mcast paths reach this port or we are ignoring existing paths. 
@@ -591,10 +616,10 @@ osm_switch_recommend_mcast_path( multicast packet will go around and around, inevitably creating a black hole that will destroy the Earth in a firey conflagration. */ - least_hops = osm_switch_get_least_hops( p_sw, lid_ho ); + least_hops = osm_switch_get_least_hops( p_sw, base_lid ); for( port_num = 1; port_num < num_ports; port_num++ ) { - if( osm_switch_get_hop_count( p_sw, lid_ho, port_num ) == least_hops ) + if( osm_switch_get_hop_count( p_sw, base_lid, port_num ) == least_hops ) break; } diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index ee6b3f9..c674d6d 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -248,6 +248,7 @@ __osm_ucast_mgr_dump_ucast_routes( IN void *cxt ) { const osm_node_t* p_node; + osm_port_t * p_port; uint8_t port_num; uint8_t num_hops; uint8_t best_hops; @@ -272,6 +273,13 @@ __osm_ucast_mgr_dump_ucast_routes( { fprintf(file, "0x%04X : ", lid_ho); + p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid_ho); + if (!p_port) + { + fprintf( file, "UNREACHABLE\n" ); + continue; + } + port_num = osm_switch_get_port_by_lid( p_sw, lid_ho ); if( port_num == OSM_NO_PATH ) { @@ -305,7 +313,7 @@ __osm_ucast_mgr_dump_ucast_routes( else { best_port = osm_switch_recommend_path( - p_sw, lid_ho, TRUE, + p_sw, p_port, lid_ho, TRUE, NULL, NULL, NULL, NULL ); /* No LMC Optimization */ fprintf( file, "No %u hop path possible via port %u!", best_hops, best_port ); @@ -689,7 +697,7 @@ static void __osm_ucast_mgr_process_port( IN osm_ucast_mgr_t* const p_mgr, IN osm_switch_t* const p_sw, - IN const osm_port_t* const p_port ) + IN osm_port_t* const p_port ) { uint16_t min_lid_ho; uint16_t max_lid_ho; @@ -775,12 +783,12 @@ __osm_ucast_mgr_process_port( { /* Use the enhanced algorithm only for LMC > 0 */ if (lids_per_port > 1) - port = osm_switch_recommend_path( p_sw, lid_ho, + port = osm_switch_recommend_path( p_sw, p_port, lid_ho, p_mgr->p_subn->ignore_existing_lfts, remote_sys_guids, 
&num_used_sys, remote_node_guids, &num_used_nodes ); else - port = osm_switch_recommend_path( p_sw, lid_ho, + port = osm_switch_recommend_path( p_sw, p_port, lid_ho, p_mgr->p_subn->ignore_existing_lfts, NULL, NULL, NULL, NULL ); @@ -1009,7 +1017,7 @@ __osm_ucast_mgr_process_tbl( osm_switch_t* const p_sw = (osm_switch_t*)p_map_item; osm_ucast_mgr_t* const p_mgr = (osm_ucast_mgr_t*)context; osm_node_t *p_node; - const osm_port_t *p_port; + osm_port_t *p_port; const cl_qmap_t* p_port_tbl; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_tbl ); -- 1.5.0.3.307.gcf89 From sashak at voltaire.com Thu Mar 8 05:45:04 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Mar 2007 15:45:04 +0200 Subject: [ofa-general] [PATCH 2/4] opensm: mcast_mgr uses osm_switch_get_port_least_hops() In-Reply-To: <11733615061517-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> Message-ID: <11733615123337-git-send-email-sashak@voltaire.com> Instead of accessing the min hop tables directly, mcast_mgr now uses the function osm_switch_get_port_least_hops(), which evaluates only the switches' base LID min hop vectors. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 33 +++++++++++++++++++++++++++++++++ osm/opensm/osm_mcast_mgr.c | 8 ++------ osm/opensm/osm_switch.c | 26 ++++++++++++++++++++++++++ 3 files changed, 61 insertions(+), 6 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 0bb4ca3..f224468 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -373,6 +373,39 @@ osm_switch_get_least_hops( * Switch object *********/ +/****f* OpenSM: Switch/osm_switch_get_port_least_hops +* NAME +* osm_switch_get_port_least_hops +* +* DESCRIPTION +* Returns the number of hops in the short path to this port from +* any port on the switch. 
+* +* SYNOPSIS +*/ +uint8_t +osm_switch_get_port_least_hops( + IN const osm_switch_t* const p_sw, + IN const osm_port_t *p_port ); +/* +* PARAMETERS +* p_sw +* [in] Pointer to an osm_switch_t object. +* +* p_port +* [in] Pointer to an osm_port_t object for which to +* retrieve the shortest hop count. +* +* RETURN VALUES +* Returns the number of hops in the short path to this lid from +* any port on the switch. +* +* NOTES +* +* SEE ALSO +* Switch object +*********/ + /****f* OpenSM: Switch/osm_switch_get_port_by_lid * NAME * osm_switch_get_port_by_lid diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index 4464689..0cdcc0e 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -156,7 +156,6 @@ osm_mcast_mgr_compute_avg_hops( float avg_hops = 0; uint32_t hops = 0; uint32_t num_ports = 0; - uint16_t base_lid_ho; const osm_port_t* p_port; const osm_mcm_port_t* p_mcm_port; const cl_qmap_t* p_mcm_tbl; @@ -191,8 +190,7 @@ osm_mcast_mgr_compute_avg_hops( continue; } - base_lid_ho = cl_ntoh16( osm_port_get_base_lid( p_port ) ); - hops += osm_switch_get_least_hops( p_sw, base_lid_ho ); + hops += osm_switch_get_port_least_hops( p_sw, p_port ); num_ports++; } @@ -220,7 +218,6 @@ osm_mcast_mgr_compute_max_hops( { uint32_t max_hops = 0; uint32_t hops = 0; - uint16_t base_lid_ho; const osm_port_t* p_port; const osm_mcm_port_t* p_mcm_port; const cl_qmap_t* p_mcm_tbl; @@ -256,8 +253,7 @@ osm_mcast_mgr_compute_max_hops( continue; } - base_lid_ho = cl_ntoh16( osm_port_get_base_lid( p_port ) ); - hops = osm_switch_get_least_hops( p_sw, base_lid_ho ); + hops = osm_switch_get_port_least_hops( p_sw, p_port ); if (hops > max_hops) max_hops = hops; } diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index 1707f9f..b11abe4 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -545,6 +545,32 @@ osm_switch_prepare_path_rebuild( /********************************************************************** 
 **********************************************************************/ uint8_t +osm_switch_get_port_least_hops( + IN const osm_switch_t* const p_sw, + IN const osm_port_t *p_port ) +{ + uint16_t lid; + if (p_port->p_node->sw) { + if (p_port->p_node->sw == p_sw) + return 0; + lid = osm_node_get_base_lid(p_port->p_node, 0); + return osm_switch_get_least_hops(p_sw, cl_ntoh16(lid)); + } else { + osm_physp_t *p = osm_port_get_default_phys_ptr(p_port); + uint8_t hops; + if (!p || !p->p_remote_physp || !p->p_remote_physp->p_node->sw) + return OSM_NO_PATH; + if (p->p_remote_physp->p_node->sw == p_sw) + return 1; + lid = osm_node_get_base_lid(p->p_remote_physp->p_node, 0); + hops = osm_switch_get_least_hops(p_sw, cl_ntoh16(lid)); + return hops != OSM_NO_PATH ? hops + 1 : OSM_NO_PATH; + } +} + +/********************************************************************** + **********************************************************************/ +uint8_t osm_switch_recommend_mcast_path( IN osm_switch_t* const p_sw, IN osm_port_t* p_port, -- 1.5.0.3.307.gcf89 From sashak at voltaire.com Thu Mar 8 05:45:05 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Mar 2007 15:45:05 +0200 Subject: [ofa-general] [PATCH 3/4] opensm: build min hop tables only for switches base LIDs In-Reply-To: <11733615061517-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> Message-ID: <11733615163275-git-send-email-sashak@voltaire.com> The up/down and default min hop builders will calculate min hop matrices only for the switches' base LIDs; they no longer bother with CA or router ports and LMC. 
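A minimal sketch of the idea in this series, for readers following along: once min hop vectors exist only for switch base LIDs, the hop count to a CA or router port can be derived as the hop count to its neighbor switch plus one. This mirrors the logic of osm_switch_get_port_least_hops() from patch 2/4, but everything named here (sw_hops, port_least_hops, NO_PATH, the four-switch table) is a hypothetical stand-in, not OpenSM API.

```c
#include <assert.h>
#include <stdint.h>

#define NO_PATH 0xFF  /* illustrative stand-in for OSM_NO_PATH */
#define MAX_SW  4     /* toy fabric size, an assumption for this sketch */

/* Hypothetical reduced table: sw_hops[a][b] is the least hop count from
 * switch a to the base LID of switch b. Switch 3 is disconnected. */
static const uint8_t sw_hops[MAX_SW][MAX_SW] = {
    { 0,       1,       2,       NO_PATH },
    { 1,       0,       1,       NO_PATH },
    { 2,       1,       0,       NO_PATH },
    { NO_PATH, NO_PATH, NO_PATH, 0       },
};

/* Least hops from switch from_sw to a port.
 * port_is_switch: non-zero if the port is a switch's own port.
 * attached_sw:    the switch itself, or the switch the CA/router is
 *                 cabled to; a negative value means "no neighbor switch". */
uint8_t port_least_hops(int from_sw, int port_is_switch, int attached_sw)
{
    uint8_t hops;

    assert(from_sw >= 0 && from_sw < MAX_SW);

    if (attached_sw < 0 || attached_sw >= MAX_SW)
        return NO_PATH;                 /* nothing to route through */

    if (port_is_switch)
        return sw_hops[from_sw][attached_sw];

    if (attached_sw == from_sw)
        return 1;                       /* CA hangs directly off this switch */

    hops = sw_hops[from_sw][attached_sw];
    return hops != NO_PATH ? (uint8_t)(hops + 1) : NO_PATH;
}
```

With this toy table, a CA attached to switch 2 is 3 hops from switch 0 (2 hops to the switch plus 1), and a CA attached to the disconnected switch 3 stays unreachable.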
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 280 ++++++------------------------------------ osm/opensm/osm_ucast_updn.c | 96 --------------- 2 files changed, 40 insertions(+), 336 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index c674d6d..4746d19 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -412,59 +412,35 @@ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr) } /********************************************************************** - Add each switch's own LID(s) to its LID matrix + Add each switch's own and neighbor LIDs to its LID matrix **********************************************************************/ static void -__osm_ucast_mgr_process_hop_0( +__osm_ucast_mgr_process_hop_0_1( IN cl_map_item_t* const p_map_item, IN void* context ) { osm_switch_t* const p_sw = (osm_switch_t*)p_map_item; - osm_ucast_mgr_t* const p_mgr = (osm_ucast_mgr_t*)context; - osm_node_t *p_node; - uint16_t lid_ho, base_lid_ho, max_lid_ho; - cl_status_t status; - uint8_t lmc; - - OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_hop_0 ); - - p_node = p_sw->p_node; - - CL_ASSERT( p_node ); - CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); + osm_node_t *p_remote_node; + uint16_t lid, remote_lid; + uint8_t i, remote_port; - base_lid_ho = cl_ntoh16( osm_node_get_base_lid( p_node, 0 ) ); - if (osm_switch_sp0_is_lmc_capable( p_sw, p_mgr->p_subn )) - lmc = osm_node_get_lmc( p_node, 0 ); - else - lmc = 0; - max_lid_ho = (uint16_t)( base_lid_ho + (1 << lmc) - 1 ); + lid = osm_node_get_base_lid(p_sw->p_node, 0); + lid = cl_ntoh16(lid); + osm_switch_set_hops(p_sw, lid, 0, 0); - for (lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++) + for( i = 1; i < p_sw->num_ports; i++ ) { - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "__osm_ucast_mgr_process_hop_0: " - "Processing switch GUID 0x%" PRIx64 ", LID 0x%X\n", - cl_ntoh64( 
osm_node_get_node_guid( p_node ) ), - lid_ho ); - } + p_remote_node = osm_node_get_remote_node( p_sw->p_node, i, &remote_port ); - status = osm_switch_set_hops( p_sw, lid_ho, 0, 0 ); - if( status != CL_SUCCESS ) + if( p_remote_node && p_remote_node->sw && + (p_remote_node != p_sw->p_node ) ) { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_hop_0: ERR 3A02: " - "Setting hop count failed (%s) for " - "switch GUID 0x%" PRIx64 ", LID 0x%X\n", - CL_STATUS_MSG( status ), - cl_ntoh64( osm_node_get_node_guid( p_node ) ), - lid_ho ); + remote_lid = osm_node_get_base_lid(p_remote_node, 0); + remote_lid = cl_ntoh16(remote_lid); + osm_switch_set_hops( p_sw, remote_lid, i, 1 ); + osm_switch_set_hops( p_remote_node->sw, lid, remote_port, 1 ); } } - - OSM_LOG_EXIT( p_mgr->p_log ); } /********************************************************************** @@ -472,219 +448,47 @@ __osm_ucast_mgr_process_hop_0( static void __osm_ucast_mgr_process_neighbor( IN osm_ucast_mgr_t* const p_mgr, - IN osm_switch_t* const p_sw, + IN osm_switch_t* const p_this_sw, IN osm_switch_t* const p_remote_sw, IN const uint8_t port_num, IN const uint8_t remote_port_num ) { + osm_switch_t *p_sw, *p_next_sw; uint16_t lid_ho; - uint16_t max_lid_ho; - osm_node_t* p_node; - const osm_node_t* p_remote_node; uint8_t hops; - cl_status_t status; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_neighbor ); - CL_ASSERT( p_sw ); - CL_ASSERT( p_remote_sw ); - CL_ASSERT( port_num ); - CL_ASSERT( remote_port_num ); - - p_node = p_sw->p_node; - p_remote_node = p_remote_sw->p_node; - - CL_ASSERT( p_node ); - CL_ASSERT( p_remote_node ); - - CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); - CL_ASSERT( osm_node_get_type( p_remote_node ) == IB_NODE_TYPE_SWITCH ); - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_mgr->p_log, OSM_LOG_DEBUG, "__osm_ucast_mgr_process_neighbor: " "Node 0x%" PRIx64 ", remote node 0x%" PRIx64 ", port 0x%X, remote port 0x%X\n", - cl_ntoh64( 
osm_node_get_node_guid( p_node ) ), - cl_ntoh64( osm_node_get_node_guid( p_remote_node ) ), + cl_ntoh64( osm_node_get_node_guid( p_this_sw->p_node ) ), + cl_ntoh64( osm_node_get_node_guid( p_remote_sw->p_node ) ), port_num, remote_port_num ); } - /* - Iterate through all the LIDs in the neighbor switch. - */ - max_lid_ho = p_remote_sw->max_lid_ho; - - hops = OSM_NO_PATH; - for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) - { - /* - Find the lowest hop count value to this LID. - */ - hops = osm_switch_get_least_hops( p_remote_sw, lid_ho ); - - if( hops != OSM_NO_PATH ) - { - /* - Increment hop count of the neighbor by 1, since it - takes 1 hop to get to the neighbor. - */ - hops++; - - CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho, - port_num ) ); - if( osm_switch_get_hop_count( p_sw, lid_ho, - port_num ) > hops ) - { - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "__osm_ucast_mgr_process_neighbor: " - "New best path is %u hops for LID 0x%X\n", - hops, lid_ho ); - } - - /* mark the fact we have got to change anything */ - __some_hop_count_set = TRUE; - - status = osm_switch_set_hops( p_sw, lid_ho, - port_num, hops ); - if( status != CL_SUCCESS ) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_neighbor: ERR 3A03: " - "Setting hop count failed (%s)\n", - CL_STATUS_MSG( status ) ); - } - } - } - } - - OSM_LOG_EXIT( p_mgr->p_log ); -} - -/********************************************************************** - **********************************************************************/ -static void -__osm_ucast_mgr_process_leaf( - IN osm_ucast_mgr_t* const p_mgr, - IN osm_switch_t* const p_sw, - IN osm_node_t* const p_node, - IN const uint8_t port_num, - IN osm_node_t* const p_remote_node, - IN const uint8_t remote_port_num ) -{ - uint16_t i; - uint16_t base_lid_ho; - uint16_t max_lid_ho; - uint8_t lmc; - - OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_leaf ); - - CL_ASSERT( 
p_node ); - CL_ASSERT( p_remote_node ); - CL_ASSERT( port_num ); - CL_ASSERT( remote_port_num ); - - switch( osm_node_get_type( p_remote_node ) ) + p_next_sw = (osm_switch_t*)cl_qmap_head( &p_mgr->p_subn->sw_guid_tbl ); + while( p_next_sw != (osm_switch_t*)cl_qmap_end( &p_mgr->p_subn->sw_guid_tbl ) ) { - case IB_NODE_TYPE_CA: - case IB_NODE_TYPE_ROUTER: - base_lid_ho = cl_ntoh16( osm_node_get_base_lid( - p_remote_node, remote_port_num ) ); - lmc = osm_node_get_lmc( p_remote_node, remote_port_num ); - break; -#if 0 - case IB_NODE_TYPE_SWITCH: - base_lid_ho = cl_ntoh16( osm_node_get_base_lid( - p_remote_node, 0 ) ); - lmc = 0; - break; -#endif - - default: - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_process_leaf: ERR 3A01: " - "Bad node type %u, GUID 0x%" PRIx64 "\n", - osm_node_get_type( p_remote_node ), - cl_ntoh64( osm_node_get_node_guid( p_node ) )); - goto Exit; - } - - max_lid_ho = (uint16_t)(base_lid_ho + (1 << lmc) - 1 ); - - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "__osm_ucast_mgr_process_leaf: " - "Discovered LIDs [0x%X,0x%X]\n" - "\t\t\t\tport number 0x%X, node 0x%" PRIx64 "\n", - base_lid_ho, max_lid_ho, - port_num, cl_ntoh64( osm_node_get_node_guid( p_node ) )); - } - - for( i = base_lid_ho; i <= max_lid_ho; i++ ) - osm_switch_set_hops( p_sw, i, port_num, 1 ); - - Exit: - OSM_LOG_EXIT( p_mgr->p_log ); -} - -/********************************************************************** - **********************************************************************/ -static void -__osm_ucast_mgr_process_leaves( - IN cl_map_item_t* const p_map_item, - IN void* context ) -{ - osm_switch_t* const p_sw = (osm_switch_t*)p_map_item; - osm_ucast_mgr_t* const p_mgr = (osm_ucast_mgr_t*)context; - osm_node_t *p_node; - osm_node_t *p_remote_node; - uint32_t port_num; - uint8_t remote_port_num; - uint32_t num_ports; - - OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_leaves ); - - p_node = 
p_sw->p_node; - - CL_ASSERT( p_node ); - CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ); - - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "__osm_ucast_mgr_process_leaves: " - "Processing switch 0x%" PRIx64 "\n", - cl_ntoh64( osm_node_get_node_guid( p_node ) )); - } - - /* - Add the LIDs of all leaves of this switch to the LID matrix. - Don't bother processing loopback paths from one port of - this switch to the another port. - Don't process neighbor switches yet. - Start with port 1 to skip the switch's management port. - */ - num_ports = osm_node_get_num_physp( p_node ); - - for( port_num = 1; port_num < num_ports; port_num++ ) - { - p_remote_node = osm_node_get_remote_node( p_node, - (uint8_t)port_num, &remote_port_num ); - - if( p_remote_node && (p_remote_node != p_node ) - && (osm_node_get_type( p_remote_node ) != IB_NODE_TYPE_SWITCH ) ) - { - __osm_ucast_mgr_process_leaf( - p_mgr, - p_sw, - p_node, - (uint8_t)port_num, - p_remote_node, - remote_port_num ); + p_sw = p_next_sw; + p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); + lid_ho = osm_node_get_base_lid(p_sw->p_node, 0); + lid_ho = cl_ntoh16(lid_ho); + hops = osm_switch_get_least_hops(p_remote_sw, lid_ho); + if (hops == OSM_NO_PATH) + continue; + hops++; + if (hops < osm_switch_get_hop_count(p_this_sw, lid_ho, port_num)) { + if (osm_switch_set_hops(p_this_sw, lid_ho, port_num, hops) != 0) + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "__osm_ucast_mgr_process_neighbor: " + "cannot set hops for lid %u at switch 0x%" PRIx64 "\n", + lid_ho, + cl_ntoh64(osm_node_get_node_guid(p_this_sw->p_node))); + __some_hop_count_set = TRUE; } } @@ -1098,8 +902,7 @@ __osm_ucast_mgr_process_neighbors( p_remote_node = osm_node_get_remote_node( p_node, (uint8_t)port_num, &remote_port_num ); - if( p_remote_node && (p_remote_node != p_node ) - && p_remote_node->sw ) + if( p_remote_node && p_remote_node->sw && (p_remote_node != p_node ) ) { /* make 
sure the link is healthy. If it is not - don't propagate through it. */ @@ -1137,10 +940,7 @@ osm_ucast_mgr_build_lid_matrices( then set the lid matrices for the each switch's leaf nodes. */ cl_qmap_apply_func( p_sw_guid_tbl, - __osm_ucast_mgr_process_hop_0, p_mgr ); - - cl_qmap_apply_func( p_sw_guid_tbl, - __osm_ucast_mgr_process_leaves, p_mgr ); + __osm_ucast_mgr_process_hop_0_1, p_mgr ); /* Get the switch matrices for each switch's neighbors. diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 1ec9017..b15fe5e 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -509,54 +509,6 @@ updn_subn_rank( /********************************************************************** **********************************************************************/ static int -populate_min_hops_for_cas( - osm_subn_t *p_subn, - osm_switch_t *p_sw ) -{ - osm_port_t *p_next_port,*p_port; - osm_physp_t *p_physp; - uint16_t lid, sw_lid; - uint8_t i, hops; - - p_next_port = (osm_port_t*)cl_qmap_head( &p_subn->port_guid_tbl ); - while( p_next_port != (osm_port_t*)cl_qmap_end( &p_subn->port_guid_tbl ) ) - { - p_port = p_next_port; - p_next_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); - - if (p_port->p_node->sw) - continue; - p_physp = osm_port_get_default_phys_ptr(p_port); - if (!p_physp || !p_physp->p_remote_physp || - !p_physp->p_remote_physp->p_node->sw) - continue; - - lid = osm_physp_get_base_lid(p_physp); - lid = cl_ntoh16(lid); - - if (p_physp->p_remote_physp->p_node->sw == p_sw) - { - osm_switch_set_hops(p_sw, lid, p_physp->p_remote_physp->port_num, 1); - continue; - } - - sw_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0); - sw_lid = cl_ntoh16(sw_lid); - - for (i = 1 ; i < p_sw->num_ports ; i++) - { - hops = osm_switch_get_hop_count(p_sw, sw_lid, i); - if (hops == OSM_NO_PATH) - continue; - osm_switch_set_hops(p_sw, lid, i, hops + 1); - } - } - return 0; -} - 
-/********************************************************************** - **********************************************************************/ -static int __osm_subn_set_up_down_min_hop_table( IN updn_t* p_updn ) { @@ -598,14 +550,6 @@ __osm_subn_set_up_down_min_hop_table( __updn_bfs_by_node(p_log, p_subn, p_sw); } - p_next_sw = (osm_switch_t*)cl_qmap_head( &p_subn->sw_guid_tbl ); - while( p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl ) ) - { - p_sw = p_next_sw; - p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); - populate_min_hops_for_cas(p_subn, p_sw); - } - osm_log( p_log, OSM_LOG_VERBOSE, "__osm_subn_set_up_down_min_hop_table: " "BFS through all port guids in the subnet ]\n" ); @@ -668,44 +612,6 @@ __osm_subn_calc_up_down_min_hop_table( /********************************************************************** **********************************************************************/ -static void -expand_lid_matrices_for_lmc( - osm_subn_t *p_subn ) -{ - cl_map_item_t *p_next_port, *p_next_sw; - osm_port_t *p_port; - osm_switch_t *p_sw; - uint16_t lid, min_lid, max_lid; - uint8_t port, num_ports, hops; - - p_next_port = cl_qmap_head( &p_subn->port_guid_tbl ); - while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) - { - p_port = (osm_port_t *)p_next_port; - p_next_port = cl_qmap_next(p_next_port); - if (p_port->p_node->sw && - !osm_switch_sp0_is_lmc_capable(p_port->p_node->sw, p_subn)) - continue; - osm_port_get_lid_range_ho(p_port, &min_lid, &max_lid); - if (!min_lid || min_lid == max_lid) - continue; - p_next_sw = cl_qmap_head(&p_subn->sw_guid_tbl); - while (p_next_sw != cl_qmap_end(&p_subn->sw_guid_tbl)) - { - p_sw = (osm_switch_t *)p_next_sw; - p_next_sw = cl_qmap_next(p_next_sw); - num_ports = p_sw->num_ports; - for (port = 0; port < num_ports; port++) { - hops = osm_switch_get_hop_count(p_sw, min_lid, port); - for (lid = min_lid + 1 ; lid <= max_lid; lid++) - osm_switch_set_hops(p_sw, lid, port, hops); - } - } - } -} - 
-/********************************************************************** - **********************************************************************/ static struct updn_node * create_updn_node( osm_switch_t *sw ) @@ -774,8 +680,6 @@ __osm_updn_call( __osm_subn_calc_up_down_min_hop_table( p_updn->updn_ucast_reg_inputs.num_guids, p_updn->updn_ucast_reg_inputs.guid_list, p_updn ); - if (p_updn->p_osm->subn.opt.lmc) - expand_lid_matrices_for_lmc(&p_updn->p_osm->subn); } else osm_log( &p_updn->p_osm->log, OSM_LOG_INFO, -- 1.5.0.3.307.gcf89 From sashak at voltaire.com Thu Mar 8 05:45:06 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 8 Mar 2007 15:45:06 +0200 Subject: [ofa-general] [PATCH 4/4] opensm: dump functions adoption In-Reply-To: <11733615061517-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> Message-ID: <11733615191040-git-send-email-sashak@voltaire.com> This adapts the routing dump functions to work properly with the reduced min hop tables. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 34 ++++++++++++++++++++++++++++++---- 1 files changed, 30 insertions(+), 4 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 4746d19..22a99ad 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -254,7 +254,7 @@ __osm_ucast_mgr_dump_ucast_routes( uint8_t best_hops; uint8_t best_port; uint16_t max_lid_ho; - uint16_t lid_ho; + uint16_t lid_ho, base_lid; osm_switch_t* p_sw = (osm_switch_t *)p_map_item; osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; @@ -298,14 +298,39 @@ __osm_ucast_mgr_dump_ucast_routes( Therefore, ensure that the hop count is better than OSM_NO_PATH. 
*/ - num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); + if( p_port->p_node->sw ) + { + base_lid = osm_node_get_base_lid(p_port->p_node, 0); + base_lid = cl_ntoh16(base_lid); + num_hops = osm_switch_get_hop_count( p_sw, base_lid, port_num ); + } + else + { + osm_physp_t *p_physp = osm_port_get_default_phys_ptr(p_port); + if( !p_physp || !p_physp->p_remote_physp || + !p_physp->p_remote_physp->p_node->sw ) + num_hops = OSM_NO_PATH; + else + { + base_lid = osm_node_get_base_lid(p_physp->p_remote_physp->p_node, 0); + base_lid = cl_ntoh16(base_lid); + num_hops = p_physp->p_remote_physp->p_node->sw == p_sw ? + 0 : osm_switch_get_hop_count( p_sw, base_lid, port_num ); + } + } + if( num_hops == OSM_NO_PATH ) { fprintf( file, "UNREACHABLE\n" ); continue; } - best_hops = osm_switch_get_least_hops( p_sw, lid_ho ); + best_hops = osm_switch_get_least_hops( p_sw, base_lid ); + if (!p_port->p_node->sw) { + best_hops++; + num_hops++; + } + fprintf( file, "%03u : %02u : ", port_num, num_hops ); if( best_hops == num_hops ) @@ -343,7 +368,8 @@ ucast_mgr_dump_lid_matrix(cl_map_item_t *p_map_item, void *cxt) cl_ntoh64(osm_node_get_node_guid(p_node))); for (lid = 1; lid <= max_lid; lid++) { osm_port_t *p_port; - + if (osm_switch_get_least_hops(p_sw, lid) == OSM_NO_PATH) + continue; fprintf(file, "0x%04x:", lid); for (port = 0 ; port < max_port ; port++) fprintf(file, " %02x", -- 1.5.0.3.307.gcf89 From mst at mellanox.co.il Thu Mar 8 06:07:03 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Mar 2007 16:07:03 +0200 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: <45EF303F.5090607@ichips.intel.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> Message-ID: <20070308140703.GC23302@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: OFED 1.2 beta blocking bugs > > > Sean there's a critical bug related to multicast module, > > and a blocker bug related to ucma module assigned to you. 
>
> Not sure who assigned me the bugs,

I did.

> but I did look at them and added comments.

Yes, but they are not very helpful ones.

> The multicast bug appears to be related to the ipoib HA, rather than the
> multicast module.

ipoib HA is just a script that brings interfaces up/down.
Judging by the error messages, we are trying to attach a QP
to a non-existent MC group. You implemented both the multicast module
and ported ipoib to it, so ...

> I haven't done anything with backport patches, and I'm not sure who has.

I did the backport for ucma, since it seemed no one else would bother.
The patch in question is
kernel_patches/backport/2.6.19/2_misc_device_to_2_6_19.patch
(and same for other kernels)

All I did, though, was revert the patch that updated ucma to the new miscdevice API:
I don't however use ucma at all, and I only made sure the code compiles.
And, I have my hands full with IPoIB and SDP.

Please feel free to take over and replace this patch with whatever you find fit.

--
MST

From vlad at dev.mellanox.co.il Thu Mar 8 06:44:55 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 08 Mar 2007 16:44:55 +0200
Subject: [ofa-general] openfabrics.org DNS problems
In-Reply-To: <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com>
References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com>
Message-ID: <45F02167.6010305@dev.mellanox.co.il>

Jeff Squyres wrote:
> On Mar 8, 2007, at 6:58 AM, Jeff Squyres wrote:
>
>> Developers: the server IP address is 146.246.248.81.
>
> Wrong address, sorry; it should be: 69.55.231.195.
>
Jeff,
Can you add git.openfabrics.org to /etc/hosts on the openfabrics server?
The OFED-1.2 daily build fails...

Thanks,

Regards,
Vladimir

From jsquyres at cisco.com Thu Mar 8 06:54:18 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 8 Mar 2007 09:54:18 -0500
Subject: [ofa-general] openfabrics.org DNS problems
In-Reply-To: <45F02167.6010305@dev.mellanox.co.il>
References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com> <45F02167.6010305@dev.mellanox.co.il>
Message-ID: <03470BB7-5BD2-417F-A6CE-97250D77D67B@cisco.com>

Done. Try now.

On Mar 8, 2007, at 9:44 AM, Vladimir Sokolovsky wrote:

> Jeff Squyres wrote:
>> On Mar 8, 2007, at 6:58 AM, Jeff Squyres wrote:
>>
>>> Developers: the server IP address is 146.246.248.81.
>>
>> Wrong address, sorry; it should be: 69.55.231.195.
>>
> Jeff,
> Can you add git.openfabrics.org to /etc/hosts on openfabrics server
> - OFED-1.2 daily build fails...
>
> Thanks,
>
> Regards,
> Vladimir

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

From vlad at dev.mellanox.co.il Thu Mar 8 07:18:30 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 08 Mar 2007 17:18:30 +0200
Subject: [ofa-general] openfabrics.org DNS problems
In-Reply-To: <03470BB7-5BD2-417F-A6CE-97250D77D67B@cisco.com>
References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com> <45F02167.6010305@dev.mellanox.co.il> <03470BB7-5BD2-417F-A6CE-97250D77D67B@cisco.com>
Message-ID: <45F02946.30607@dev.mellanox.co.il>

Jeff Squyres wrote:
> Done. Try now.
>
Thanks,
OFED-1.2-20070308-0708.tgz is ready.

Regards,
Vladimir

From troy at scl.ameslab.gov Wed Mar 7 19:50:19 2007
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Wed, 7 Mar 2007 21:50:19 -0600
Subject: [ofa-general] infiniband on XD1?
Message-ID: <66BFA71F-E8AE-454D-8555-3BFE00BFD326@scl.ameslab.gov>

Has anyone besides me been stupid enough to try and put a mellanox
infiniband card in a Cray XD1?
It looks like I get a problem similar to the one the XT3 used to have, where the large PCI memory footprint makes stuff not work. Except the node is not even booting ;) It looks like the 'real' xd1 kernel is dying in the IO-APIC initialization. From halr at voltaire.com Thu Mar 8 07:12:28 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Mar 2007 10:12:28 -0500 Subject: [ofa-general] Re: [PATCH] opensm: yet another up/down speedup In-Reply-To: <20070303135946.GC12388@sashak.voltaire.com> References: <20070303135946.GC12388@sashak.voltaire.com> Message-ID: <1173366739.465.84121.camel@hal.voltaire.com> On Sat, 2007-03-03 at 08:59, Sasha Khapyorsky wrote: > The idea of this optimization is to perform all time consuming up/down > min hops calculation cycles only for switches, and when this is ready > just to populate (in one pass) calculated min hops values for CAs and > routers as its neighbour switch's min hops + 1. Tests show yet another > 6-7 times speedup. Excellent! > Signed-off-by: Sasha Khapyorsky Thanks. Applied to master only right now. It will likely go into ofed_1_2 as well, but later. -- Hal From jsquyres at cisco.com Thu Mar 8 09:23:42 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Mar 2007 12:23:42 -0500 Subject: [ofa-general] Re: [ewg] openfabrics.org DNS problems In-Reply-To: References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> Message-ID: <44815458-3DB0-4878-8463-DAD10F771534@cisco.com> Ah yes, there are multiple IPs involved, sorry:

69.55.231.195 lists.openfabrics.org www.openfabrics.org www2.openfabrics.org git.openfabrics.org
69.55.231.178 bugs.openfabrics.org
69.55.231.179 wiki.openfabrics.org
69.55.231.180 svn.openfabrics.org

On Mar 8, 2007, at 12:22 PM, Scott Weitzenkamp ((sweitzen)) wrote: > How do I get to bugzilla?
> > Scott > >> -----Original Message----- >> From: ewg-bounces at lists.openfabrics.org >> [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff >> Squyres (jsquyres) >> Sent: Thursday, March 08, 2007 3:58 AM >> To: OpenFabrics General; OpenFabrics EWG >> Subject: [ewg] openfabrics.org DNS problems >> >> openfabrics.org is currently having some DNS problems; we're working >> on it. >> >> Developers: the server IP address is 146.246.248.81. As a temporary >> workaround, add this IP address in /etc/hosts for *.openfabrics.org >> and you should be able to continue working. >> >> -- >> Jeff Squyres >> Server Virtualization Business Unit >> Cisco Systems >> >> _______________________________________________ >> ewg mailing list >> ewg at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg >> -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Thu Mar 8 09:34:17 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 09:34:17 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <20070308140703.GC23302@mellanox.co.il> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> Message-ID: Sean, can you please take a look? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] > Sent: Thursday, March 08, 2007 6:07 AM > To: Sean Hefty > Cc: Scott Weitzenkamp (sweitzen); Openfabrics-ewg at openib.org; > Vladimir Sokolovsky; Tziporet Koren; Pavel Shamis; Jeff > Squyres (jsquyres); Shaun Rowland; Woodruff, Robert J; > general at lists.openfabrics.org > Subject: Re: OFED 1.2 beta blocking bugs > > > Quoting Sean Hefty : > > Subject: Re: OFED 1.2 beta blocking bugs > > > > > Sean there's a critical bug related to multicast module, > > > and a blocker bug related to ucma module assigned to you. > > > > Not sure who assigned me the bugs, > > I did. > > > but I did look at them and added comments. > > Yes. but they are not very helpful ones. > > > The multicast bug appears to be related to the ipoib HA, > rather than the > > multicast module. > > ipoib HA is just a script that brings interfaces up/down. > Juding by the error > messages, we are trying to attach QP to a non-existent MC group. You > implemented both the multicast module and ported ipoib to it, so ... > > > I haven't done anything with backport patches, and I'm not > sure who has. > > I did the backport for ucma, since it seemed no one else would bother. > The patch in question is > kernel_patches/backport/2.6.19/2_misc_device_to_2_6_19.patch > (and same for other kernels) > > All I did, though, was revert the patch that updated ucma to > new miscdevice API: > I don't however use ucma at all, and I only made sure the > code compiles. > > And, I have my hands full with IPoIB and SDP. > > Please feel free to take over and replace this patch with > whatever you find fit. > > -- > MST > From jsquyres at cisco.com Thu Mar 8 09:55:31 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Mar 2007 12:55:31 -0500 Subject: [ofa-general] DNS problems: fixed Message-ID: <0AEB7C74-AC6A-4088-BCCF-3807A3DEE5E2@cisco.com> The DNS problems have been fixed. 
You may still experience some DNS instability over the next 24-48 hours as the changes ripple out across the world. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Thu Mar 8 08:55:43 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 08:55:43 -0800 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS In-Reply-To: References: Message-ID: "ipoibcfg merge" only handles IPoIB, not SDP, and it's active/passive. ________________________________ From: SEGERS Koen [mailto:Koen.SEGERS at VRT.BE] Sent: Thursday, March 08, 2007 1:03 AM To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS Are you talking about a kernel patch when you refer to the "bonding kernel driver"? I can't find a specific bonding command that allows bonding two or more ports. So if I understand it correctly, with SDP you can't have redundancy (active/passive) or aggregation (active/active) with the current OFED-1.2 driver. Renaud Larsen of Cisco told us that bonding is possible in the Topspin driver with the "ipoibcfg merge" command. We are wondering if this also applies to SDP. That is why we are very interested in the beta drivers of Topspin! We are supposed to get them (from Renaud) within a few days, but if you can send them to me earlier, that would be even better :) Greetings, Koen ________________________________ From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Thu 8/03/2007 0:01 To: SEGERS Koen; general at lists.openfabrics.org Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS I have not tried the OFED 1.2 IPoIB bonding kernel driver, and can only speak for the userspace IPoIB HA ipoib_ha.pl script.
Both Topspin IPoIB and OFED IPoIB have active/passive IPoIB high availability, neither can aggregate IPoIB throughput, and neither has SDP high availability. We will have Topspin SLES10 drivers in beta soon; let me know if you are interested. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen Sent: Wednesday, March 07, 2007 6:59 AM To: general at lists.openfabrics.org Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS Hi all! We are trying to bond two ports on 1 HCA so that we are able to aggregate the throughput. We are also interested in bonding ports of different HCAs. Is this possible with the OFED driver? If so, can you give the command? We know Topspin has support for this feature. Sadly, Topspin has no driver that runs on our system (SLES 10). We have currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 driver, but we can't figure out how this bonding is started in either version. It is important that we offload the bonding. We don't want to use the standard linux bonding. That is why we think that bonding over different HCAs is not going to work. Is this assumption correct? Is bonding possible when running SDP? And VERBS? Greetings Koen *** Disclaimer *** Vlaamse Radio- en Televisieomroep Auguste Reyerslaan 52, 1043 Brussel nv van publiek recht BTW BE 0244.142.664 RPR Brussel http://www.vrt.be/disclaimer
From sweitzen at cisco.com Thu Mar 8 10:03:58 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 10:03:58 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <45F04DEC.10000@ichips.intel.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> Message-ID: > I looked at the code, and didn't see anything obviously > wrong. Can you tell me > how to reproduce the issue? (I'm running 2.6.21-rc3.) Use a recent OFED 1.2 build, say OFED-1.2-20070308-0708. Configure IPoIB HA to use ib0 and ib1, for example. Make sure your openibd.conf has these lines:

IPOIB_LOAD=yes
SET_IPOIB_CM=yes
IPOIBHA_ENABLE=yes
PRIMARY_IPOIB_DEV=ib0
SECONDARY_IPOIB_DEV=ib1

After running "/etc/init.d/openibd restart", start some IPoIB traffic between the two hosts. Now start failing over IB ports, either by cable pulls or a script that takes switch ports up and down. You should see the IP address move back and forth between ib0 and ib1, and error messages in dmesg. Scott From sweitzen at cisco.com Thu Mar 8 08:54:38 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 08:54:38 -0800 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDPand/or VERBS In-Reply-To: <45EFCCEC.4010205@gmail.com> References: <45EFCCEC.4010205@gmail.com> Message-ID: Can the bonding driver load balance IPoIB traffic across multiple interfaces at the same time?
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Moni Shoua > Sent: Thursday, March 08, 2007 12:44 AM > To: SEGERS Koen > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] infiniband > bonding/merging/aggregation with SDPand/or VERBS > > SEGERS Koen wrote: > Hi, > My answers below refer only to the ib-bonding package that > comes with OFED-1.2 > > > > Hi all! > > > > We are trying to bond two ports on 1 HCA so that we are > able aggregate > > the throughput. We are also interested in bonding ports of > different HCA's. > ib-bonding currently supports High Availability but not link > aggregation. > > > > Is this possible with the OFED driver? If so, can you give > the command? > > We know TopSpin has support for this feature. Sadly, Topspin has no > > driver that runs on our system (SLES 10). > > > > We currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 > > driver, but we can't figure out how this bonding is started > in either > > versions. It is important that we offload the bonding. We > don't want to > > use the standard linux bonding. That is why we think that > bonding over > > different HCA's is not going to work. Is this assumption correct? > ib-bonding is based on standard Linux bonding with some > required changes to make it work with IPoIB. > > > > Is bonding possible when running SDP? And VERBS? > ib-bonding only works with IPoIB. 
> > > > Greetings > > > > Koen > > > > *** Disclaimer *** > > > > Vlaamse Radio- en Televisieomroep > > Auguste Reyerslaan 52, 1043 Brussel > > > > nv van publiek recht > > BTW BE 0244.142.664 > > RPR Brussel > > http://www.vrt.be/disclaimer > > > > > > > -------------------------------------------------------------- > ---------- > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From jsquyres at cisco.com Thu Mar 8 08:25:14 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 8 Mar 2007 11:25:14 -0500 Subject: [ofa-general] openfabrics.org DNS problems In-Reply-To: <45F02946.30607@dev.mellanox.co.il> References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> <8E4D6E89-89AB-4234-BD99-3882A2EE855F@cisco.com> <45F02167.6010305@dev.mellanox.co.il> <03470BB7-5BD2-417F-A6CE-97250D77D67B@cisco.com> <45F02946.30607@dev.mellanox.co.il> Message-ID: On Mar 8, 2007, at 10:18 AM, Vladimir Sokolovsky wrote: > OFED-1.2-20070308-0708.tgz is ready. Since the 32/64 bit builds have been enabled, OMPI has been unable to compile because of wonkiness in our configure script (see https://bugs.openfabrics.org/show_bug.cgi?id=421 for details). This was just fixed this morning, but I didn't make the cutoff for the nightly tarball. As a workaround for today, you can:
1. get the OFED-1.2-20070308-0708.tgz tarball
2. expand it
3. "rm SRPMS/openmpi*"
4.
download the new OMPI SRPM into the SRPMS directory from: http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.2rc2ofedr13964-1.src.rpm -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From halr at voltaire.com Thu Mar 8 06:02:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Mar 2007 09:02:20 -0500 Subject: [ofa-general] mail-archive.com OpenFabrics mail archive Message-ID: <1173362539.465.79597.camel@hal.voltaire.com> Hi, Anyone know who setup or who the contact is for the mail-archive.com OpenFabrics mail archive ? We should get this to point to the new mailing list (general at lists.openfabrics.org). Thanks. -- Hal From rdreier at cisco.com Thu Mar 8 10:07:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 10:07:40 -0800 Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows In-Reply-To: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il> (dotanb@dev.mellanox.co.il's message of "Fri, 2 Mar 2007 13:12:44 +0200 (IST)") References: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il> Message-ID: OK, I pushed out the patch below. Please let me know if you see any problems caused by it, or if you think there may be a problem handling rereg MR and/or MWs with this ABI. Dotan, do you have any further comments about the completion channel closing issue? I would really like to freeze the libibverbs ABI as soon as possible. - R.

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 49cd581..2ae50ab 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -1,7 +1,7 @@
 /*
  * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
  * Copyright (c) 2004 Intel Corporation. All rights reserved.
- * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006, 2007 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2005 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -288,6 +288,13 @@ struct ibv_pd {
 	uint32_t		handle;
 };
 
+enum ibv_rereg_mr_flags {
+	IBV_REREG_MR_CHANGE_TRANSLATION	= (1 << 0),
+	IBV_REREG_MR_CHANGE_PD		= (1 << 1),
+	IBV_REREG_MR_CHANGE_ACCESS	= (1 << 2),
+	IBV_REREG_MR_KEEP_VALID		= (1 << 3)
+};
+
 struct ibv_mr {
 	struct ibv_context     *context;
 	struct ibv_pd	       *pd;
@@ -298,6 +305,17 @@ struct ibv_mr {
 	uint32_t		rkey;
 };
 
+enum ibv_mw_type {
+	IBV_MW_TYPE_1			= 1,
+	IBV_MW_TYPE_2			= 2
+};
+
+struct ibv_mw {
+	struct ibv_context     *context;
+	struct ibv_pd	       *pd;
+	uint32_t		rkey;
+};
+
 struct ibv_global_route {
 	union ibv_gid		dgid;
 	uint32_t		flow_label;
@@ -517,6 +535,15 @@ struct ibv_recv_wr {
 	int			num_sge;
 };
 
+struct ibv_mw_bind {
+	uint64_t		wr_id;
+	struct ibv_mr	       *mr;
+	void		       *addr;
+	size_t			length;
+	enum ibv_send_flags	send_flags;
+	enum ibv_access_flags	mw_access_flags;
+};
+
 struct ibv_srq {
 	struct ibv_context     *context;
 	void		       *srq_context;
@@ -603,7 +630,16 @@ struct ibv_context_ops {
 	int			(*dealloc_pd)(struct ibv_pd *pd);
 	struct ibv_mr *		(*reg_mr)(struct ibv_pd *pd, void *addr, size_t length,
 					  enum ibv_access_flags access);
+	struct ibv_mr *		(*rereg_mr)(struct ibv_mr *mr,
+					    enum ibv_rereg_mr_flags flags,
+					    struct ibv_pd *pd, void *addr,
+					    size_t length,
+					    enum ibv_access_flags access);
 	int			(*dereg_mr)(struct ibv_mr *mr);
+	struct ibv_mw *		(*alloc_mw)(struct ibv_pd *pd, enum ibv_mw_type type);
+	int			(*bind_mw)(struct ibv_qp *qp, struct ibv_mw *mw,
+					   struct ibv_mw_bind *mw_bind);
+	int			(*dealloc_mw)(struct ibv_mw *mw);
 	struct ibv_cq *		(*create_cq)(struct ibv_context *context, int cqe,
 					     struct ibv_comp_channel *channel,
 					     int comp_vector);

From sweitzen at cisco.com Thu Mar 8 09:22:39 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 09:22:39 -0800 Subject: [ofa-general] RE: [ewg] openfabrics.org DNS problems In-Reply-To: <35379C25-204E-4135-B181-32BB0942A186@cisco.com> References: <35379C25-204E-4135-B181-32BB0942A186@cisco.com>
Message-ID: How do I get to bugzilla? Scott > -----Original Message----- > From: ewg-bounces at lists.openfabrics.org > [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff > Squyres (jsquyres) > Sent: Thursday, March 08, 2007 3:58 AM > To: OpenFabrics General; OpenFabrics EWG > Subject: [ewg] openfabrics.org DNS problems > > openfabrics.org is currently having some DNS problems; we're working > on it. > > Developers: the server IP address is 146.246.248.81. As a temporary > workaround, add this IP address in /etc/hosts for *.openfabrics.org > and you should be able to continue working. > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From somenath at veritas.com Wed Mar 7 10:27:27 2007 From: somenath at veritas.com (somenath) Date: Wed, 07 Mar 2007 10:27:27 -0800 Subject: [ofa-general] Re: [ewg] RE: OFED 1.2 beta blocking bugs In-Reply-To: References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> Message-ID: <45EF040F.3090305@veritas.com> Scott: do you have a script to bring the cisco switch port up and down? thanks, som. Scott Weitzenkamp (sweitzen) wrote: >>I looked at the code, and didn't see anything obviously >>wrong. Can you tell me >>how to reproduce the issue? (I'm running 2.6.21-rc3.) >> >> > >Use a recent OFED 1.2 build, say OFED-1.2-20070308-0708. > >Configure IPoIB HA to use ib0 and ib1, for example. Make sure your >openibd.conf has these lines: > >IPOIB_LOAD=yes >SET_IPOIB_CM=yes >IPOIBHA_ENABLE=yes >PRIMARY_IPOIB_DEV=ib0 >SECONDARY_IPOIB_DEV=ib1 > >After running "/etc/init.d/openibd restart", start some IPoIB traffic >between the two hosts. > >Now start failing over IB ports, either by cable pulls or a script that >takes switch ports up and down. 
You should see the IP address move back >and forth between ib0 and ib1, and error messages in dmesg. > >Scott > >_______________________________________________ >ewg mailing list >ewg at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > From sweitzen at cisco.com Thu Mar 8 10:30:59 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 10:30:59 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <45F054CC.8020107@ichips.intel.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> Message-ID: > Is there a way to reproduce this with the standard linux > build? (I didn't > closely follow the original IPOIB HA threads, so I will look > back over those.) Possibly, but perhaps it requires the OFED ipoibtools code to reproduce. Please try OFED 1.2, it is not hard to compile with install.sh. Scott From sweitzen at cisco.com Thu Mar 8 10:31:59 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 10:31:59 -0800 Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs In-Reply-To: <45EF040F.3090305@veritas.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45EF040F.3090305@veritas.com> Message-ID: No standalone script, sorry. But it's simple to automate with expect or autoexpect using the switch CLI. Scott > -----Original Message----- > From: somenath [mailto:somenath at veritas.com] > Sent: Wednesday, March 07, 2007 10:27 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Sean Hefty; Michael S. 
Tsirkin; Hefty, Sean; Pavel > Shamis; Openfabrics-ewg at openib.org; general at lists.openfabrics.org > Subject: Re: [ewg] RE: OFED 1.2 beta blocking bugs > > Scott: > > do you have a script to bring the cisco switch port up and down? > > thanks, som. > > Scott Weitzenkamp (sweitzen) wrote: > > >>I looked at the code, and didn't see anything obviously > >>wrong. Can you tell me > >>how to reproduce the issue? (I'm running 2.6.21-rc3.) > >> > >> > > > >Use a recent OFED 1.2 build, say OFED-1.2-20070308-0708. > > > >Configure IPoIB HA to use ib0 and ib1, for example. Make sure your > >openibd.conf has these lines: > > > >IPOIB_LOAD=yes > >SET_IPOIB_CM=yes > >IPOIBHA_ENABLE=yes > >PRIMARY_IPOIB_DEV=ib0 > >SECONDARY_IPOIB_DEV=ib1 > > > >After running "/etc/init.d/openibd restart", start some IPoIB traffic > >between the two hosts. > > > >Now start failing over IB ports, either by cable pulls or a > script that > >takes switch ports up and down. You should see the IP > address move back > >and forth between ib0 and ib1, and error messages in dmesg. > > > >Scott > > > >_______________________________________________ > >ewg mailing list > >ewg at lists.openfabrics.org > >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > > > > From rdreier at cisco.com Thu Mar 8 10:34:56 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 10:34:56 -0800 Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs In-Reply-To: <45F054CC.8020107@ichips.intel.com> (Sean Hefty's message of "Thu, 08 Mar 2007 10:24:12 -0800") References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> Message-ID: > Is there a way to reproduce this with the standard linux build? (I > didn't closely follow the original IPOIB HA threads, so I will look > back over those.) 
Not sure what you're asking, but just to be clear, this IPoIB HA is entirely in userspace (it's a crazy perl script that ups and downs ports in response to various events). From a quick look at the code, it does look like there are some races in ipoib_multicast.c. The place where a QP is actually attached to a group is essentially (trimming debug prints):

    if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags))
        return 0;

    ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
                             &mcast->mcmember.mgid);

and the place where a QP is detached is:

    if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
        ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid),
                                 &mcast->mcmember.mgid);

with no further locking. So it looks entirely possible for one thread to do the test_and_set_bit(), and then have another thread come in and do the test_and_clear_bit() (which will show the bit as set) and call ipoib_mcast_detach() before the first thread has reached the actual call to ipoib_mcast_attach(). Maybe the solution is just to take the mcast_mutex around the full operation. There's some hokey and very old stuff around the multicast attach and detach verbs calls too. I'll post a patch later today if I get a chance. Unfortunately I haven't really kept up with all the OFED build stuff -- does anyone know an easy way for Scott to take a kernel patch and rebuild his OFED install? - R. From rdreier at cisco.com Thu Mar 8 10:39:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 10:39:29 -0800 Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs In-Reply-To: (Scott Weitzenkamp's message of "Thu, 8 Mar 2007 10:31:59 -0800") References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45EF040F.3090305@veritas.com> Message-ID: > No standalone script, sorry.
But it's simple to automate with expect or > autoexpect using the switch CLI. Or you can use SNMP to set the interface admin status. Something like this one-liner for example (using snmpset from the NET SNMP package): snmpset -v2c -cprivate interfaces.ifTable.ifEntry.ifAdminStatus. = 2 where port # for an IB port is the real physical port # + 64 (eg port 19 would use port number 83 in the snmp request). (And yes this is another advantage of a managed switch beyond the embedded SM :) - R. From sean.hefty at intel.com Thu Mar 8 10:56:11 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Mar 2007 10:56:11 -0800 Subject: [ofa-general] RE: [ewg] Re: OFED 1.2 beta blocking bugs In-Reply-To: Message-ID: <000001c761b3$70c16030$ff0da8c0@amr.corp.intel.com> >Not sure what you're asking, but just to be clear, this IPoIB HA is >entirely in userspace (it's a crazy perl script that ups and downs >ports in response to various events). Thanks - this helps. >From a quick look at the code, it does look like there are some races >in ipoib_multicast.c. The place where a QP is actually attached to a >group is essentially (trimming debug prints): > > if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) > return 0; > > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > >and the place where a QP is detached is: > > if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { > ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > >with no further locking. So it looks entirely possible for one thread >to do the test_and_set_bit(), and then have another thread come in and >do the test_and_clear_bit (which will show the bit as set) and call >ipoib_mcast_detach() before the first thread has reached the actual >call to ipoib_mcast_attach. 
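The interleaving described in the quoted text can be replayed deterministically in userspace. What follows is only a sketch under stated assumptions: struct toy_mcast and the toy_* helpers are made-up stand-ins rather than the kernel's ipoib code, and it is assumed that detaching a QP that is not attached fails with -EINVAL.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of one multicast group. These names are illustrative only and
 * are not the kernel's struct ipoib_mcast. */
struct toy_mcast {
	bool flag_attached;	/* stands in for IPOIB_MCAST_FLAG_ATTACHED */
	bool qp_attached;	/* whether the QP is really attached to the group */
};

/* Like test_and_set_bit(): set the flag and return its previous value. */
static bool toy_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/* Like test_and_clear_bit(): clear the flag and return its previous value. */
static bool toy_test_and_clear(bool *flag)
{
	bool old = *flag;

	*flag = false;
	return old;
}

/* Stand-in for ipoib_mcast_attach(); assumed to always succeed here. */
static int toy_attach(struct toy_mcast *m)
{
	m->qp_attached = true;
	return 0;
}

/* Stand-in for ipoib_mcast_detach(); assumed to fail with -EINVAL when
 * the QP is not attached. */
static int toy_detach(struct toy_mcast *m)
{
	if (!m->qp_attached)
		return -EINVAL;
	m->qp_attached = false;
	return 0;
}

/* Replay the bad interleaving step by step; returns the detach result. */
int replay_racy(struct toy_mcast *m)
{
	int detach_ret = 0;

	/* Thread A (join): flag is set, attach has not happened yet. */
	if (toy_test_and_set(&m->flag_attached))
		return 0;

	/* Thread B (leave): sees the flag set and detaches too early. */
	if (toy_test_and_clear(&m->flag_attached))
		detach_ret = toy_detach(m);

	/* Thread A resumes and attaches; nobody will detach it again. */
	toy_attach(m);
	return detach_ret;
}

/* Same two operations, each run as one unit, as if under a single lock. */
int replay_serialized(struct toy_mcast *m)
{
	int detach_ret = 0;

	if (!toy_test_and_set(&m->flag_attached))
		toy_attach(m);

	if (toy_test_and_clear(&m->flag_attached))
		detach_ret = toy_detach(m);

	return detach_ret;
}
```

Starting from a zeroed struct toy_mcast, replay_racy() returns -EINVAL (which is -22 on Linux) and leaves qp_attached set while the flag is clear, whereas replay_serialized() returns 0 with both cleared. The second outcome is what holding one lock across the flag update and the attach or detach call would aim for; whether mcast_mutex is the right lock for that is not something this model can show.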
I was looking at this part of the code as well, and this explains the error messages in the bug report: ib1: ib_detach_mcast failed (result = -22) ib1: ipoib_mcast_detach failed (result = -22) But it seems like this would leave the QP incorrectly attached to the multicast group. It's still not clear to me why we see the following message: ib1: dev_queue_xmit failed to requeue packet or why traffic stops. >Maybe the solution is just to take the mcast_mutex around the full >operation. There's some hokey and very old stuff around the multicast >attach and detach verbs calls too. ipoib_mcast_attach() is only called from the multicast module callback. If ib_sa_free_multicast() were called earlier in ipoib_mcast_leave(), it would block until the callback completed, which should avoid the race as well. I need to spend more time studying the code to see if this works in all cases. - Sean From mshefty at ichips.intel.com Thu Mar 8 10:24:12 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Mar 2007 10:24:12 -0800 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> Message-ID: <45F054CC.8020107@ichips.intel.com> > Use a recent OFED 1.2 build, say OFED-1.2-20070308-0708. > > Configure IPoIB HA to use ib0 and ib1, for example. Make sure your > openibd.conf has these lines: > > IPOIB_LOAD=yes > SET_IPOIB_CM=yes > IPOIBHA_ENABLE=yes > PRIMARY_IPOIB_DEV=ib0 > SECONDARY_IPOIB_DEV=ib1 > > After running "/etc/init.d/openibd restart", start some IPoIB traffic > between the two hosts. Is there a way to reproduce this with the standard linux build? (I didn't closely follow the original IPOIB HA threads, so I will look back over those.) 
- Sean From mshefty at ichips.intel.com Thu Mar 8 09:54:52 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Mar 2007 09:54:52 -0800 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: <20070308140703.GC23302@mellanox.co.il> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> Message-ID: <45F04DEC.10000@ichips.intel.com> > ipoib HA is just a script that brings interfaces up/down. Judging by the error > messages, we are trying to attach QP to a non-existent MC group. You > implemented both the multicast module and ported ipoib to it, so ... I looked at the code, and didn't see anything obviously wrong. Can you tell me how to reproduce the issue? (I'm running 2.6.21-rc3.) > Please feel free to take over and replace this patch with whatever you find fit. Reverting the misc update patch should work fine. (I can't get to bugzilla atm to see the error.) But there were also backport patches for the ucma in OFED 1.1. Failing that, I have a pf-2.6.18 branch in my rdma-dev.git tree that backports most of the IB code to 2.6.18. - Sean From Koen.SEGERS at VRT.BE Thu Mar 8 11:12:57 2007 From: Koen.SEGERS at VRT.BE (SEGERS Koen) Date: Thu, 8 Mar 2007 20:12:57 +0100 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS References: Message-ID: Then we received wrong information. The command already gave a clue, but our Cisco contact assured us this is bonding and not HA with active/passive. Thx for the information! -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Thu 8-3-2007 17:55 To: SEGERS Koen; general at lists.openfabrics.org Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS "ipoibcfg merge" only handles IPoIB, not SDP, and it's active/passive.
________________________________ From: SEGERS Koen [mailto:Koen.SEGERS at VRT.BE] Sent: Thursday, March 08, 2007 1:03 AM To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS Are you talking about a kernel patch when you refer to the "bonding kernel driver"? I can't find a specific bonding command that allows bonding two or more ports. So if I understand it correctly, with SDP you can't have redundancy (active/passive) or aggregation (active/active) with the current OFED-1.2 driver. Renaud Larsen of Cisco told us that bonding is possible in the Topspin driver with the "ipoibcfg merge" command. We are wondering if this also applies to SDP. That is why we are very interested in the beta drivers of Topspin! We are supposed to get them (from Renaud) within a few days, but if you can send them to me earlier, that would be even better :) Greetings, Koen ________________________________ From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Thu 8/03/2007 0:01 To: SEGERS Koen; general at lists.openfabrics.org Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS I have not tried the OFED 1.2 IPoIB bonding kernel driver, and can only speak for the userspace IPoIB HA ipoib_ha.pl script. Both Topspin IPoIB and OFED IPoIB have active/passive IPoIB high availability, neither can aggregate IPoIB throughput, and neither has SDP high availability. We will have Topspin SLES10 drivers in beta soon; let me know if you are interested.
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
Sent: Wednesday, March 07, 2007 6:59 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

Hi all!

We are trying to bond two ports on 1 HCA so that we are able to aggregate the throughput. We are also interested in bonding ports of different HCAs. Is this possible with the OFED driver? If so, can you give the command? We know Topspin has support for this feature. Sadly, Topspin has no driver that runs on our system (SLES 10). We have currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 driver, but we can't figure out how this bonding is started in either version.

It is important that we offload the bonding. We don't want to use the standard Linux bonding. That is why we think that bonding over different HCAs is not going to work. Is this assumption correct?

Is bonding possible when running SDP? And VERBS?

Greetings
Koen

*** Disclaimer ***
Vlaamse Radio- en Televisieomroep
Auguste Reyerslaan 52, 1043 Brussel
nv van publiek recht
BTW BE 0244.142.664
RPR Brussel
http://www.vrt.be/disclaimer

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mshefty at ichips.intel.com Thu Mar 8 11:13:49 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 08 Mar 2007 11:13:49 -0800
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To:
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com>
Message-ID: <45F0606D.7050005@ichips.intel.com>

> From a quick look at the code, it does look like there are some races
> in ipoib_multicast.c. The place where a QP is actually attached to a
> group is essentially (trimming debug prints):
>
> 	if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags))
> 		return 0;
>
> 	ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
> 				 &mcast->mcmember.mgid);
>
> and the place where a QP is detached is:
>
> 	if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
> 		ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid),
> 					 &mcast->mcmember.mgid);

Going back to 2.6.20 (pre-multicast changes), this area of the code looks like it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions of the code, and if so, were any issues found? (I'm not sure we've found all of the problems yet.)

- Sean

From sweitzen at cisco.com Thu Mar 8 11:17:20 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 8 Mar 2007 11:17:20 -0800
Subject: [ofa-general] RE: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: <45F0606D.7050005@ichips.intel.com>
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> <45F0606D.7050005@ichips.intel.com>
Message-ID:

> Going back to 2.6.20 (pre-multicast changes), this area of the code looks like
> it has the same race.
Was IPoIB HA testing done on 2.6.20 or > earlier versions > of the code, and if so, were any issues found? (I'm not sure > we've found all of > the problems yet.) No testing was done with 2.6.20 AFAIK. Scott From mst at mellanox.co.il Thu Mar 8 11:25:30 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Mar 2007 21:25:30 +0200 Subject: [ofa-general] Re: OFED 1.2 beta blocking bugs In-Reply-To: <45F04DEC.10000@ichips.intel.com> References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> Message-ID: <20070308192530.GD17114@mellanox.co.il> > > Please feel free to take over and replace this patch with whatever you find fit. > > Reverting the misc update patch should work fine. (I can't get to bugzilla atm > to see the error.) But there were also backport patches for the umca in OFED > 1.1. That code was different: it was building without patches on 2.6.19. > Failing that, I have a pf-2.6.18 branch in my rdma-dev.git tree that > backports most of the IB code to 2.6.18. Post a replacement to 2_misc_device_to_2_6_19.patch, we'll test. -- MST From chas at cmf.nrl.navy.mil Thu Mar 8 12:05:28 2007 From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR) Date: Thu, 08 Mar 2007 15:05:28 -0500 Subject: [ofa-general] ipoib performance (and xplot) In-Reply-To: Message-ID: <200703082005.l28K5SVO017312@cmf.nrl.navy.mil> you capture the transaction with tcpdump and use a perl script provided with xplot to convert it into something xplot can graph. megabits of course. so yeah, its pretty awful. and indeed the recv end of the tcp connection has saturated the cpu at 100%. apparently, this issue is further complicated by my architecture, the ia64. there doesnt seem to be an accelerated checksum/copy_to_user. so the ia64 makes two passes through the data. i should have suspected this. so the xplot graph was right. the ack's are coming back slow. thanks! 
In message ,"Talpey, Thomas " writes: >Interesting data. What app are you using to generate the TCP flow, >and what options are you using on it? Also, what are the scales on >the x- and y-axes (seconds and decimal kilobytes)? I have some >comments but they are only speculation without knowing this. > >By "800Mb/s" do you megabytes or megabits? For ipoib, 800 megabytes/s >(MB/s) seems very high and 800 megabits/s (Mb/s) seems very low. In my >experience it gets 200-300 megabytes before running out of cpu (checksum >calculations mainly). But I haven't looked at it in a while. > >Tom. From halr at voltaire.com Thu Mar 8 12:25:49 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Mar 2007 15:25:49 -0500 Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs In-Reply-To: References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45EF040F.3090305@veritas.com> Message-ID: <1173385548.465.104193.camel@hal.voltaire.com> On Thu, 2007-03-08 at 13:39, Roland Dreier wrote: > > No standalone script, sorry. But it's simple to automate with expect or > > autoexpect using the switch CLI. > > Or you can use SNMP to set the interface admin status. One can alter the IB port states with the diagnostics and perhaps get a similar thing. > Something like > this one-liner for example (using snmpset from the NET SNMP package): > > snmpset -v2c -cprivate interfaces.ifTable.ifEntry.ifAdminStatus. = 2 > > where port # for an IB port is the real physical port # + 64 (eg port > 19 would use port number 83 in the snmp request). > > (And yes this is another advantage of a managed switch beyond the > embedded SM :) Are you saying MIBs are only supported on "managed" switches ? You certainly don't need an embedded SM for this... -- Hal > - R. 
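As a worked example of the port numbering rule quoted above (the port number used in the SNMP request is the physical IB port number plus 64), here is a minimal C sketch. The helper name and the constant are illustrative only and are not part of NET-SNMP or any switch API; the offset of 64 is simply what the quoted message states for that switch, and other switches may number their interfaces differently.

```c
#include <assert.h>

/* Offset between a physical IB port number and the port number used in
 * the SNMP request, as stated in the quoted message. This is an
 * assumption taken from the thread, not a general rule. */
#define IB_SNMP_PORT_OFFSET 64

/* Hypothetical helper: map a 1-based physical IB port number to the
 * port number to append to ifAdminStatus in the snmpset request. */
int ib_port_to_snmp_port(int physical_port)
{
	assert(physical_port >= 1);
	return physical_port + IB_SNMP_PORT_OFFSET;
}
```

With this mapping, physical port 19 gives 83, which matches the example in the quoted message.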
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu Mar 8 12:31:44 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 12:31:44 -0800 Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs In-Reply-To: <1173385548.465.104193.camel@hal.voltaire.com> (Hal Rosenstock's message of "08 Mar 2007 15:25:49 -0500") References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45EF040F.3090305@veritas.com> <1173385548.465.104193.camel@hal.voltaire.com> Message-ID: Hal> Are you saying MIBs are only supported on "managed" switches Hal> ? You certainly don't need an embedded SM for this... Yes, I would say that any switch with SNMP support is "managed", almost by definition. Of course not all managed switches necessarily have an embedded SM, just as managed ethernet switches have different feature sets (L2 vs. L3, etc). In fact I was trying to tie this into the earlier thread and say that the advantages of having a managed switch go beyond having an SM, and it is useful to have a managed switch even if you don't want an embedded SM. - R. From sean.hefty at intel.com Thu Mar 8 12:31:53 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Mar 2007 12:31:53 -0800 Subject: [ofa-general] [PATCH]] ucma backport to 2.6.19 In-Reply-To: <20070308192530.GD17114@mellanox.co.il> Message-ID: <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com> >Post a replacement to 2_misc_device_to_2_6_19.patch, we'll test. I did not test this patch, but you can try replacing the contents of the 2_misc_device_to_2_6_19.patch with the changes below. (It's possible that this may lead to some conflict further down in the patch chain...) 
The function prototype for show_abi_version changed between 2.6.19 and 2.6.20; this was the missing piece in the original backport patch. I would have expected a build warning for this.

Signed-off-by: Sean Hefty
---
--- ofa_kernel-1.2/drivers/infiniband/core/ucma.c 2007-03-08 12:11:37.000000000 -0800
+++ b/drivers/infiniband/core/ucma.c 2007-03-08 12:13:13.000000000 -0800
@@ -847,13 +847,11 @@ static struct miscdevice ucma_misc = {
 	.fops = &ucma_fops,
 };
 
-static ssize_t show_abi_version(struct device *dev,
-				struct device_attribute *attr,
-				char *buf)
+static ssize_t show_abi_version(struct class_device *class_dev, char *buf)
 {
 	return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
 }
-static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+static CLASS_DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
 
 static int __init ucma_init(void)
 {
@@ -863,7 +861,8 @@ static int __init ucma_init(void)
 	if (ret)
 		return ret;
 
-	ret = device_create_file(ucma_misc.this_device, &dev_attr_abi_version);
+	ret = class_device_create_file(ucma_misc.class,
+				       &class_device_attr_abi_version);
 	if (ret) {
 		printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n");
 		goto err;
@@ -876,7 +875,8 @@ err:
 
 static void __exit ucma_cleanup(void)
 {
-	device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
+	class_device_remove_file(ucma_misc.class,
+				 &class_device_attr_abi_version);
 	misc_deregister(&ucma_misc);
 	idr_destroy(&ctx_idr);
 }

From sweitzen at cisco.com Thu Mar 8 13:20:41 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 8 Mar 2007 13:20:41 -0800
Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs
In-Reply-To: <45F04DEC.10000@ichips.intel.com>
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com>
Message-ID:

There are also IPoIB CM IP multicast problems, see bug 418.
If you try to multicast packets > 2KB, you see:

ib0: failed send event (status=1, wrid=35 vend_err 69)
ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10.
ib0: failed to modify QP, ret = -22
ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:0001:0101
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0001:0101, status -22

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Thursday, March 08, 2007 9:55 AM
> To: Michael S. Tsirkin
> Cc: Scott Weitzenkamp (sweitzen); Openfabrics-ewg at openib.org;
> Vladimir Sokolovsky; Tziporet Koren; Pavel Shamis; Jeff
> Squyres (jsquyres); Shaun Rowland; Woodruff, Robert J;
> general at lists.openfabrics.org
> Subject: Re: OFED 1.2 beta blocking bugs
>
> > ipoib HA is just a script that brings interfaces up/down. Judging by the error
> > messages, we are trying to attach QP to a non-existent MC group. You
> > implemented both the multicast module and ported ipoib to it, so ...
>
> I looked at the code, and didn't see anything obviously wrong. Can you tell me
> how to reproduce the issue? (I'm running 2.6.21-rc3.)
>
> > Please feel free to take over and replace this patch with whatever you find fit.
>
> Reverting the misc update patch should work fine. (I can't get to bugzilla atm
> to see the error.) But there were also backport patches for the ucma in OFED
> 1.1. Failing that, I have a pf-2.6.18 branch in my rdma-dev.git tree that
> backports most of the IB code to 2.6.18.
>
> - Sean
>

From sean.hefty at intel.com Thu Mar 8 14:09:30 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 8 Mar 2007 14:09:30 -0800
Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs
In-Reply-To:
Message-ID: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com>

>There are also IPoIB CM IP multicast problems, see bug 418.

Bug 418 looks different than bug 400.
From the bug report, it sounds like this error is limited to IPoIB CM mode. Is this correct?

>If you try to multicast packets > 2KB, you see:

I'm not sure if your hardware supports a max MTU of 2K, but in general I thought multicast would work up to 4K. Can you verify that the device MTU is higher than the packet size?

>ib0: failed send event (status=1, wrid=35 vend_err 69)

It looks like this indicates a local length error on the send.

>ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10.
>ib0: failed to modify QP, ret = -22
>ib0: couldn't attach QP to multicast group

This looks like an RTS -> RTS QP transition to set the QKey. I'm not sure what the ib_mthca status code of 10 is, but that may give us a hint at the problem. This error and the attach QP to multicast group error may be related.

- Sean

From sweitzen at cisco.com Thu Mar 8 14:19:51 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 8 Mar 2007 14:19:51 -0800
Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs
In-Reply-To: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com>
References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com>
Message-ID:

Yes, it's limited to IPoIB CM.

I'm talking about IP multicast not IB multicast, so the hardware MTU should be transparent.

I'll reopen bug 418, who shall I assign it to?

Scott

> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com]
> Sent: Thursday, March 08, 2007 2:10 PM
> To: Scott Weitzenkamp (sweitzen); Sean Hefty; Michael S. Tsirkin
> Cc: Pavel Shamis; Openfabrics-ewg at openib.org;
> general at lists.openfabrics.org; Vladimir Sokolovsky
> Subject: RE: [ofa-general] RE: OFED 1.2 beta blocking bugs
>
> >There are also IPoIB CM IP multicast problems, see bug 418.
>
> Bug 418 looks different than bug 400.
>
> From the bug report, it sounds like this error is limited to IPoIB CM mode. Is
> this correct?
>
> >If you try to multicast packets > 2KB, you see:
>
> I'm not sure if your hardware supports a max MTU of 2K, but in general I thought
> multicast would work up to 4K. Can you verify that the device MTU is higher
> than the packet size?
>
> >ib0: failed send event (status=1, wrid=35 vend_err 69)
>
> It looks like this indicates a local length error on the send.
>
> >ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10.
> >ib0: failed to modify QP, ret = -22
> >ib0: couldn't attach QP to multicast group
>
> This looks like an RTS -> RTS QP transition to set the QKey. I'm not sure what
> the ib_mthca status code of 10 is, but that may give us a hint at the problem.
> This error and the attach QP to multicast group error may be related.
>
> - Sean
>

From mshefty at ichips.intel.com Thu Mar 8 14:34:30 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 08 Mar 2007 14:34:30 -0800
Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs
In-Reply-To:
References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com>
Message-ID: <45F08F76.60407@ichips.intel.com>

Scott Weitzenkamp (sweitzen) wrote:
> Yes, it's limited to IPoIB CM.
>
> I'm talking about IP multicast not IB multicast, so the hardware MTU
> should be transparent.
>
> I'll reopen bug 418, who shall I assign it to?

I think Michael owns the IPoIB CM code, so it should probably be assigned to him. I will look into it to see if there's an issue with the multicast code though.

- Sean

From halr at voltaire.com Thu Mar 8 14:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Mar 2007 17:49:15 -0500
Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs
In-Reply-To:
References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com>
Message-ID: <1173394109.465.112696.camel@hal.voltaire.com>

On Thu, 2007-03-08 at 17:19, Scott Weitzenkamp (sweitzen) wrote:
> Yes, it's limited to IPoIB CM.
>
> I'm talking about IP multicast not IB multicast, so the hardware MTU
> should be transparent.
Doesn't IPoIB CM only support IP unicast ? IP multicast should fall back to using normal IPoIB (UD) rather than CM (RC). -- Hal > I'll reopen bug 418, who shall I assign it to? > > Scott > > > -----Original Message----- > > From: Sean Hefty [mailto:sean.hefty at intel.com] > > Sent: Thursday, March 08, 2007 2:10 PM > > To: Scott Weitzenkamp (sweitzen); Sean Hefty; Michael S. Tsirkin > > Cc: Pavel Shamis; Openfabrics-ewg at openib.org; > > general at lists.openfabrics.org; Vladimir Sokolovsky > > Subject: RE: [ofa-general] RE: OFED 1.2 beta blocking bugs > > > > >There are also IPoIB CM IP multicast problems, see bug 418. > > > > Bug 418 looks different than bug 400. > > > > From the bug report, it sounds like this error is limited to > > IPoIB CM mode. Is > > this correct? > > > > >If you try to multicast packets > 2KB, you see: > > > > I'm not sure if your hardware supports a max MTU of 2K, but > > in general I thought > > multicast would work up to 4K. Can you verify that the > > device MTU is higher > > than the packet size? > > > > >ib0: failed send event (status=1, wrid=35 vend_err 69) > > > > It looks like this indicates a local length error on the send. > > > > >ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10. > > >ib0: failed to modify QP, ret = -22 > > >ib0: couldn't attach QP to multicast group > > > > This looks like a RTS -> RTS QP transition to set the QKey. > > I'm not sure what > > the ib_mthca status code of 10 is, but that may give us a > > hint at the problem. > > This error and the attach QP to multicast group error may be related. 
> > > > - Sean > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sweitzen at cisco.com Thu Mar 8 14:52:29 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 14:52:29 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <1173394109.465.112696.camel@hal.voltaire.com> References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> <1173394109.465.112696.camel@hal.voltaire.com> Message-ID: > Doesn't IPoIB CM only support IP unicast ? IP multicast > should fall back > to using normal IPoIB (UD) rather than CM (RC). And that's the problem, it does not fallback. Scott From mshefty at ichips.intel.com Thu Mar 8 14:56:55 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Mar 2007 14:56:55 -0800 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> Message-ID: <45F094B7.7080406@ichips.intel.com> > I'm talking about IP multicast not IB multicast, so the hardware MTU > should be transparent. The IP multicast message would need fragmentation though if the hardware MTU were too small. If fragmentation wasn't occurring, then getting a local length error might be expected. I'm not familiar with the IPoIB CM code (looking at it now), but I wouldn't expect the CM related code to affect multicast traffic. 
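To make the fragmentation point above concrete, here is a minimal C sketch that counts how many IPv4 fragments a datagram needs for a given link MTU. It is an illustration only, not code from IPoIB: the function name is hypothetical, the 20-byte header assumes IPv4 without options, and the MTU of 2044 bytes used in the note below is an assumed figure for an IPoIB UD interface on a 2K IB MTU, not a value taken from this thread.

```c
#include <assert.h>

#define IPV4_HDR_LEN 20u /* IPv4 header without options (assumption) */

/* Hypothetical helper: number of IPv4 packets needed to carry
 * ip_payload_len bytes of IP payload over a link with the given MTU.
 * Returns 0 if the MTU cannot hold a header plus one 8-byte unit. */
unsigned int ipv4_fragments_needed(unsigned int ip_payload_len,
				   unsigned int mtu)
{
	unsigned int cap, per;

	if (mtu < IPV4_HDR_LEN + 8u)
		return 0;

	cap = mtu - IPV4_HDR_LEN;	/* most data any single packet can hold */
	if (ip_payload_len <= cap)
		return 1;		/* fits without fragmentation */

	/* Every fragment except the last carries a multiple of 8 bytes. */
	per = cap & ~7u;

	/* One final fragment of up to cap bytes, plus as many full
	 * fragments as are needed for the rest. */
	return 1 + (ip_payload_len - cap + per - 1) / per;
}
```

Under these assumptions, a 3000-byte IP payload on a 2044-byte MTU needs two fragments, while a 1000-byte payload needs none. If such a payload were handed to the hardware in one piece instead, a local length error like the one reported would be a plausible result.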
- Sean From rdreier at cisco.com Thu Mar 8 15:36:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 15:36:30 -0800 Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish In-Reply-To: <20070307042653.GA17273@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 6 Mar 2007 21:26:53 -0700") References: <20070307010208.GO11411@obsidianresearch.com> <20070307042653.GA17273@obsidianresearch.com> Message-ID: > Don't get me wrong, I think your patch is ultimately the right way to > go, but it needs another part to address the problem IPv6 has - or at > least a plan on how to address it. I don't think ignoring > the synchronizing problem is the way to go. OK, I think I've convinced myself that the gain from this patch is worth the small risk of addr autoconf breakage. So I'll apply it for now. > Also, in my view, the problem you are seeing with MLID exhaustion is > purely a SM problem and has nothing to do with IPoIB and switch > limits. SMs need to treat MLIDs as a precious resource and share them > agressively. Especially IPv6 solicited node multicast addresses. Agree -- this patch is definitely a workaround for broken fabrics -- but most (all?) current SMs don't deal with MLID allocation correctly. - R. 
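The ordering discussed in the message above (the interface is only reported as usable once the broadcast group join has finished) can be sketched as a small user-space model in C. This is a simplified illustration under stated assumptions, not the kernel implementation: the struct and function names are hypothetical and do not correspond to the IPoIB driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an IPoIB interface. */
struct iface_model {
	bool admin_up;		/* interface has been opened */
	bool bcast_joined;	/* broadcast group join has completed */
	bool carrier_on;	/* link state as seen by the network stack */
};

/* Opening the interface starts the join but leaves the carrier off. */
void iface_open(struct iface_model *i)
{
	assert(i);
	i->admin_up = true;
	i->bcast_joined = false;
	i->carrier_on = false;
}

/* Join completion is the only place where the carrier is turned on. */
void iface_bcast_join_done(struct iface_model *i, bool success)
{
	assert(i);
	i->bcast_joined = success;
	i->carrier_on = i->admin_up && success;
}

/* Stopping the interface drops the carrier again. */
void iface_stop(struct iface_model *i)
{
	assert(i);
	i->admin_up = false;
	i->bcast_joined = false;
	i->carrier_on = false;
}
```

In this model, protocols that wait for carrier (address autoconfiguration, for example) do not start sending before the broadcast group exists, which is the behavior the patch is after; the trade-off raised in the thread is that anything depending on the old timing may now start later.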
From rdreier at cisco.com Thu Mar 8 15:50:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 15:50:17 -0800 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get various post-rc3 fixes: David Miller (1): RDMA/cxgb3: Fix build on sparc64 Hoang-Nam Nguyen (1): IB/ehca: Fix sync between completion handler and destroy cq Roland Dreier (2): IPoIB: Only handle async events for one port IB/mthca: Fix error path in mthca_alloc_memfree() Sean Hefty (2): RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port() RDMA/ucma: Avoid sending reject if backlog is full Shirley Ma (1): IPoIB: Turn on interface's carrier after broadcast group is joined Steve Wise (8): RDMA/cxgb3: Start ep timer on a MPA reject RDMA/cxgb3: Don't use mm after it's freed in iwch_mmap() RDMA/cxgb3: Fixes for "normal close" failures RDMA/cxgb3: Move QP to error on destroy if the state is IDLE RDMA/cxgb3: Stop EP timer when MPA exchange is aborted by peer RDMA/cxgb3: Squelch logging AE errors RDMA/cxgb3: Don't reuse skbs that are non-linear or cloned RDMA/cxgb3: Fix MR permission problems drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/ucma.c | 2 +- drivers/infiniband/hw/cxgb3/cxio_hal.c | 1 + drivers/infiniband/hw/cxgb3/iwch_cm.c | 19 ++++--- drivers/infiniband/hw/cxgb3/iwch_ev.c | 12 ++-- drivers/infiniband/hw/cxgb3/iwch_provider.c | 40 +++++----------- drivers/infiniband/hw/cxgb3/iwch_provider.h | 33 +++++-------- drivers/infiniband/hw/cxgb3/iwch_qp.c | 2 +- drivers/infiniband/hw/ehca/ehca_classes.h | 6 ++- drivers/infiniband/hw/ehca/ehca_cq.c | 16 ++++++- drivers/infiniband/hw/ehca/ehca_irq.c | 59 ++++++++++++++++-------- drivers/infiniband/hw/ehca/ehca_main.c | 4 +- 
drivers/infiniband/hw/mthca/mthca_qp.c | 10 ++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 5 ++- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 13 +++-- 15 files changed, 122 insertions(+), 102 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d441815..fde92ce 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1821,7 +1821,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, struct rdma_bind_list *bind_list; int port, ret; - bind_list = kmalloc(sizeof *bind_list, GFP_KERNEL); + bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index b516b93..c859134 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -266,7 +266,7 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id, mutex_lock(&ctx->file->mut); if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) { if (!ctx->backlog) { - ret = -EDQUOT; + ret = -ENOMEM; kfree(uevent); goto out; } diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index d737c73..818cf1a 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "cxio_resource.h" #include "cxio_hal.h" diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index b21fde8..d0ed1d3 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -305,8 +305,7 @@ static int status2errno(int status) */ static struct sk_buff *get_skb(struct sk_buff *skb, int len, gfp_t gfp) { - if (skb) { - BUG_ON(skb_cloned(skb)); + if (skb && !skb_is_nonlinear(skb) && !skb_cloned(skb)) { skb_trim(skb, 0); skb_get(skb); } else { @@ -1415,6 +1414,7 @@ static int peer_close(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) 
wake_up(&ep->com.waitq); break; case FPDU_MODE: + start_ep_timer(ep); __state_set(&ep->com, CLOSING); attrs.next_state = IWCH_QP_STATE_CLOSING; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, @@ -1425,7 +1425,6 @@ static int peer_close(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) disconnect = 0; break; case CLOSING: - start_ep_timer(ep); __state_set(&ep->com, MORIBUND); disconnect = 0; break; @@ -1487,8 +1486,10 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) case CONNECTING: break; case MPA_REQ_WAIT: + stop_ep_timer(ep); break; case MPA_REQ_SENT: + stop_ep_timer(ep); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REP_SENT: @@ -1507,9 +1508,10 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) get_ep(&ep->com); break; case MORIBUND: + case CLOSING: stop_ep_timer(ep); + /*FALLTHROUGH*/ case FPDU_MODE: - case CLOSING: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_ERROR; ret = iwch_modify_qp(ep->com.qp->rhp, @@ -1570,7 +1572,6 @@ static int close_con_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) spin_lock_irqsave(&ep->com.lock, flags); switch (ep->com.state) { case CLOSING: - start_ep_timer(ep); __state_set(&ep->com, MORIBUND); break; case MORIBUND: @@ -1586,6 +1587,8 @@ static int close_con_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) __state_set(&ep->com, DEAD); release = 1; break; + case ABORTING: + break; case DEAD: default: BUG_ON(1); @@ -1659,6 +1662,7 @@ static void ep_timeout(unsigned long arg) break; case MPA_REQ_WAIT: break; + case CLOSING: case MORIBUND: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_ERROR; @@ -1687,12 +1691,11 @@ int iwch_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) return -ECONNRESET; } BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); - state_set(&ep->com, CLOSING); if (mpa_rev == 0) abort_connection(ep, NULL, GFP_KERNEL); else { err = send_mpa_reject(ep, pdata, pdata_len); - err = 
send_halfclose(ep, GFP_KERNEL); + err = iwch_ep_disconnect(ep, 0, GFP_KERNEL); } return 0; } @@ -1957,11 +1960,11 @@ int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp) case MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: + start_ep_timer(ep); ep->com.state = CLOSING; close = 1; break; case CLOSING: - start_ep_timer(ep); ep->com.state = MORIBUND; close = 1; break; diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c index 54362af..b406766 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_ev.c +++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c @@ -47,12 +47,6 @@ static void post_qp_event(struct iwch_dev *rnicp, struct iwch_cq *chp, struct iwch_qp_attributes attrs; struct iwch_qp *qhp; - printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x " - "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__, - CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe), - CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe), - CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); - spin_lock(&rnicp->lock); qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe)); @@ -73,6 +67,12 @@ static void post_qp_event(struct iwch_dev *rnicp, struct iwch_cq *chp, return; } + printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x " + "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__, + CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe), + CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe), + CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe)); + atomic_inc(&qhp->refcnt); spin_unlock(&rnicp->lock); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 9947a14..f2774ae 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -331,6 +331,7 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) int ret = 0; struct iwch_mm_entry *mm; struct iwch_ucontext *ucontext; + u64 addr; PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, 
vma->vm_pgoff, key, len); @@ -345,10 +346,11 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) mm = remove_mmap(ucontext, key, len); if (!mm) return -EINVAL; + addr = mm->addr; kfree(mm); - if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) && - (mm->addr < (rdev_p->rnic_info.udbell_physbase + + if ((addr >= rdev_p->rnic_info.udbell_physbase) && + (addr < (rdev_p->rnic_info.udbell_physbase + rdev_p->rnic_info.udbell_len))) { /* @@ -362,7 +364,7 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; vma->vm_flags &= ~VM_MAYREAD; ret = io_remap_pfn_range(vma, vma->vm_start, - mm->addr >> PAGE_SHIFT, + addr >> PAGE_SHIFT, len, vma->vm_page_prot); } else { @@ -370,7 +372,7 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) * Map WQ or CQ contig dma memory... */ ret = remap_pfn_range(vma, vma->vm_start, - mm->addr >> PAGE_SHIFT, + addr >> PAGE_SHIFT, len, vma->vm_page_prot); } @@ -463,9 +465,6 @@ static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd, php = to_iwch_pd(pd); rhp = php->rhp; - acc = iwch_convert_access(acc); - - mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); if (!mhp) return ERR_PTR(-ENOMEM); @@ -491,12 +490,7 @@ static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd, mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - /* NOTE: TPT perms are backwards from BIND WR perms! 
*/ - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; - + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = *iova_start; mhp->attr.page_size = shift - 12; @@ -525,7 +519,6 @@ static int iwch_reregister_phys_mem(struct ib_mr *mr, struct iwch_mr mh, *mhp; struct iwch_pd *php; struct iwch_dev *rhp; - int new_acc; __be64 *page_list = NULL; int shift = 0; u64 total_size; @@ -546,14 +539,12 @@ static int iwch_reregister_phys_mem(struct ib_mr *mr, if (rhp != php->rhp) return -EINVAL; - new_acc = mhp->attr.perms; - memcpy(&mh, mhp, sizeof *mhp); if (mr_rereg_mask & IB_MR_REREG_PD) php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mh.attr.perms = iwch_convert_access(acc); + mh.attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, @@ -568,7 +559,7 @@ static int iwch_reregister_phys_mem(struct ib_mr *mr, if (mr_rereg_mask & IB_MR_REREG_PD) mhp->attr.pdid = php->pdid; if (mr_rereg_mask & IB_MR_REREG_ACCESS) - mhp->attr.perms = acc; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) { mhp->attr.zbva = 0; mhp->attr.va_fbo = *iova_start; @@ -613,8 +604,6 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, struct ib_umem *region, goto err; } - acc = iwch_convert_access(acc); - i = n = 0; list_for_each_entry(chunk, &region->chunk_list, list) @@ -630,10 +619,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, struct ib_umem *region, mhp->rhp = rhp; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; - mhp->attr.perms = (acc & 0x1) << 3; - mhp->attr.perms |= (acc & 0x2) << 1; - mhp->attr.perms |= (acc & 0x4) >> 1; - mhp->attr.perms |= (acc & 0x8) >> 3; + mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = region->virt_base; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) region->length; @@ -736,10
+722,8 @@ static int iwch_destroy_qp(struct ib_qp *ib_qp) qhp = to_iwch_qp(ib_qp); rhp = qhp->rhp; - if (qhp->attr.state == IWCH_QP_STATE_RTS) { - attrs.next_state = IWCH_QP_STATE_ERROR; - iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0); - } + attrs.next_state = IWCH_QP_STATE_ERROR; + iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0); wait_event(qhp->wait, !qhp->ep); remove_handle(rhp, &rhp->qpidr, qhp->wq.qpid); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index de0fe1b..93bcc56 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -286,27 +286,20 @@ static inline int iwch_convert_state(enum ib_qp_state ib_state) } } -enum iwch_mem_perms { - IWCH_MEM_ACCESS_LOCAL_READ = 1 << 0, - IWCH_MEM_ACCESS_LOCAL_WRITE = 1 << 1, - IWCH_MEM_ACCESS_REMOTE_READ = 1 << 2, - IWCH_MEM_ACCESS_REMOTE_WRITE = 1 << 3, - IWCH_MEM_ACCESS_ATOMICS = 1 << 4, - IWCH_MEM_ACCESS_BINDING = 1 << 5, - IWCH_MEM_ACCESS_LOCAL = - (IWCH_MEM_ACCESS_LOCAL_READ | IWCH_MEM_ACCESS_LOCAL_WRITE), - IWCH_MEM_ACCESS_REMOTE = - (IWCH_MEM_ACCESS_REMOTE_WRITE | IWCH_MEM_ACCESS_REMOTE_READ) - /* cannot go beyond 1 << 31 */ -} __attribute__ ((packed)); - -static inline u32 iwch_convert_access(int acc) +static inline u32 iwch_ib_to_tpt_access(int acc) { - return (acc & IB_ACCESS_REMOTE_WRITE ? IWCH_MEM_ACCESS_REMOTE_WRITE : 0) - | (acc & IB_ACCESS_REMOTE_READ ? IWCH_MEM_ACCESS_REMOTE_READ : 0) | - (acc & IB_ACCESS_LOCAL_WRITE ? IWCH_MEM_ACCESS_LOCAL_WRITE : 0) | - (acc & IB_ACCESS_MW_BIND ? IWCH_MEM_ACCESS_BINDING : 0) | - IWCH_MEM_ACCESS_LOCAL_READ; + return (acc & IB_ACCESS_REMOTE_WRITE ? TPT_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? TPT_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? TPT_LOCAL_WRITE : 0) | + TPT_LOCAL_READ; +} + +static inline u32 iwch_ib_to_mwbind_access(int acc) +{ + return (acc & IB_ACCESS_REMOTE_WRITE ? 
T3_MEM_ACCESS_REM_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? T3_MEM_ACCESS_REM_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? T3_MEM_ACCESS_LOCAL_WRITE : 0) | + T3_MEM_ACCESS_LOCAL_READ; } enum iwch_mmid_state { diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 9ea00cc..0a472c9 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -439,7 +439,7 @@ int iwch_bind_mw(struct ib_qp *qp, wqe->bind.type = T3_VA_BASED_TO; /* TBD: check perms */ - wqe->bind.perms = iwch_convert_access(mw_bind->mw_access_flags); + wqe->bind.perms = iwch_ib_to_mwbind_access(mw_bind->mw_access_flags); wqe->bind.mr_stag = cpu_to_be32(mw_bind->mr->lkey); wqe->bind.mw_stag = cpu_to_be32(mw->rkey); wqe->bind.mw_len = cpu_to_be32(mw_bind->length); diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 40404c9..82ded44 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -52,6 +52,8 @@ struct ehca_mw; struct ehca_pd; struct ehca_av; +#include + #include #include @@ -153,7 +155,9 @@ struct ehca_cq { spinlock_t cb_lock; struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; struct list_head entry; - u32 nr_callbacks; + u32 nr_callbacks; /* #events assigned to cpu by scaling code */ + u32 nr_events; /* #events seen */ + wait_queue_head_t wait_completion; spinlock_t task_lock; u32 ownpid; /* mmap counter for resources mapped into user space */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 6ebfa27..e2cdc1a 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -146,6 +146,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, spin_lock_init(&my_cq->spinlock); spin_lock_init(&my_cq->cb_lock); spin_lock_init(&my_cq->task_lock); + init_waitqueue_head(&my_cq->wait_completion); my_cq->ownpid = current->tgid; cq = 
&my_cq->ib_cq; @@ -302,6 +303,16 @@ create_cq_exit1: return cq; } +static int get_cq_nr_events(struct ehca_cq *my_cq) +{ + int ret; + unsigned long flags; + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + ret = my_cq->nr_events; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + return ret; +} + int ehca_destroy_cq(struct ib_cq *cq) { u64 h_ret; @@ -329,10 +340,11 @@ int ehca_destroy_cq(struct ib_cq *cq) } spin_lock_irqsave(&ehca_cq_idr_lock, flags); - while (my_cq->nr_callbacks) { + while (my_cq->nr_events) { spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - yield(); + wait_event(my_cq->wait_completion, !get_cq_nr_events(my_cq)); spin_lock_irqsave(&ehca_cq_idr_lock, flags); + /* recheck nr_events to assure no cqe has just arrived */ } idr_remove(&ehca_cq_idr, my_cq->token); diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 3ec53c6..20f36bf 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -404,10 +404,11 @@ static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe) u32 token; unsigned long flags; struct ehca_cq *cq; + eqe_value = eqe->entry; ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value); if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) { - ehca_dbg(&shca->ib_device, "... 
completion event"); + ehca_dbg(&shca->ib_device, "Got completion event"); token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value); spin_lock_irqsave(&ehca_cq_idr_lock, flags); cq = idr_find(&ehca_cq_idr, token); @@ -419,16 +420,20 @@ static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe) return; } reset_eq_pending(cq); - if (ehca_scaling_code) { + cq->nr_events++; + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + if (ehca_scaling_code) queue_comp_task(cq); - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); - } else { - spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + else { comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); } } else { - ehca_dbg(&shca->ib_device, - "Got non completion event"); + ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eqe_value); } } @@ -478,6 +483,7 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq) "token=%x", token); continue; } + eqe_cache[eqe_cnt].cq->nr_events++; spin_unlock(&ehca_cq_idr_lock); } else eqe_cache[eqe_cnt].cq = NULL; @@ -504,12 +510,18 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq) /* call completion handler for cached eqes */ for (i = 0; i < eqe_cnt; i++) if (eq->eqe_cache[i].cq) { - if (ehca_scaling_code) { - spin_lock(&ehca_cq_idr_lock); + if (ehca_scaling_code) queue_comp_task(eq->eqe_cache[i].cq); - spin_unlock(&ehca_cq_idr_lock); - } else - comp_event_callback(eq->eqe_cache[i].cq); + else { + struct ehca_cq *cq = eq->eqe_cache[i].cq; + comp_event_callback(cq); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + } } else { ehca_dbg(&shca->ib_device, "Got non completion event"); parse_identifier(shca, eq->eqe_cache[i].eqe->entry); @@ -523,7 +535,6 @@ void 
ehca_process_eq(struct ehca_shca *shca, int is_irq) if (!eqe) break; process_eqe(shca, eqe); - eqe_cnt++; } while (1); unlock_irq_spinlock: @@ -567,8 +578,7 @@ static void __queue_comp_task(struct ehca_cq *__cq, list_add_tail(&__cq->entry, &cct->cq_list); cct->cq_jobs++; wake_up(&cct->wait_queue); - } - else + } else __cq->nr_callbacks++; spin_unlock(&__cq->task_lock); @@ -577,18 +587,21 @@ static void __queue_comp_task(struct ehca_cq *__cq, static void queue_comp_task(struct ehca_cq *__cq) { - int cpu; int cpu_id; struct ehca_cpu_comp_task *cct; + int cq_jobs; + unsigned long flags; - cpu = get_cpu(); cpu_id = find_next_online_cpu(pool); BUG_ON(!cpu_online(cpu_id)); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); - if (cct->cq_jobs > 0) { + spin_lock_irqsave(&cct->task_lock, flags); + cq_jobs = cct->cq_jobs; + spin_unlock_irqrestore(&cct->task_lock, flags); + if (cq_jobs > 0) { cpu_id = find_next_online_cpu(pool); cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); BUG_ON(!cct); @@ -608,11 +621,17 @@ static void run_comp_task(struct ehca_cpu_comp_task* cct) cq = list_entry(cct->cq_list.next, struct ehca_cq, entry); spin_unlock_irqrestore(&cct->task_lock, flags); comp_event_callback(cq); - spin_lock_irqsave(&cct->task_lock, flags); + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq->nr_events--; + if (!cq->nr_events) + wake_up(&cq->wait_completion); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + spin_lock_irqsave(&cct->task_lock, flags); spin_lock(&cq->task_lock); cq->nr_callbacks--; - if (cq->nr_callbacks == 0) { + if (!cq->nr_callbacks) { list_del_init(cct->cq_list.next); cct->cq_jobs--; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index c183512..059da96 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -52,7 +52,7 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device 
Driver"); -MODULE_VERSION("SVNEHCA_0021"); +MODULE_VERSION("SVNEHCA_0022"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -810,7 +810,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0021)\n"); + "(Rel.: SVNEHCA_0022)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 71dc84b..1c6b63a 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1088,21 +1088,21 @@ static void mthca_unmap_memfree(struct mthca_dev *dev, static int mthca_alloc_memfree(struct mthca_dev *dev, struct mthca_qp *qp) { - int ret = 0; - if (mthca_is_memfree(dev)) { qp->rq.db_index = mthca_alloc_db(dev, MTHCA_DB_TYPE_RQ, qp->qpn, &qp->rq.db); if (qp->rq.db_index < 0) - return ret; + return -ENOMEM; qp->sq.db_index = mthca_alloc_db(dev, MTHCA_DB_TYPE_SQ, qp->qpn, &qp->sq.db); - if (qp->sq.db_index < 0) + if (qp->sq.db_index < 0) { mthca_free_db(dev, MTHCA_DB_TYPE_RQ, qp->rq.db_index); + return -ENOMEM; + } } - return ret; + return 0; } static void mthca_free_memfree(struct mthca_dev *dev, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index bb2e3d5..56c87a8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -407,6 +407,10 @@ static int ipoib_mcast_join_complete(int status, queue_delayed_work(ipoib_workqueue, &priv->mcast_task, 0); mutex_unlock(&mcast_mutex); + + if (mcast == priv->broadcast) + netif_carrier_on(dev); + return 0; } @@ -594,7 +598,6 @@ void ipoib_mcast_join_task(struct work_struct *work) ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); clear_bit(IPOIB_MCAST_RUN, &priv->flags); - netif_carrier_on(dev); } int ipoib_mcast_start_thread(struct net_device *dev) diff --git 
a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 3cb551b..7f3ec20 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler, struct ipoib_dev_priv *priv = container_of(handler, struct ipoib_dev_priv, event_handler); - if (record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || - record->event == IB_EVENT_PORT_ACTIVE || - record->event == IB_EVENT_LID_CHANGE || - record->event == IB_EVENT_SM_CHANGE || - record->event == IB_EVENT_CLIENT_REREGISTER) { + if ((record->event == IB_EVENT_PORT_ERR || + record->event == IB_EVENT_PKEY_CHANGE || + record->event == IB_EVENT_PORT_ACTIVE || + record->event == IB_EVENT_LID_CHANGE || + record->event == IB_EVENT_SM_CHANGE || + record->event == IB_EVENT_CLIENT_REREGISTER) && + record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); } From rdreier at cisco.com Thu Mar 8 15:54:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Mar 2007 15:54:55 -0800 Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy In-Reply-To: <20070308063921.GB26304@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 8 Mar 2007 08:39:38 +0200") References: <20070308063921.GB26304@mellanox.co.il> Message-ID: > It is common practice to put a pointer/index to a per-QP > structure inside the wrid. This data is available after poll_cq > returns, when cq lock is not taken. If this pointer is used > directly inside the event handler, the ULP that is moving QP to > reset has no way to know when is it safe to free data it points to, > unless the verbs provider synchronizes with the IRQ handler > before the verbs returns. Does this fix any problem for in-tree (or OFED) drivers? 
Because I'm not convinced this is something that a low-level driver should try to handle. A ULP that suffers from this that polls a CQ from a workqueue, say, rather than an interrupt remains broken even after this change. And my gut feeling is that this type of problem is something a ULP should handle by not getting into this situation in the first place. BTW, have you had a chance to test the other changes and see if they fix the IPoIB CM issue? - R. From sean.hefty at intel.com Thu Mar 8 16:05:41 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Mar 2007 16:05:41 -0800 Subject: [ofa-general] [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: <000001c761b3$70c16030$ff0da8c0@amr.corp.intel.com> Message-ID: <000301c761de$acd913d0$ff0da8c0@amr.corp.intel.com> I believe this is a simple fix for the detach before attach race that Roland pointed out. I only did some limited testing on my systems, so I can't say that it will fully fix bug report 400. Roland, if this looks good to you, let me know and I can push it out to my git tree. 
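To make the ordering problem concrete, here is a minimal user-space sketch of the detach-before-attach race. It is a model, not the driver code: the struct, the flag names, and every function in it are hypothetical stand-ins, and it assumes that freeing the SA multicast join lets an in-flight join callback finish (and attach the QP) before it returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IPOIB_MCAST_FLAG_BUSY / IPOIB_MCAST_FLAG_ATTACHED. */
enum { FLAG_BUSY = 1u << 0, FLAG_ATTACHED = 1u << 1 };

struct mcast_model {
	unsigned flags;
	bool join_pending;	/* a join callback has not run yet */
	bool qp_attached;	/* models the QP being attached to the group */
};

/* Non-atomic stand-in for the kernel's test_and_clear_bit(). */
static bool test_and_clear(unsigned *flags, unsigned bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags &= ~bit;
	return was_set;
}

/* Models the join completion callback: attach the QP and mark it attached. */
static void join_complete(struct mcast_model *m)
{
	assert(m);
	m->qp_attached = true;
	m->flags |= FLAG_ATTACHED;
	m->join_pending = false;
}

/* Models freeing the join; ASSUMED to let a pending callback finish first. */
static void free_multicast(struct mcast_model *m)
{
	if (m->join_pending)
		join_complete(m);
}

static void detach_if_attached(struct mcast_model *m)
{
	if (test_and_clear(&m->flags, FLAG_ATTACHED))
		m->qp_attached = false;
}

/* Old order: look at ATTACHED first, free the join afterwards. */
static bool leave_old_order(struct mcast_model *m)
{
	detach_if_attached(m);
	if (test_and_clear(&m->flags, FLAG_BUSY))
		free_multicast(m);
	return m->qp_attached;	/* true means the attachment was left behind */
}

/* New order: free the join first, then detach whatever got attached. */
static bool leave_new_order(struct mcast_model *m)
{
	if (test_and_clear(&m->flags, FLAG_BUSY))
		free_multicast(m);
	detach_if_attached(m);
	return m->qp_attached;
}
```

Starting both functions from the same state (BUSY set, join still pending, not yet attached), the old order returns with the QP still attached, while the new order ends detached. The model only covers this one interleaving and says nothing about the other problems in bug report 400.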
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index bb2e3d5..cd202a0 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -641,6 +641,9 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) struct ipoib_dev_priv *priv = netdev_priv(dev); int ret = 0; + if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) + ib_sa_free_multicast(mcast->mc); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); @@ -652,9 +655,6 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); } - if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) - ib_sa_free_multicast(mcast->mc); - return 0; } From sweitzen at cisco.com Thu Mar 8 16:24:40 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 8 Mar 2007 16:24:40 -0800 Subject: [ofa-general] OFED 1.2 beta blocking bugs (March 8) Message-ID: I've been testing OFED-1.2-20070308-0708 today on x86_64/i686/ppc64/ia64, and here's a current list of bugs I'd like fixed before OFED 1.2 beta. I can compile better today than yesterday, although perftest and mvapich won't compile on ppc64 (bug 379), and mvapich won't compile on ia64 (bug 434). IPoIB CM is locking up ppc64, IBM folks have you tried IPoIB CM on ppc64? 
bug_id  assigned_to              short_desc
395     vlad at mellanox.co.il    uDAPL fails (with Intel MPI or HP MPI) on SLES 10 i686
431     mst at mellanox.co.il     IPoIB CM locks up server on SLES10/RHEL4 ppc64
400     sean.hefty at intel.com   OFED 1.2 IPoIB HA/CM/multicast problems
379     vlad at mellanox.co.il    can't compile OFED 1.2 on RHEL4/SLES10 ppc64
434     pasha at mellanox.co.il   OFED-1.2-20070308-0708 MVAPICH won't compile on RHEL4 IA64 with Intel compiler
418     mst at mellanox.co.il     IPoIB CM causes large message IPv4 multicast to fail
417     vlad at mellanox.co.il    can't unload OFED 1.2 drivers on SLES10

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From Koen.SEGERS at VRT.BE Fri Mar 9 01:51:37 2007
From: Koen.SEGERS at VRT.BE (SEGERS Koen)
Date: Fri, 9 Mar 2007 10:51:37 +0100
Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS
References:
Message-ID:

If I understand your information correctly, there is no way of creating a fully redundant setup (2 servers connected to each other with 2 paths, end-to-end) with infiniband using VERBS or SDP.

Koen

________________________________
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Thu 8/03/2007 17:55
To: SEGERS Koen; general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

"ipoibcfg merge" only handles IPoIB, not SDP, and it's active/passive.

________________________________
From: SEGERS Koen [mailto:Koen.SEGERS at VRT.BE]
Sent: Thursday, March 08, 2007 1:03 AM
To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

Are you talking about a kernel patch when you refer to the "bonding kernel driver"? I can't find a specific bonding command that allows bonding two or more ports.
So if I understand it correctly, with SDP you can't have redundancy (active/passive) or aggregation (active/active) with the current OFED-1.2 driver. Renaud Larsen of Cisco told us that bonding is possible in the Topspin driver with the "ipoibcfg merge" command. We are wondering if this also applies to SDP. That is why we are very interested in the Topspin beta drivers! We are supposed to get them (from Renaud) within a few days, but if you can send them to me earlier, so much the better :)

Greetings,
Koen

________________________________
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Thu 8/03/2007 0:01
To: SEGERS Koen; general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

I have not tried the OFED 1.2 IPoIB bonding kernel driver, and can only speak for the userspace IPoIB HA ipoib_ha.pl script. Both Topspin IPoIB and OFED IPoIB have active/passive IPoIB high availability, neither can aggregate IPoIB throughput, and neither has SDP high availability. We will have Topspin SLES10 drivers in beta soon; let me know if you are interested.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
Sent: Wednesday, March 07, 2007 6:59 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

Hi all!

We are trying to bond two ports on 1 HCA so that we are able to aggregate the throughput. We are also interested in bonding ports of different HCAs. Is this possible with the OFED driver? If so, can you give the command? We know Topspin has support for this feature. Sadly, Topspin has no driver that runs on our system (SLES 10).
We currently have OFED-1.2 of 20070306 and the stable OFED-1.1 driver installed, but we can't figure out how this bonding is started in either version. It is important that we offload the bonding. We don't want to use the standard Linux bonding. That is why we think that bonding over different HCAs is not going to work. Is this assumption correct? Is bonding possible when running SDP? And VERBS?

Greetings
Koen

*** Disclaimer ***
Vlaamse Radio- en Televisieomroep
Auguste Reyerslaan 52, 1043 Brussel
nv van publiek recht
BTW BE 0244.142.664 RPR Brussel
http://www.vrt.be/disclaimer

From ossrosch at linux.vnet.ibm.com Fri Mar 9 02:05:54 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Fri, 9 Mar 2007 11:05:54 +0100
Subject: [ofa-general] [PATCH ofed-1.2 3/3] ehca backport 2.6.17
Message-ID: <200703091105.55128.ossrosch@linux.vnet.ibm.com>

backport WARN_ON_ONCE macro for 2.6.17

Signed-off-by: Stefan Roscher
---
diff -Nurp a/include/asm-generic/bug.h b/include/asm-generic/bug.h
--- a/include/asm-generic/bug.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/asm-generic/bug.h 2007-03-08 14:56:10.000000000 +0100
@@ -0,0 +1,34 @@
+#ifndef _BACKPORT_ASM_GENERIC_BUG_H
+#define _BACKPORT_ASM_GENERIC_BUG_H
+
+#include_next
+
+#ifdef CONFIG_BUG
+#define WARN_ON_2(condition) ({ 
\ + typeof(condition) __ret_warn_on = (condition); \ + unlikely(__ret_warn_on); \ +}) +#endif +#define WARN_ON_ONCE(condition) ({ \ + static int __warned; \ + typeof(condition) __ret_warn_once = (condition); \ + \ + if (unlikely(__ret_warn_once)) \ + if (WARN_ON_2(!__warned)) \ + __warned = 1; \ + unlikely(__ret_warn_once); \ +}) + +#endif From ossrosch at linux.vnet.ibm.com Fri Mar 9 02:05:46 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Fri, 9 Mar 2007 11:05:46 +0100 Subject: [ofa-general] [PATCH ofed-1.2 2/3] ehca backport 2.6.16_SLES10 Message-ID: <200703091105.47598.ossrosch@linux.vnet.ibm.com> backport WARN_ON_ONCE macro for 2.6.16_SLES10 Signed-off-by: Stefan Roscher --- diff -Nurp a/include/asm-generic/bug.h b/include/asm-generic/bug.h --- a/include/asm-generic/bug.h 1970-01-01 01:00:00.000000000 +0100 +++ b/include/asm-generic/bug.h 2007-03-08 14:56:10.000000000 +0100 @@ -0,0 +1,34 @@ +#ifndef _BACKPORT_ASM_GENERIC_BUG_H +#define _BACKPORT_ASM_GENERIC_BUG_H + +#include_next + +#ifdef CONFIG_BUG +#define WARN_ON_2(condition) ({ \ + typeof(condition) __ret_warn_on = (condition); \ + if (unlikely(__ret_warn_on)) { \ + printk("BUG: at %s:%d %s()\n", __FILE__, \ + __LINE__, __FUNCTION__); \ + dump_stack(); \ + } \ + unlikely(__ret_warn_on); \ +}) + +#else /* !CONFIG_BUG */ + +#define WARN_ON_2(condition) ({ \ + typeof(condition) __ret_warn_on = (condition); \ + unlikely(__ret_warn_on); \ +}) +#endif +#define WARN_ON_ONCE(condition) ({ \ + static int __warned; \ + typeof(condition) __ret_warn_once = (condition); \ + \ + if (unlikely(__ret_warn_once)) \ + if (WARN_ON_2(!__warned)) \ + __warned = 1; \ + unlikely(__ret_warn_once); \ +}) + +#endif From ossrosch at linux.vnet.ibm.com Fri Mar 9 02:05:31 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Fri, 9 Mar 2007 11:05:31 +0100 Subject: [ofa-general] [PATCH ofed-1.2 0/3] ehca (kernel space) backport patches for ofed1.2 Message-ID: <200703091105.32740.ossrosch@linux.vnet.ibm.com> 
Hi, the following patch is a backport for the missing WARN_ON_ONCE macro in older kernels. Regards Stefan From ossrosch at linux.vnet.ibm.com Fri Mar 9 02:05:42 2007 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Fri, 9 Mar 2007 11:05:42 +0100 Subject: [ofa-general] [PATCH ofed-1.2 1/3] ehca backport 2.6.16 Message-ID: <200703091105.43301.ossrosch@linux.vnet.ibm.com> backport WARN_ON_ONCE macro for 2.6.16 Signed-off-by: Stefan Roscher --- diff -Nurp a/include/asm-generic/bug.h b/include/asm-generic/bug.h --- a/include/asm-generic/bug.h 1970-01-01 01:00:00.000000000 +0100 +++ b/include/asm-generic/bug.h 2007-03-08 14:56:10.000000000 +0100 @@ -0,0 +1,34 @@ +#ifndef _BACKPORT_ASM_GENERIC_BUG_H +#define _BACKPORT_ASM_GENERIC_BUG_H + +#include_next + +#ifdef CONFIG_BUG +#define WARN_ON_2(condition) ({ \ + typeof(condition) __ret_warn_on = (condition); \ + if (unlikely(__ret_warn_on)) { \ + printk("BUG: at %s:%d %s()\n", __FILE__, \ + __LINE__, __FUNCTION__); \ + dump_stack(); \ + } \ + unlikely(__ret_warn_on); \ +}) + +#else /* !CONFIG_BUG */ + +#define WARN_ON_2(condition) ({ \ + typeof(condition) __ret_warn_on = (condition); \ + unlikely(__ret_warn_on); \ +}) +#endif +#define WARN_ON_ONCE(condition) ({ \ + static int __warned; \ + typeof(condition) __ret_warn_once = (condition); \ + \ + if (unlikely(__ret_warn_once)) \ + if (WARN_ON_2(!__warned)) \ + __warned = 1; \ + unlikely(__ret_warn_once); \ +}) + +#endif From vlad at lists.openfabrics.org Fri Mar 9 02:16:18 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 9 Mar 2007 02:16:18 -0800 (PST) Subject: [ofa-general] ofa_1_2_kernel 20070309-0200 daily build status Message-ID: <20070309101618.BEA50E60804@openfabrics.org> This email was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with 
linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.17 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From halr at voltaire.com Fri Mar 9 02:44:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Mar 2007 05:44:21 -0500 Subject: [ofa-general] RE: OFED 1.2 beta blocking bugs In-Reply-To: <45F094B7.7080406@ichips.intel.com> References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> <45F094B7.7080406@ichips.intel.com> Message-ID: <1173437058.465.159160.camel@hal.voltaire.com> On Thu, 2007-03-08 at 17:56, Sean 
Hefty wrote:
> > I'm talking about IP multicast not IB multicast, so the hardware MTU
> > should be transparent.
>
> The IP multicast message would need fragmentation though if the hardware MTU
> were too small.

IP does this fragmentation and would need to fragment to the UD MTU size, not the RC (CM) size.

> If fragmentation wasn't occurring, then getting a local length
> error might be expected.

Or what about the other way around, if it tried to make the MTU too large?

> I'm not familiar with the IPoIB CM code (looking at it
> now), but I wouldn't expect the CM related code to affect multicast traffic.

I would, as IPoIB-CM does not support MC and there needs to be a fallback to UD. I'm not sure where IPmc gets the device MTU from, but it may be the wrong one (from CM rather than the normal UD interface). Has anyone tracked this down? I haven't had a chance to look yet.

-- Hal

> - Sean
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From bugzilla-daemon at lists.openfabrics.org Fri Mar 9 06:24:03 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Fri, 9 Mar 2007 06:24:03 -0800 (PST)
Subject: [ofa-general] [Bug 441] New: IPOIB build faild on RHEL5
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=441

Summary: IPOIB build faild on RHEL5
Product: OpenFabrics Linux
Version: 1.2alpha1
Platform: PPC64
OS/Version: Other
Status: NEW
Severity: blocker
Priority: P1
Component: IPoIB
AssignedTo: bugzilla at openib.org
ReportedBy: stefan.roscher at de.ibm.com
CC: hnguyen at de.ibm.com

Hi,

the OFED-1.2-20070308-0708 build fails with RHEL5 on ppc64. The build process stops with the following error.
-include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -Os -msoft-float -pipe -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -mno-altivec -funit-at-a-time -mstring -Wa,-maltivec -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3 -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ipoib_fs)" -D"KBUILD_MODNAME=KBUILD_STR(ib_ipoib)" -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/.tmp_ipoib_fs.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.c /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.c: In function 'ipoib_mcg_open': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.c:144: error: 'struct inode' has no member named 'u' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.c: In function 'ipoib_path_open': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.c:250: error: 'struct inode' has no member named 'u' make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_fs.o] Error 1 make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib] Error 2 make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2 make[1]: Leaving 
directory `/usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.ppc64'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.30322 (%install)

I checked the inode structure in the include/linux/fs.h file and found that the union "u" does not exist.

Regards
Stefan

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From diego.guella at sircomtech.com Fri Mar 9 06:51:02 2007
From: diego.guella at sircomtech.com (Diego Guella)
Date: Fri, 9 Mar 2007 15:51:02 +0100
Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs
References: <20070307195553.GB9817@mellanox.co.il><45EF303F.5090607@ichips.intel.com><20070308140703.GC23302@mellanox.co.il><45F04DEC.10000@ichips.intel.com><45EF040F.3090305@veritas.com><1173385548.465.104193.camel@hal.voltaire.com>
Message-ID: <011d01c7625a$5d3854e0$05c8a8c0@DIEGO>

From: "Roland Dreier"
> Of course not all managed switches necessarily have an embedded SM,
> just as managed ethernet switches have different feature sets (L2
> vs. L3, etc). In fact I was trying to tie this into the earlier
> thread and say that the advantages of having a managed switch go
> beyond having an SM, and it is useful to have a managed switch even if
> you don't want an embedded SM.

I am particularly interested in managed switches, but I have not understood exactly what the definition of "managed" is. Can someone enlighten me?

And another question: does this switch have an embedded SM?
Flextronics F-X430047: 24-port 4x DDR managed infiniband switch

Thanks

Diego Guella
SIRCOM
Via Borsellino, 46
25038 Rovato (BS)
ITALY
Tel. +39 030 7722673
Fax. +39 030 7249329
e-mail. 
diego.guella at sircomtech.com

From halr at voltaire.com Fri Mar 9 06:59:07 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Mar 2007 09:59:07 -0500
Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs
In-Reply-To: <011d01c7625a$5d3854e0$05c8a8c0@DIEGO>
References: <20070307195553.GB9817@mellanox.co.il><45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il><45F04DEC.10000@ichips.intel.com> <45EF040F.3090305@veritas.com> <1173385548.465.104193.camel@hal.voltaire.com> <011d01c7625a$5d3854e0$05c8a8c0@DIEGO>
Message-ID: <1173452345.465.175358.camel@hal.voltaire.com>

On Fri, 2007-03-09 at 09:51, Diego Guella wrote:
> From: "Roland Dreier"
> > Of course not all managed switches necessarily have an embedded SM,
> > just as managed ethernet switches have different feature sets (L2
> > vs. L3, etc). In fact I was trying to tie this into the earlier
> > thread and say that the advantages of having a managed switch go
> > beyond having an SM, and it is useful to have a managed switch even if
> > you don't want an embedded SM.
>
> I am particularly interested in managed switches, but I have not
> understood exactly what the definition of "managed" is.
> Can someone enlighten me?

I think there are different levels of "managed" switches. There is no
standard definition for this term. There are those (managed) switches
which have things like MIBs and other diags built in (e.g. a local CPU,
perhaps an out of band network, a command line interface, etc.) and
others with a higher level of functionality which includes an embedded
SM and additional management application support. The latter "class"
can be turned into the former class by disabling the SM.

> And, another question, this switch:
> Flextronics F-X430047: 24-port 4x DDR managed infiniband switch
> has an embedded SM?

I don't think so but I may be wrong. I think this is a CPU-less
unmanaged switch.
-- Hal

>
> Thanks
>
> Diego Guella
> SIRCOM
> Via Borsellino, 46
> 25038 Rovato (BS)
> ITALY
> Tel. +39 030 7722673
> Fax. +39 030 7249329
> e-mail. diego.guella at sircomtech.com
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From cppbala at yahoo.com Fri Mar 9 07:32:58 2007
From: cppbala at yahoo.com (Bala)
Date: Fri, 9 Mar 2007 07:32:58 -0800 (PST)
Subject: [ofa-general] OFED-1.1, mvapich and openmpi question
Message-ID: <447102.92209.qm@web35103.mail.mud.yahoo.com>

Hi All,
Recently we have installed OFED-1.1 on our
16 node blade cluster, as we are new to IB we
have the following queries

1. For openmpi we have used the "--mca btl openib,self
..." as command line option and it is
running fine, now the question is how to make sure
that it uses IB for mpi communication??

2. Is there any tool available to find activity over
IB??

3. Is there any option available for "mvapich" to use
IB??

4. we are using cpi.c sample is there any other
sample we can use for testing as well as benchmark??

5. shall we use this with SGE(Sun Grid Engine)
scheduler??, anybody already using SGE for
openmpi and mvapich??

thanks in advance,
-bala-

____________________________________________________________________________________
Don't pick lemons.
See all the new 2007 cars at Yahoo! Autos.
http://autos.yahoo.com/new_cars.html

From halr at voltaire.com Fri Mar 9 07:37:23 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Mar 2007 10:37:23 -0500
Subject: [ofa-general] OFED-1.1, mvapich and openmpi question
In-Reply-To: <447102.92209.qm@web35103.mail.mud.yahoo.com>
References: <447102.92209.qm@web35103.mail.mud.yahoo.com>
Message-ID: <1173454642.465.177652.camel@hal.voltaire.com>

On Fri, 2007-03-09 at 10:32, Bala wrote:
> Hi All,
> Recently we have installed OFED-1.1 on our
> 16 node blade cluster, as we are new to IB we
> have the following queries
>
> 1. For openmpi we have used the "--mca btl openib,self
> ..." as command line option and it is
> running fine, now the question is how to make sure
> that it uses IB for mpi communication??
>
> 2. Is there any tool available to find activity over
> IB??

perfquery (an OpenIB diag) will show port counters which include
transmit/receive packets and bytes (for all IB traffic).

-- Hal

> 3. Is there any option available for "mvapich" to use
> IB??
>
> 4. we are using cpi.c sample is there any other
> sample we can use for testing as well as benchmark??
>
> 5. shall we use this with SGE(Sun Grid Engine)
> scheduler??, anybody already using SGE for
> openmpi and mvapich??
>
> thanks in advance,
> -bala-
>
> ____________________________________________________________________________________
> Don't pick lemons.
> See all the new 2007 cars at Yahoo! Autos.
> http://autos.yahoo.com/new_cars.html
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From jsquyres at cisco.com Fri Mar 9 07:44:38 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 9 Mar 2007 10:44:38 -0500
Subject: [ofa-general] OFED-1.1, mvapich and openmpi question
In-Reply-To: <447102.92209.qm@web35103.mail.mud.yahoo.com>
References: <447102.92209.qm@web35103.mail.mud.yahoo.com>
Message-ID: <292DA06D-1703-4C10-A404-6BC40388C6C6@cisco.com>

On Mar 9, 2007, at 10:32 AM, Bala wrote:

> Recently we have installed OFED-1.1 on our
> 16 node blade cluster, as we are new to IB we
> have the following queries
>
> 1. For openmpi we have used the "--mca btl openib,self
> ..." as command line option and it is
> running fine, now the question is how to make sure
> that it uses IB for mpi communication??

This parameter tells Open MPI to only use the "openib" and "self"
plugins for MPI communications. The "openib" plugin uses the
OpenFabrics stack; the "self" plugin is used for loopback
(send-to-self) communication. If you want to use shared memory for
on-host communication, add "sm" into the list:

    --mca btl sm,openib,self

See http://www.open-mpi.org/faq/ for more information.

> 2. Is there any tool available to find activity over
> IB??

I think Hal answered that.

> 3. Is there any option available for "mvapich" to use
> IB??

The MVAPICH guys will have to answer that.

> 4. we are using cpi.c sample is there any other
> sample we can use for testing as well as benchmark??

There are lots of other benchmarks out there; I personally like
NetPIPE as a ping-pong test, but OSU has some benchmarks as well.

> 5. shall we use this with SGE(Sun Grid Engine)
> scheduler??, anybody already using SGE for
> openmpi and mvapich??
Open MPI v1.2 debuts support for SGE, but also supports a variety of
other schedulers (Torque, SLURM, ...etc.).

--
Jeff Squyres
Cisco Systems

From cppbala at yahoo.com Fri Mar 9 07:57:20 2007
From: cppbala at yahoo.com (Bala)
Date: Fri, 9 Mar 2007 07:57:20 -0800 (PST)
Subject: [ofa-general] In Myrinet we used to face error GM port not available for open, is applicable for IB also
Message-ID: <183382.79921.qm@web35104.mail.mud.yahoo.com>

Hi All,
On our Myrinet cluster, when many users tried to use Myrinet at the
same time, we used to get an error saying
"GM port not available for open". Is there any such
limitation in IB also??

Sorry if the question is not clear; one of my
clients wants to know this.

thanks in advance,
-bala-

____________________________________________________________________________________
Don't get soaked. Take a quick peek at the forecast
with the Yahoo! Search weather shortcut.
http://tools.search.yahoo.com/shortcuts/#loc_weather

From surs at cse.ohio-state.edu Fri Mar 9 08:08:06 2007
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Fri, 9 Mar 2007 11:08:06 -0500
Subject: [ofa-general] OFED-1.1, mvapich and openmpi question
In-Reply-To: <447102.92209.qm@web35103.mail.mud.yahoo.com>
References: <447102.92209.qm@web35103.mail.mud.yahoo.com>
Message-ID: <20070309160805.GC18726@cse.ohio-state.edu>

Hi Bala,

* On Mar,1 Bala wrote :
> Hi All,
> Recently we have installed OFED-1.1 on our
> 16 node blade cluster, as we are new to IB we
> have the following queries
>
> 1. For openmpi we have used the "--mca btl openib,self
> ..." as command line option and it is
> running fine, now the question is how to make sure
> that it uses IB for mpi communication??
>
> 2. Is there any tool available to find activity over
> IB??
>
> 3. Is there any option available for "mvapich" to use
> IB??

By default, the OFED installation of MVAPICH will automatically use IB.
So no special option is needed to run it on IB.
It might be helpful to check out our user guide sections on running MPI
jobs over InfiniBand.

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich/mvapich_user_guide.html#x1-190005.2

In addition, you could also look at our "Troubleshooting" section for
commonly encountered problems and their solutions.

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich/mvapich_user_guide.html#x1-290007

> 4. we are using cpi.c sample is there any other
> sample we can use for testing as well as benchmark??

You could use the OSU benchmarks.

Thanks,
Sayantan.

--
http://www.cse.ohio-state.edu/~surs

From mshefty at ichips.intel.com Fri Mar 9 09:13:37 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Mar 2007 09:13:37 -0800
Subject: [ofa-general] bug 418: was OFED 1.2 beta blocking bugs
In-Reply-To: <1173437058.465.159160.camel@hal.voltaire.com>
References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> <45F094B7.7080406@ichips.intel.com> <1173437058.465.159160.camel@hal.voltaire.com>
Message-ID: <45F195C1.8000402@ichips.intel.com>

> IP does this fragmentation and would need to do this to UD MTU size not
> the RC (CM) size.

Agreed. I'm not familiar enough with the network stack to know how it
handles the MTU sizes to different destinations or across different
paths. The IPoIB CM code displays a warning when the MTU is set larger
than the MTU of the device, since this results in dropping large
multicast packets. So at least these messages that were reported on the
send side make sense:

ib0: enabling connected mode will cause multicast packet drops
ib0: mtu > 2044 will cause multicast packet drops.
ib0: packet len 4100 (> 2064) too long to send, dropping
ib0: packet len 4100 (> 2064) too long to send, dropping

The original bug report simply stated that this behavior is undesirable
as the default, and that IPoIB should either use UD or the CM should be
fixed to avoid this situation.
(The fix may be a change in some setup rather than a change in the
code, but I don't know.)

> I would as IPoIB-CM does not support MC and there needs to be a fallback
> to UD. I'm not sure where IPmc gets the device MTU from but it may be
> the wrong one (from CM rather than the normal UD interface). Has anyone
> tracked this down ?

I haven't had a chance to look yet. I'm still studying it, but the
behavior on the send side seems to make sense after reading the code.
I'm more interested in the errors on the 'receive' side, which, at
least to me, appear to be a different issue:

ib0: failed send event (status=1, wrid=35 vend_err 69)
ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10.
ib0: failed to modify QP, ret = -22
ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:0001:0101
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0001:0101, status -22

- Sean

From jsquyres at cisco.com Fri Mar 9 09:38:06 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 9 Mar 2007 12:38:06 -0500
Subject: [ofa-general] OFED-1.1, mvapich and openmpi question
In-Reply-To: <292DA06D-1703-4C10-A404-6BC40388C6C6@cisco.com>
References: <447102.92209.qm@web35103.mail.mud.yahoo.com> <292DA06D-1703-4C10-A404-6BC40388C6C6@cisco.com>
Message-ID: <1F105E0A-5EB5-4757-9B63-A965B9922EDD@cisco.com>

On Mar 9, 2007, at 10:44 AM, Jeff Squyres wrote:

>> 1. For openmpi we have used the "--mca btl openib,self
>> ..." as command line option and it is
>> running fine, now the question is how to make sure
>> that it uses IB for mpi communication??
>
> This parameter tells Open MPI to only use the "openib" and "self"
> plugins for MPI communications.

I neglected to mention that by default, Open MPI should use as many
networks as it can. So if it finds an OFED-based network adapter,
it'll use it (and implicitly disable tcp). Hence, you shouldn't need
to specify the "--mca btl [sm,]openib,self" parameter. There is no
harm in doing so, of course -- it's just unnecessary.
More specifically, if you have the "openib" BTL plugin properly
installed (e.g., via the OFED installer), it will complain if it is
available and *not* used (e.g., no OFED-based network adapters were
found / available to be used).

Hope that helps.

--
Jeff Squyres
Cisco Systems

From mshefty at ichips.intel.com Fri Mar 9 10:55:02 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Mar 2007 10:55:02 -0800
Subject: [ofa-general] bug 418: was OFED 1.2 beta blocking bugs
In-Reply-To: <45F195C1.8000402@ichips.intel.com>
References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> <45F094B7.7080406@ichips.intel.com> <1173437058.465.159160.camel@hal.voltaire.com> <45F195C1.8000402@ichips.intel.com>
Message-ID: <45F1AD86.4050808@ichips.intel.com>

> ib0: failed send event (status=1, wrid=35 vend_err 69)

I believe that this is causing the QP to transition into the error
state.

> ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10.

The mthca status of 0x10 indicates a bad QP state. The transition from
3->3 is RTS to RTS, but the QP is not in the RTS state, which makes
sense given the previous error.

The other receive side errors in the bug report are a fallout from not
recovering from the send error.

I don't know if this causes any problems, but at first glance it
appears that the IPoIB CM code begins listening for connection requests
before the code has had a chance to join the IPoIB broadcast group.
This allows a connection to form before the broadcast traffic is ready.
Someone more familiar with the code than I am will need to determine if
this can lead to any undesirable race conditions.

- Sean

From somenath at veritas.com Thu Mar 8 11:06:02 2007
From: somenath at veritas.com (somenath)
Date: Thu, 08 Mar 2007 11:06:02 -0800
Subject: [ofa-general] OFDEF 1.2 feedback and few questions...
Message-ID: <45F05E9A.6070802@veritas.com>

1. I installed OFED 1.2 20070308-0708 (compiled and installed in a
RHEL4 U4 SMP kernel-x86-64 machine).
I just opted for basic IB drivers and didn't opt for ipoib
configuration. I still see that the module is loaded... is it a bug?
Also, ib_ipath hardware is not present, but the module is still
loaded... is that an OFED requirement or a mistake?

[root at alekhine OFED-1.2-20070308-0708]# lsmod | grep ib
ib_addr                11272  1 rdma_cm
ib_local_sa            14736  1 rdma_cm
ib_ipoib               79320  0
ipv6                  284705  15 ib_ipoib
ib_ipath              227808  0
ib_mthca              156756  0
ib_umad                20016  0
ib_ucm                 21256  0
ib_uverbs              46384  2 rdma_ucm,ib_ucm
ib_cm                  41768  4 rdma_cm,ib_ipoib,crtl,ib_ucm
ib_sa                  28000  5 rdma_cm,ib_local_sa,ib_ipoib,crtl,ib_cm
ib_mad                 43304  5 ib_local_sa,ib_mthca,ib_umad,ib_cm,ib_sa
ib_core                69264  14 rdma_ucm,rdma_cm,iw_cm,ib_local_sa,ib_ipoib,ib_ipath,crtl,ib_mthca,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad

2. How can I find out what exactly changed in this build in the core
components (ib_core, ib_cm, ib_mad, ib_addr, ib_local_sa)?
OFED_Installation_Guide.txt says to see the package release notes for
more details, but there are none for ib_core, ib_cm, etc.

thanks, som.

From pradeep at us.ibm.com Fri Mar 9 12:01:23 2007
From: pradeep at us.ibm.com (Pradeep Satyanarayana)
Date: Fri, 9 Mar 2007 13:01:23 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ) patch for review
Message-ID:

Here is a first version of the IPOIB_CM_NOSRQ patch for review. It will
benefit adapters that do not (yet) support shared receive queues.

This patch works in conjunction with the IPOIB CM patches submitted by
Michael Tsirkin. Those have now been integrated into Roland's
2.6.21-rc1 git tree, so this patch can be applied on top of that tree.

Instead of the srq hanging off ipoib_cm_dev_priv, this patch introduces
an rx_ring hanging off ipoib_cm_rx. There are some changes in the
initialization and cleanup paths since srqs are not used.

This has been tested on the IBM HCA with the ehca driver. Please note
that another small patch (not in this one) is needed to the ehca driver
for it to work on the IBM HCAs.
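
[Editorial sketch, not part of the original message.] In the patch that
follows, the NOSRQ receive path has to map each completion back to a
per-connection ring, so it packs the ring index into the upper 32 bits
of the 64-bit work request ID and the QP number into the lower 32 bits,
with the IPOIB_CM_OP_NOSRQ flag (bit 30) ORed in. The standalone C
sketch below mirrors that encoding outside the kernel. The helper names
pack_wr_id, unpack_qp_num, and unpack_ring_index are hypothetical and
do not appear in the patch; the sketch also uses 1ULL where the patch
uses 1ul, so that it behaves the same on 32-bit hosts. It assumes QP
numbers fit in 24 bits, as InfiniBand QPNs do, so the flag bit does not
collide with the QP number.

```c
#include <stdint.h>

/* Flag value taken from the patch (bit 30); written as 1ULL here,
 * which is an assumption made for portability of this sketch. */
#define IPOIB_CM_OP_NOSRQ (1ULL << 30)

/* Hypothetical helper: build a wr_id the way the NOSRQ path does,
 * i.e. ring index in the high 32 bits, QP number and flag in the low. */
static inline uint64_t pack_wr_id(uint32_t ring_index, uint32_t qp_num)
{
	return ((uint64_t)ring_index << 32) | qp_num | IPOIB_CM_OP_NOSRQ;
}

/* Hypothetical helper: recover the QP number by clearing the flag
 * bit and keeping the low 32 bits. */
static inline uint32_t unpack_qp_num(uint64_t wr_id)
{
	return (uint32_t)((wr_id & ~IPOIB_CM_OP_NOSRQ) & 0xffffffffULL);
}

/* Hypothetical helper: recover the ring index from the high 32 bits. */
static inline uint32_t unpack_ring_index(uint64_t wr_id)
{
	return (uint32_t)(wr_id >> 32);
}
```

With this layout, a completion handler can test the flag bit first to
decide that the completion belongs to a connected-mode receive, then
split the remaining bits to find the QP and the ring slot.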
Signed-off-by: Pradeep Satyanarayana
-----------------------------------------------------------------------------------------------
--- linux-2.6.21-rc1-mst/drivers/infiniband/ulp/ipoib/Makefile 2007-03-08 17:09:48.000000000 -0800
+++ linux-2.6.21-rc1/drivers/infiniband/ulp/ipoib/Makefile 2007-03-09 08:51:41.000000000 -0800
@@ -1,3 +1,4 @@
+EXTRA_CFLAGS += -DIPOIB_CM_NOSRQ
 obj-$(CONFIG_INFINIBAND_IPOIB) += ib_ipoib.o
 ib_ipoib-y := ipoib_main.o \
--- linux-2.6.21-rc1-mst/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-08 17:09:48.000000000 -0800
+++ linux-2.6.21-rc1/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-08 17:35:07.000000000 -0800
@@ -98,7 +98,11 @@ enum {
 #define IPOIB_OP_RECV (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
+#ifdef IPOIB_CM_NOSRQ
+#define IPOIB_CM_OP_NOSRQ (1ul << 30)
+#else
 #define IPOIB_CM_OP_SRQ (1ul << 30)
+#endif
 #else
 #define IPOIB_CM_OP_SRQ (0)
 #endif
@@ -136,6 +140,9 @@ struct ipoib_cm_data {
 struct ipoib_cm_rx {
 	struct ib_cm_id *id;
 	struct ib_qp *qp;
+#ifdef IPOIB_CM_NOSRQ
+	struct ipoib_cm_rx_buf *rx_ring;
+#endif
 	struct list_head list;
 	struct net_device *dev;
 	unsigned long jiffies;
@@ -163,8 +170,10 @@ struct ipoib_cm_rx_buf {
 };
 struct ipoib_cm_dev_priv {
+#ifndef IPOIB_CM_NOSRQ
 	struct ib_srq *srq;
 	struct ipoib_cm_rx_buf *srq_ring;
+#endif
 	struct ib_cm_id *id;
 	struct list_head passive_ids;
 	struct work_struct start_task;
--- linux-2.6.21-rc1-mst/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-03-08 17:09:48.000000000 -0800
+++ linux-2.6.21-rc1/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-03-09 08:39:00.000000000 -0800
@@ -76,12 +76,47 @@ static void ipoib_cm_dma_unmap_rx(struct
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 }
+#ifdef IPOIB_CM_NOSRQ
+static int ipoib_cm_post_receive(struct net_device *dev, u64 id)
+#else
 static int ipoib_cm_post_receive(struct net_device *dev, int id)
+#endif
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
+#ifdef IPOIB_CM_NOSRQ
+
	unsigned long flags;
+	struct ipoib_cm_rx *rx_ptr;
+	u32 qp_num = id & 0xffffffff;
+	u64 wr_id = id >> 32;
+	int found = 0;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	list_for_each_entry(rx_ptr, &priv->cm.passive_ids, list)
+		if (qp_num == rx_ptr->qp->qp_num) {
+			found = 1;
+			break;
+		}
+	spin_unlock_irqrestore(&priv->lock, flags);
+	if (!found)
+		printk(KERN_WARNING "qp not on passive_ids list!!\n");
+	priv->cm.rx_wr.wr_id = wr_id << 32 | qp_num | IPOIB_CM_OP_NOSRQ;
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].addr = rx_ptr->rx_ring[wr_id].mapping[i];
+
+	ret = ib_post_recv(rx_ptr->qp, &priv->cm.rx_wr, &bad_wr);
+	if (unlikely(ret)) {
+		ipoib_warn(priv, "post recv failed for buf %d (%d)\n",
+			   wr_id, ret);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+				      rx_ptr->rx_ring[wr_id].mapping);
+		dev_kfree_skb_any(rx_ptr->rx_ring[wr_id].skb);
+		rx_ptr->rx_ring[wr_id].skb = NULL;
+	}
+#else
 	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
@@ -96,15 +131,30 @@ static int ipoib_cm_post_receive(struct
 		priv->cm.srq_ring[id].skb = NULL;
 	}
+#endif
 	return ret;
 }
+#ifdef IPOIB_CM_NOSRQ
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, u64 id,
+					     int frags,
+					     u64 mapping[IPOIB_CM_RX_SG])
+#else
 static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 					     int frags,
 					     u64 mapping[IPOIB_CM_RX_SG])
+#endif
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
 	int i;
+#ifdef IPOIB_CM_NOSRQ
+	unsigned long flags;
+	struct ipoib_cm_rx *rx_ptr;
+	u32 qp_num = id & 0xffffffff;
+	u32 wr_id = id >> 32;
+	int found = 0;
+#endif
+
 	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
 	if (unlikely(!skb))
@@ -136,7 +186,25 @@ static struct sk_buff *ipoib_cm_alloc_rx
 			goto partial_error;
 	}
+#ifdef IPOIB_CM_NOSRQ
+
+	spin_lock_irqsave(&priv->lock, flags);
+	list_for_each_entry(rx_ptr, &priv->cm.passive_ids, list)
+		if(qp_num == rx_ptr->qp->qp_num) {
+			found = 1;
+			break;
+		}
+
	spin_unlock_irqrestore(&priv->lock, flags);
+
+	if (!found)
+		printk(KERN_WARNING "qp not on passive_ids list!!\n");
+
+	/* Use the rx_ptr to get the requisite entry */
+	rx_ptr->rx_ring[wr_id].skb = skb;
+
+#else
 	priv->cm.srq_ring[id].skb = skb;
+#endif
 	return skb;
 partial_error:
@@ -157,9 +225,16 @@ static struct ib_qp *ipoib_cm_create_rx_
 	struct ib_qp_init_attr attr = {
 		.send_cq = priv->cq, /* does not matter, we never send anything */
 		.recv_cq = priv->cq,
+#ifdef IPOIB_CM_NOSRQ
+		.srq = NULL,
+#else
 		.srq = priv->cm.srq,
+#endif
 		.cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */
+		.cap.max_recv_wr = ipoib_recvq_size + 1,
 		.cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */
+		/* .cap.max_recv_sge = 1, */ /* Is this correct? */
+		.cap.max_recv_sge = IPOIB_CM_RX_SG, /* Is this correct? */
 		.sq_sig_type = IB_SIGNAL_ALL_WR,
 		.qp_type = IB_QPT_RC,
 		.qp_context = p,
@@ -217,7 +292,11 @@ static int ipoib_cm_send_rep(struct net_
 	rep.flow_control = 0;
 	rep.rnr_retry_count = req->rnr_retry_count;
 	rep.target_ack_delay = 20; /* FIXME */
+#ifdef IPOIB_CM_NOSRQ
+	rep.srq = 0;
+#else
 	rep.srq = 1;
+#endif
 	rep.qp_num = qp->qp_num;
 	rep.starting_psn = psn;
 	return ib_send_cm_rep(cm_id, &rep);
@@ -231,6 +310,12 @@ static int ipoib_cm_req_handler(struct i
 	unsigned long flags;
 	unsigned psn;
 	int ret;
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask;
+#ifdef IPOIB_CM_NOSRQ
+	u32 qp_num;
+	u64 i;
+#endif
 	ipoib_dbg(priv, "REQ arrived\n");
 	p = kzalloc(sizeof *p, GFP_KERNEL);
@@ -244,10 +329,46 @@ static int ipoib_cm_req_handler(struct i
 		goto err_qp;
 	}
+#ifdef IPOIB_CM_NOSRQ
+	qp_num = p->qp->qp_num;
+
+	/* Allocate space for the rx_ring here */
+	p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring,
+			     GFP_KERNEL);
+
+	cm_id->context = p;
+	p->jiffies = jiffies;
+	spin_lock_irqsave(&priv->lock, flags);
+	list_add(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	psn = random32() & 0xffffff;
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+	if
(ret)
+		goto err_modify;
+
+	for (i = 0; i < ipoib_recvq_size; ++i) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | qp_num,
+					   IPOIB_CM_RX_SG - 1,
+					   p->rx_ring[i].mapping)) {
+			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -ENOMEM;
+		}
+
+		if (ipoib_cm_post_receive(dev, i << 32 | qp_num)) {
+			ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -EIO;
+		}
+	}
+
+#else
 	psn = random32() & 0xffffff;
 	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
 	if (ret)
 		goto err_modify;
+#endif
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
@@ -255,11 +376,28 @@ static int ipoib_cm_req_handler(struct i
 		goto err_rep;
 	}
+	/* This is missing in Michael's code -Do we need this */
+	qp_attr.qp_state = IB_QPS_RTS;
+
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTS: %d\n", ret);
+		return ret;
+	}
+	ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTS: %d\n", ret);
+		return ret;
+	}
+	/*** missing end ***/
+
+#ifndef IPOIB_CM_NOSRQ
 	cm_id->context = p;
 	p->jiffies = jiffies;
 	spin_lock_irqsave(&priv->lock, flags);
 	list_add(&p->list, &priv->cm.passive_ids);
 	spin_unlock_irqrestore(&priv->lock, flags);
+#endif
 	queue_delayed_work(ipoib_workqueue, &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	return 0;
@@ -344,7 +482,14 @@ static void skb_put_frags(struct sk_buff
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
+#ifdef IPOIB_CM_NOSRQ
+	struct ipoib_cm_rx *rx_ptr;
+	u32 qp_num = (wc->wr_id & ~IPOIB_CM_OP_NOSRQ) & 0xffffffff;
+	u64 wr_id = wc->wr_id >> 32;
+	int found = 0;
+#else
 	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+#endif
 	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
@@ -360,7 +505,23 @@ void ipoib_cm_handle_rx_wc(struct
net_de
 		return;
 	}
+#ifdef IPOIB_CM_NOSRQ
+	spin_lock_irqsave(&priv->lock, flags);
+	list_for_each_entry(rx_ptr, &priv->cm.passive_ids, list)
+		if(qp_num == rx_ptr->qp->qp_num) {
+			found = 1;
+			break;
+		}
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	if (!found)
+		printk(KERN_WARNING "qp not on passive_ids list!!\n");
+
+	/* Use the rx_ptr to get the requisite entry */
+	skb = rx_ptr->rx_ring[wr_id].skb;
+#else
 	skb = priv->cm.srq_ring[wr_id].skb;
+#endif
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		ipoib_dbg(priv, "cm recv error "
@@ -371,7 +532,12 @@ void ipoib_cm_handle_rx_wc(struct net_de
 	}
 	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
+#ifdef IPOIB_CM_NOSRQ
+		/* Temporary hack till ehca fixes wc->qp = NULL */
+		p = rx_ptr;
+#else
 		p = wc->qp->qp_context;
+#endif
 		if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
 			spin_lock_irqsave(&priv->lock, flags);
 			p->jiffies = jiffies;
@@ -388,7 +554,12 @@ void ipoib_cm_handle_rx_wc(struct net_de
 	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
 					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+#ifdef IPOIB_CM_NOSRQ
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | qp_num, frags,
+				       mapping);
+#else
 	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
+#endif
 	if (unlikely(!newskb)) {
 		/*
 		 * If we can't allocate a new RX buffer, dump
@@ -399,8 +570,13 @@ void ipoib_cm_handle_rx_wc(struct net_de
 		goto repost;
 	}
+#ifdef IPOIB_CM_NOSRQ
+	ipoib_cm_dma_unmap_rx(priv, frags, rx_ptr->rx_ring[wr_id].mapping);
+	memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
+#else
 	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
 	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
+#endif
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
@@ -421,7 +597,11 @@ void ipoib_cm_handle_rx_wc(struct net_de
 	netif_rx_ni(skb);
 repost:
+#ifdef IPOIB_CM_NOSRQ
+	if (unlikely(ipoib_cm_post_receive(dev, wr_id << 32 | qp_num)))
+#else
 	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
+#endif
 		ipoib_warn(priv, "ipoib_cm_post_receive failed "
 			   "for buf %d\n", wr_id);
 }
@@ -613,6 +793,9 @@ void ipoib_cm_dev_stop(struct net_device
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
+#ifdef IPOIB_CM_NOSRQ
+	int i;
+#endif
 	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
 		return;
@@ -621,6 +804,16 @@ void ipoib_cm_dev_stop(struct net_device
 	spin_lock_irqsave(&priv->lock, flags);
 	while (!list_empty(&priv->cm.passive_ids)) {
 		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
+#ifdef IPOIB_CM_NOSRQ
+		for(i = 0; i < ipoib_recvq_size; ++i)
+			if(p->rx_ring[i].skb) {
+				ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+						      p->rx_ring[i].mapping);
+				dev_kfree_skb_any(p->rx_ring[i].skb);
+				p->rx_ring[i].skb = NULL;
+			}
+		kfree(p->rx_ring);
+#endif
 		list_del_init(&p->list);
 		spin_unlock_irqrestore(&priv->lock, flags);
 		ib_destroy_cm_id(p->id);
@@ -707,7 +900,11 @@ static struct ib_qp *ipoib_cm_create_tx_
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_init_attr attr = {};
 	attr.recv_cq = priv->cq;
+#ifdef IPOIB_CM_NOSRQ
+	attr.srq = NULL;
+#else
 	attr.srq = priv->cm.srq;
+#endif
 	attr.cap.max_send_wr = ipoib_sendq_size;
 	attr.cap.max_send_sge = 1;
 	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -749,7 +946,11 @@ static int ipoib_cm_send_req(struct net_
 	req.retry_count = 0; /* RFC draft warns against retries */
 	req.rnr_retry_count = 0; /* RFC draft warns against retries */
 	req.max_cm_retries = 15;
+#ifdef IPOIB_CM_NOSRQ
+	req.srq = 0;
+#else
 	req.srq = 1;
+#endif
 	return ib_send_cm_req(id, &req);
 }
@@ -1089,6 +1290,9 @@ static void ipoib_cm_stale_task(struct w
 						   cm.stale_task.work);
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
+#ifdef IPOIB_CM_NOSRQ
+	int i;
+#endif
 	spin_lock_irqsave(&priv->lock, flags);
 	while (!list_empty(&priv->cm.passive_ids)) {
@@ -1097,6 +1301,17 @@ static void ipoib_cm_stale_task(struct w
 		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
 		if
(time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
 			break;
+#ifdef IPOIB_CM_NOSRQ
+		for(i = 0; i < ipoib_recvq_size; ++i)
+			if(p->rx_ring[i].skb) {
+				ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+						      p->rx_ring[i].mapping);
+				dev_kfree_skb_any(p->rx_ring[i].skb);
+				p->rx_ring[i].skb = NULL;
+			}
+		/* Free the rx_ring */
+		kfree(p->rx_ring);
+#endif
 		list_del_init(&p->list);
 		spin_unlock_irqrestore(&priv->lock, flags);
 		ib_destroy_cm_id(p->id);
@@ -1154,12 +1369,14 @@ int ipoib_cm_add_mode_attr(struct net_de
 int ipoib_cm_dev_init(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
+#ifndef IPOIB_CM_NOSRQ
 	struct ib_srq_init_attr srq_init_attr = {
 		.attr = {
 			.max_wr = ipoib_recvq_size,
 			.max_sge = IPOIB_CM_RX_SG
 		}
 	};
+#endif
 	int ret, i;
 	INIT_LIST_HEAD(&priv->cm.passive_ids);
@@ -1172,6 +1389,7 @@ int ipoib_cm_dev_init(struct net_device
 	skb_queue_head_init(&priv->cm.skb_queue);
+#ifndef IPOIB_CM_NOSRQ
 	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
 	if (IS_ERR(priv->cm.srq)) {
 		ret = PTR_ERR(priv->cm.srq);
@@ -1187,6 +1405,7 @@ int ipoib_cm_dev_init(struct net_device
 		ipoib_cm_dev_cleanup(dev);
 		return -ENOMEM;
 	}
+#endif
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
 		priv->cm.rx_sge[i].lkey = priv->mr->lkey;
@@ -1198,6 +1417,10 @@ int ipoib_cm_dev_init(struct net_device
 	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
 	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
+#ifndef IPOIB_CM_NOSRQ
+	/* In the case of IPOIB_CM_NOSRQ we do the rest of the init in
+	   ipoib_cm_req_handler() */
+
 	for (i = 0; i < ipoib_recvq_size; ++i) {
 		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
 					   priv->cm.srq_ring[i].mapping)) {
@@ -1211,6 +1434,7 @@ int ipoib_cm_dev_init(struct net_device
 			return -EIO;
 		}
 	}
+#endif
 	priv->dev->dev_addr[0] = IPOIB_FLAGS_RC;
 	return 0;
@@ -1221,10 +1445,21 @@ void ipoib_cm_dev_cleanup(struct net_dev
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	int i, ret;
+	ipoib_dbg(priv, "Cleanup ipoib connected mode.\n");
+
+#ifdef IPOIB_CM_NOSRQ
+	/* We need to
destroy all the qps associated with the ipoib_cm_rx
+	   linked list hanging off the ipoib_cm_dev_priv. We also need to
+	   kfree the associated skb and also the ipoib_cm_rx structures
+	   themselves */
+	/* We actually do this in ipoib_cm_dev_stop(). Since srq is
+	   common to all qps it is done here for SRQ. For us the
+	   right place is to do it in ipoib_cm_dev_stop() */
+
+#else
 	if (!priv->cm.srq)
 		return;
-	ipoib_dbg(priv, "Cleanup ipoib connected mode.\n");
 	ret = ib_destroy_srq(priv->cm.srq);
 	if (ret)
@@ -1242,4 +1477,5 @@ void ipoib_cm_dev_cleanup(struct net_dev
 	}
 	kfree(priv->cm.srq_ring);
 	priv->cm.srq_ring = NULL;
+#endif
 }
--- linux-2.6.21-rc1-mst/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-08 17:09:48.000000000 -0800
+++ linux-2.6.21-rc1/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-08 17:35:07.000000000 -0800
@@ -282,12 +282,21 @@ static void ipoib_ib_handle_tx_wc(struct
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
+#ifdef IPOIB_CM_NOSRQ
+	if (wc->wr_id & IPOIB_CM_OP_NOSRQ)
+		ipoib_cm_handle_rx_wc(dev, wc);
+	else if (wc->wr_id & IPOIB_OP_RECV)
+		ipoib_ib_handle_rx_wc(dev, wc);
+	else
+		ipoib_ib_handle_tx_wc(dev, wc);
+#else
 	if (wc->wr_id & IPOIB_CM_OP_SRQ)
 		ipoib_cm_handle_rx_wc(dev, wc);
 	else if (wc->wr_id & IPOIB_OP_RECV)
 		ipoib_ib_handle_rx_wc(dev, wc);
 	else
 		ipoib_ib_handle_tx_wc(dev, wc);
+#endif
 }
 void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)

Pradeep
pradeep at us.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_cm.nosrq.2621.patch
Type: application/octet-stream
Size: 17385 bytes
Desc: not available
URL: 

From sweitzen at cisco.com Fri Mar 9 14:41:44 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Fri, 9 Mar 2007 14:41:44 -0800
Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS
In-Reply-To: 
References: 
Message-ID: 

For SDP, there is no transparent redundancy (I can't speak to VERBS).

Scott

________________________________

From: SEGERS Koen [mailto:Koen.SEGERS at VRT.BE]
Sent: Friday, March 09, 2007 1:52 AM
To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

If I understand your information correctly, there is no way of creating a fully redundant setup (2 servers connected with 2 paths to each other end-to-end) with InfiniBand using VERBS or SDP.

Koen

________________________________

From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Thu 8/03/2007 17:55
To: SEGERS Koen; general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

"ipoibcfg merge" only handles IPoIB, not SDP, and it's active/passive.

________________________________

From: SEGERS Koen [mailto:Koen.SEGERS at VRT.BE]
Sent: Thursday, March 08, 2007 1:03 AM
To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

Are you talking about a kernel patch when you refer to the "bonding kernel driver"? I can't find a specific bonding command that allows bonding two or more ports.

So if I understand it correctly, with SDP you can't have redundancy (active/passive) or aggregation (active/active) with the current OFED-1.2 driver.

Renaud Larsen of Cisco told us that bonding is possible in the Topspin driver with the "ipoibcfg merge" command.
We are wondering if this also applies to SDP. That is why we are very interested in the beta drivers of Topspin! We are supposed to get them (from Renaud) within a few days, but if you can send them to me earlier, that is even better :)

Greetings,
Koen

________________________________

From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Thu 8/03/2007 0:01
To: SEGERS Koen; general at lists.openfabrics.org
Subject: RE: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

I have not tried the OFED 1.2 IPoIB bonding kernel driver, and can only speak for the userspace IPoIB HA ipoib_ha.pl script.

Both Topspin IPoIB and OFED IPoIB have active/passive IPoIB high availability, neither can aggregate IPoIB throughput, and neither has SDP high availability.

We will have Topspin SLES10 drivers in beta soon; let me know if you are interested.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________

From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of SEGERS Koen
Sent: Wednesday, March 07, 2007 6:59 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/orVERBS

Hi all!

We are trying to bond two ports on 1 HCA so that we are able to aggregate the throughput. We are also interested in bonding ports of different HCAs. Is this possible with the OFED driver? If so, can you give the command?

We know Topspin has support for this feature. Sadly, Topspin has no driver that runs on our system (SLES 10). We have currently installed OFED-1.2 of 20070306 and the stable OFED-1.1 driver, but we can't figure out how this bonding is started in either version.

It is important that we offload the bonding. We don't want to use the standard Linux bonding. That is why we think that bonding over different HCAs is not going to work. Is this assumption correct?
Is bonding possible when running SDP? And VERBS?

Greetings
Koen

*** Disclaimer ***
Vlaamse Radio- en Televisieomroep
Auguste Reyerslaan 52, 1043 Brussel
nv van publiek recht
BTW BE 0244.142.664
RPR Brussel
http://www.vrt.be/disclaimer

From rdreier at cisco.com Fri Mar 9 15:10:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 09 Mar 2007 15:10:44 -0800
Subject: [ofa-general] IPOIB CM (NOSRQ) patch for review
In-Reply-To: (Pradeep Satyanarayana's message of "Fri, 9 Mar 2007 13:01:23 -0700")
References: 
Message-ID: 

 > +EXTRA_CFLAGS += -DIPOIB_CM_NOSRQ

This type of compile-time selection is obviously unacceptable for
anything that we actually merge upstream. What is needed is for the
IPoIB driver to decide at runtime what to do, so that on a system with
multiple different types of HCAs installed, IPoIB CM uses SRQs on the
HCAs that support SRQs, and does not use SRQs on the HCAs that don't.

Not to mention the fact that basically mixing together two different
implementations with a liberal sprinkling of #ifdef IPOIB_CM_NOSRQ
makes the code basically unreadable and unmaintainable.

 > +	/* This is missing in Michael's code -Do we need this */

seems like it would be easy to answer this question -- just try it
without the change. And I think the answer is no, there's no reason
to move QPs that are not used to send data to the RTS state.
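
The runtime selection asked for above can be sketched roughly as follows. This is a minimal userspace illustration under stated assumptions, not code from the patch under review: `struct hca_caps`, `struct cm_dev`, `enum cm_rx_mode`, `cm_pick_rx_mode()` and `cm_dev_init()` are hypothetical names, and the assumption is that the driver can learn per device whether SRQs are available (for instance from a `max_srq`-style device attribute) and record the choice once at init time instead of fixing it at compile time.

```c
#include <assert.h>

/* Hypothetical per-HCA capability record; a real driver would fill
 * this from the device attributes it queries at init time. */
struct hca_caps {
	const char *name;
	int max_srq;		/* 0: device cannot create SRQs */
};

enum cm_rx_mode {
	CM_RX_SRQ,		/* one shared receive queue per device */
	CM_RX_PER_QP		/* private receive ring per connection */
};

/* Decide at runtime, per device, which receive path to use. */
static enum cm_rx_mode cm_pick_rx_mode(const struct hca_caps *caps)
{
	return caps->max_srq > 0 ? CM_RX_SRQ : CM_RX_PER_QP;
}

/* Hypothetical per-device state: the mode lives next to the data it
 * governs, so two different HCAs in one system can differ. */
struct cm_dev {
	struct hca_caps caps;
	enum cm_rx_mode rx_mode;
};

static void cm_dev_init(struct cm_dev *dev, const char *name, int max_srq)
{
	dev->caps.name = name;
	dev->caps.max_srq = max_srq;
	dev->rx_mode = cm_pick_rx_mode(&dev->caps);
}
```

With the choice stored per device, the shared receive code could branch on `rx_mode` (or call through a per-mode function pointer) in the few places where the two paths differ, rather than being split by `#ifdef IPOIB_CM_NOSRQ`.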
 > void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 > +#ifdef IPOIB_CM_NOSRQ
 > +	spin_lock_irqsave(&priv->lock, flags);
 > +	list_for_each_entry(rx_ptr, &priv->cm.passive_ids, list)
 > +		if(qp_num == rx_ptr->qp->qp_num) {
 > +			found = 1;
 > +			break;
 > +		}
 > +	spin_unlock_irqrestore(&priv->lock, flags);

This seems crazy -- you do a linear search through a list of QPs
(which potentially has 100s of entries) for every receive completion!
Just the spinlock alone is something we would want to avoid in the hot
performance path.

 > +	/* Temporary hack till ehca fixes wc->qp = NULL */

Why do you need wc->qp without SRQ? Surely the wr_id alone should be
enough to figure out which QP the work request was posted on.

 - R.

From mst at mellanox.co.il Thu Mar 8 11:21:27 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Mar 2007 21:21:27 +0200
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: 
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> <45F0606D.7050005@ichips.intel.com>
Message-ID: <20070308192127.GB17114@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) :
> Subject: RE: [ewg] Re: OFED 1.2 beta blocking bugs
>
> > Going back to 2.6.20 (pre-multicast changes), this area of the code looks like
> > it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions
> > of the code, and if so, were any issues found? (I'm not sure we've found all of
> > the problems yet.)
>
> No testing was done with 2.6.20 AFAIK.

OFED 1.1 has basically 2.6.20 code. One bug in core mad layer was uncovered,
fixed after OFED 1.1

--
MST

From mst at mellanox.co.il Thu Mar 8 11:23:46 2007
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Thu, 8 Mar 2007 21:23:46 +0200
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: 
References: <45F054CC.8020107@ichips.intel.com>
Message-ID: <20070308192346.GC17114@mellanox.co.il>

> I'll post a patch later today if I
> get a chance. Unfortunately I haven't really kept up with all the
> OFED built stuff -- does anyone know an easy way for Scott to take a
> kernel patch and rebuild his OFED install?

If one uses the tarball, one just needs to drop a patch in kernel_patches/fixes
and it will be applied.

For RPMs, one may try the OFED 1.1 procedure:
https://wiki.openfabrics.org/tiki-index.php?page=OFED+Support
I am not 100% sure this will work for 1.2 but one can try.

--
MST

From boris at mellanox.com Fri Mar 9 20:39:56 2007
From: boris at mellanox.com (Boris Shpolyansky)
Date: Fri, 9 Mar 2007 20:39:56 -0800
Subject: [ofa-general] uDAPL question
Message-ID: <1E3DCD1C63492545881FACB6063A57C1D522CD@mtiexch01.mti.com>

Hi,

I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL) using the OFED-1.1 stack. I'm consistently getting the following error:

[root at ibd005 ~]# ./runjob_I_MPI.boris 2
Task 0 of 2 tasks started on host ibd005.ibd.mti.com
clock_resolution = 1.00e-06 s
Task 1 of 2 tasks started on host ibd006.ibd.mti.com
[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
rank 0 in job 14 ibd005_36193 caused collective abort of all ranks
exit status of rank 0: return code 254

I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails.

I'm using the standard dat.conf file:

OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""

Appreciate your help,

Boris Shpolyansky
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From vlad at lists.openfabrics.org Sat Mar 10 02:14:24 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat, 10 Mar 2007 02:14:24 -0800 (PST)
Subject: [ofa-general] ofa_1_2_kernel 20070310-0200 daily build status
Message-ID: <20070310101424.A698BE603C0@openfabrics.org>

This email was generated automatically, please do not reply

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed
on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.5-7.244-smp

Failed:

From sweitzen at cisco.com Sat Mar 10 14:45:25 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sat, 10 Mar 2007 14:45:25 -0800
Subject: [ofa-general] uDAPL question
In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1D522CD@mtiexch01.mti.com>
References: <1E3DCD1C63492545881FACB6063A57C1D522CD@mtiexch01.mti.com>
Message-ID: 

What version of Intel MPI are you using?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From boris at mellanox.com Sat Mar 10 15:02:26 2007
From: boris at mellanox.com (Boris Shpolyansky)
Date: Sat, 10 Mar 2007 15:02:26 -0800
Subject: [ofa-general] uDAPL question
Message-ID: <1E3DCD1C63492545881FACB6063A57C1D522CF@mtiexch01.mti.com>

3.0

Boris Shpolyansky
Application Engineer
Mellanox Technologies Inc.
2900 Stender Way
Santa Clara, CA 95054
Tel.: (408) 916 0014
Fax: (408) 970 3403
Cell: (408) 834 9365
www.mellanox.com

----- Original Message -----
From: Scott Weitzenkamp (sweitzen)
To: Boris Shpolyansky; general at lists.openfabrics.org
Sent: Sat Mar 10 14:45:25 2007
Subject: RE: [ofa-general] uDAPL question

What version of Intel MPI are you using?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hussein at nationaltakaful.com Sat Mar 10 21:01:47 2007
From: hussein at nationaltakaful.com (Alhussein A. Mahmod)
Date: Sun, 11 Mar 2007 08:01:47 +0300
Subject: [ofa-general] [openib-general] [noreply@googlegroups.com: Posting error:
Message-ID: <6C09F96EE6CFB24CAA554C30DA8736853BE865@Mails.ntic.com>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bugzilla-daemon at lists.openfabrics.org Sat Mar 10 21:56:49 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Sat, 10 Mar 2007 21:56:49 -0800 (PST)
Subject: [ofa-general] [Bug 441] IPOIB build faild on RHEL5
In-Reply-To: 
Message-ID: <20070311055649.AC168E603C6@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=441

mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from mst at mellanox.co.il 2007-03-10 21:56 -------
AFAIK RHEL5 isn't out yet. Is this on RHEL5 beta? If yes I don't think it's a
supported platform.

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mst at mellanox.co.il Sat Mar 10 22:09:35 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 08:09:35 +0200
Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QPdestroy
In-Reply-To: 
References: 
Message-ID: <20070311060858.GK17114@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QPdestroy
>
> > It is common practice to put a pointer/index to a per-QP
> > structure inside the wrid. This data is available after poll_cq
> > returns, when cq lock is not taken.
If this pointer is used
> > directly inside the event handler, the ULP that is moving QP to
> > reset has no way to know when is it safe to free data it points to,
> > unless the verbs provider synchronizes with the IRQ handler
> > before the verbs returns.
>
> Does this fix any problem for in-tree (or OFED) drivers?

Yes. Example: IPoIB UD (ipoib_ib_dev_stop) has an assumption that WRs will
complete within several seconds. This might not be the case.
It should modify QP to reset instead and that should guarantee no
more completions will be polled.

> Because I'm
> not convinced this is something that a low-level driver should try to
> handle. A ULP that suffers from this that polls a CQ from a
> workqueue, say, rather than an interrupt remains broken even after
> this change.

Yes but such a ULP can flush the workqueue before freeing any memory.
ULP can't do this for callbacks coming from low level driver.

> And my gut feeling is that this type of problem is
> something a ULP should handle by not getting into this situation in
> the first place.

To handle this, ULP would need a capability to sync with the completion IRQ,
which is lacking in our API.

> BTW, have you had a chance to test the other changes and see if they
> fix the IPoIB CM issue?

There were some failures that seem unrelated. Hope to let you know soon.

--
MST

From mst at mellanox.co.il Sat Mar 10 22:43:32 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 08:43:32 +0200
Subject: [ofa-general] Re: [PATCH]] ucma backport to 2.6.19
In-Reply-To: <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com>
References: <20070308192530.GD17114@mellanox.co.il> <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com>
Message-ID: <20070311064332.GN17114@mellanox.co.il>

> Quoting Sean Hefty :
> Subject: [PATCH]] ucma backport to 2.6.19
>
> >Post a replacement to 2_misc_device_to_2_6_19.patch, we'll test.
>
> I did not test this patch, but you can try replacing the contents of
> the 2_misc_device_to_2_6_19.patch with the changes below. (It's
> possible that this may lead to some conflict further down in the patch
> chain...)

With the cross-build setup, it's actually quite easy to test a patch
on all kernels. But oh well. I guess I'll do it for now.

> The function prototype for show_abi_version changed between
> 2.6.20 to 2.6.19; this was the missing piece in the original backport
> patch.

Thanks, that must be it. I looked at this briefly but missed the fact
that the # of args changed.

> I would have expected a build warning for this.

Unfortunately older and distro kernels generate a ton of warnings when built
with a recent cross-compiler. So it's easy to miss a valid one.

--
MST

From mst at mellanox.co.il Sat Mar 10 23:13:00 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 09:13:00 +0200
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) patch for review
In-Reply-To: 
References: 
Message-ID: <20070311071300.GQ17114@mellanox.co.il>

> Quoting Pradeep Satyanarayana :
> Subject: IPOIB CM (NOSRQ) patch for review
>
>
> Here is a first version of the IPOIB_CM_NOSRQ patch for review.

Please avoid sending multiple copies of the patch.
This message had 3 of them:
- plain text
- HTML version
- attachment

And the multipart is set in such a way that one sees the HTML part by default,
which is most likely not the intended effect.

Just send a plain text message with an attachment inline, please.
--
MST

From greg.lindahl at qlogic.com Sat Mar 10 22:59:02 2007
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Sat, 10 Mar 2007 22:59:02 -0800
Subject: [ofa-general] In Myrinet we used to face error GM port not available for open, is applicable for IB also
In-Reply-To: <183382.79921.qm@web35104.mail.mud.yahoo.com>
References: <183382.79921.qm@web35104.mail.mud.yahoo.com>
Message-ID: <20070311065902.GA7228@localhost.localdomain>

On Fri, Mar 09, 2007 at 07:57:20AM -0800, Bala wrote:

> In our Myrinet at a time more number of
> users try to use the myrinet we used to get error
> saying "GM port not available for open", is there
> any such limitation in IB also??

The answer for IB is "no", IB doesn't have a concept similar to a GM
"port".

-- greg

From mst at mellanox.co.il Thu Mar 8 11:17:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Mar 2007 21:17:58 +0200
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: <45F0606D.7050005@ichips.intel.com>
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> <45F0606D.7050005@ichips.intel.com>
Message-ID: <20070308191758.GA17114@mellanox.co.il>

> Quoting Sean Hefty :
> Subject: Re: [ewg] Re: OFED 1.2 beta blocking bugs
>
> > From a quick look at the code, it does look like there are some races
> > in ipoib_multicast.c.
The place where a QP is actually attached to a
> > group is essentially (trimming debug prints):
> >
> > 	if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags))
> > 		return 0;
> >
> > 	ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
> > 				 &mcast->mcmember.mgid);
> >
> > and the place where a QP is detached is:
> >
> > 	if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
> > 		ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid),
> > 					 &mcast->mcmember.mgid);
>
> Going back to 2.6.20 (pre-multicast changes), this area of the code looks like
> it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions
> of the code,

yes

> and if so, were any issues found? (I'm not sure we've found all of
> the problems yet.)

no

--
MST

From mst at mellanox.co.il Sun Mar 11 00:09:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 10:09:49 +0200
Subject: [ofa-general] Re: [mstflint] fix warnings that were reported by sparse
In-Reply-To: <1173283074.9453.1.camel@mtldesk014.lab.mtl.com>
References: <1173283074.9453.1.camel@mtldesk014.lab.mtl.com>
Message-ID: <20070311080949.GD6858@mellanox.co.il>

> Quoting Dotan Barak :
> Subject: [mstflint] fix warnings that were reported by sparse
>
> Fixed warnings that were reported by sparse.
>
> Signed-off-by: Dotan Barak

Applied to master only.

--
MST

From mst at mellanox.co.il Sun Mar 11 01:12:53 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 11:12:53 +0200
Subject: [ofa-general] RFC: pull from 2.6.21
Message-ID: <20070311091253.GF6858@mellanox.co.il>

Hi!
I'm considering pulling git from linus 2.6.21-rc3 - this would
make development easier by cutting down the number of patches
we have to apply.

2 things I'm still checking and I'd like to get confirmation on:

1. merged_sean_rdma_dev_ofed_1_2.patch - I think all multicast bits
   are merged in 2.6.21-rc3 so we only have to take code from local_sa
   branch now. Right?

2.
ipath-xxx - except ipath-26-wc-qp.patch, these aren't upstream
   so they still have to be applied. Correct?

3. ehca-xxx - these all seem to be applied

--
MST

From tziporet at mellanox.co.il Sun Mar 11 01:53:32 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 11 Mar 2007 11:53:32 +0200
Subject: [ofa-general] RE: [ewg] RE: OFED 1.2 beta blocking bugs
In-Reply-To: <011d01c7625a$5d3854e0$05c8a8c0@DIEGO>
References: <20070307195553.GB9817@mellanox.co.il><45EF303F.5090607@ichips.intel.com><20070308140703.GC23302@mellanox.co.il><45F04DEC.10000@ichips.intel.com><45EF040F.3090305@veritas.com><1173385548.465.104193.camel@hal.voltaire.com> <011d01c7625a$5d3854e0$05c8a8c0@DIEGO>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DF92@mtlexch01.mtl.com>

> And, another question, this switch:
> Flextronics F-X430047: 24-port 4x DDR managed infiniband switch
> has an embedded SM?

No it does not - you must run opensm on one of your nodes.

Tziporet

From ogerlitz at voltaire.com Sun Mar 11 05:17:25 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 11 Mar 2007 14:17:25 +0200
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: <20070308192127.GB17114@mellanox.co.il>
References: <20070307195553.GB9817@mellanox.co.il> <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> <45F0606D.7050005@ichips.intel.com> <20070308192127.GB17114@mellanox.co.il>
Message-ID: <45F3F355.8020109@voltaire.com>

Michael S. Tsirkin wrote:
>> Quoting Scott Weitzenkamp (sweitzen) :
>> Subject: RE: [ewg] Re: OFED 1.2 beta blocking bugs
>>
>>> Going back to 2.6.20 (pre-multicast changes), this area of the code looks like
>>> it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions
>>> of the code, and if so, were any issues found? (I'm not sure we've found all of
>>> the problems yet.)
>> No testing was done with 2.6.20 AFAIK.
> OFED 1.1 has basically 2.6.20 code. One bug in core mad layer was uncovered,
> fixed after OFED 1.1

what ?!?!?! wasn't OFED 1.1 based on 2.6.18 ?

2.6.20-rc1 is dated to Dec 14 2006 which is way after OFED 1.1 was released.

Or.

From mst at mellanox.co.il Sun Mar 11 05:30:59 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 14:30:59 +0200
Subject: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
In-Reply-To: <45F3F355.8020109@voltaire.com>
References: <45EF303F.5090607@ichips.intel.com> <20070308140703.GC23302@mellanox.co.il> <45F04DEC.10000@ichips.intel.com> <45F054CC.8020107@ichips.intel.com> <45F0606D.7050005@ichips.intel.com> <20070308192127.GB17114@mellanox.co.il> <45F3F355.8020109@voltaire.com>
Message-ID: <20070311123059.GB28400@mellanox.co.il>

> Quoting Or Gerlitz :
> Subject: Re: [ofa-general] Re: [ewg] Re: OFED 1.2 beta blocking bugs
>
> Michael S. Tsirkin wrote:
> >> Quoting Scott Weitzenkamp (sweitzen) :
> >> Subject: RE: [ewg] Re: OFED 1.2 beta blocking bugs
> >>
> >>> Going back to 2.6.20 (pre-multicast changes), this area of the code looks like
> >>> it has the same race. Was IPoIB HA testing done on 2.6.20 or earlier versions
> >>> of the code, and if so, were any issues found? (I'm not sure we've found all of
> >>> the problems yet.)
> >> No testing was done with 2.6.20 AFAIK.
>
> > OFED 1.1 has basically 2.6.20 code. One bug in core mad layer was uncovered,
> > fixed after OFED 1.1
>
> what ?!?!?! wasn't OFED 1.1 based on 2.6.18 ?

Yes. With a ton of patches that went in for 2.6.19 - so OFED 1.1 is actually
close to 2.6.19.

> 2.6.20-rc1 is dated to Dec 14 2006 which is way after OFED 1.1 was released.

I should have said 2.6.19.
--
MST

From HNGUYEN at de.ibm.com Sun Mar 11 05:57:38 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Sun, 11 Mar 2007 13:57:38 +0100
Subject: [ofa-general] RFC: pull from 2.6.21
In-Reply-To: <20070311091253.GF6858@mellanox.co.il>
Message-ID: 

Hi,
> I'm considering pulling git from linus 2.6.21-rc3 - this would
> make development easier by cutting down the number of patches
> we have to apply.
> 3. ehca-xxx - these all seem to be applied
Mostly, except for "ehca: Fix sync between completion handler and destroy cq",
which is queued for rc4.
Regards
Nam

From mst at mellanox.co.il Sun Mar 11 06:01:04 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 15:01:04 +0200
Subject: [ofa-general] Re: RFC: pull from 2.6.21
In-Reply-To: 
References: <20070311091253.GF6858@mellanox.co.il>
Message-ID: <20070311130103.GC28400@mellanox.co.il>

> Quoting Hoang-Nam Nguyen :
> Subject: Re: RFC: pull from 2.6.21
>
> Hi,
> > I'm considering pulling git from linus 2.6.21-rc3 - this would
> > make development easier by cutting down the number of patches
> > we have to apply.
> > 3. ehca-xxx - these all seem to be applied
> Mostly, except for "ehca: Fix sync between completion handler and destroy cq",
> which is queued for rc4.

Is that ehca_3_fix_race_condition_locking_issues.patch?
Seems to be upstream already, just not in rc3.

--
MST

From dotanb at dev.mellanox.co.il Sun Mar 11 06:42:14 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 11 Mar 2007 15:42:14 +0200
Subject: [ofa-general] [PATCH - libibumad] Added release_ca in error flow to prevent resource leak
Message-ID: <1173620535.11125.3.camel@mtldesk014.lab.mtl.com>

Added release_ca in error flow to prevent resource leak.

Signed-off-by: Dotan Barak

---

Index: gen2_devel_user/src/userspace/management/libibumad/src/umad.c
===================================================================
--- gen2_devel_user.orig/src/userspace/management/libibumad/src/umad.c 2007-02-08 17:01:40.000000000 +0200
+++ gen2_devel_user/src/userspace/management/libibumad/src/umad.c 2007-02-12 17:13:22.000000000 +0200
@@ -538,8 +538,10 @@ umad_get_ca_portguids(char *ca_name, uin
 		return -1;

 	if (portguids) {
-		if (ca.numports + 1 > max)
+		if (ca.numports + 1 > max) {
+			release_ca(&ca);
 			return -ENOMEM;
+		}

 		for (i = 0; i <= ca.numports; i++)
 			portguids[ports++] = ca.ports[i] ? ca.ports[i]->port_guid : 0;

From mst at mellanox.co.il Sun Mar 11 06:50:51 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Mar 2007 15:50:51 +0200
Subject: [ofa-general] lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
In-Reply-To: 
References: <45E552FC.4040305@mellanox.co.il>
Message-ID: <20070311135051.GA31985@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
>
> >Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24
> >Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
>
> It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor().
> One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING
> turned on, and then rerun this test. There's a good chance that this would
> diagnose the deadlock. (I don't have good access to my test machines right now, or
> else I would do it myself)

OK, I did that. But I get

[13440.761857] INFO: trying to register non-static key.
[13440.766903] the code is fine but needs lockdep annotation.
[13440.772455] turning off the locking correctness validator.

and I am not sure what triggers this, or how to fix it to have the validator
actually do its job.

Ingo, what key does the message refer to?
The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c line 829.

Full message below:

[13440.761857] INFO: trying to register non-static key.
[13440.766903] the code is fine but needs lockdep annotation.
[13440.772455] turning off the locking correctness validator.
[13440.778008] [] __lock_acquire+0xae4/0xbb9
[13440.783078] [] lock_acquire+0x56/0x71
[13440.787784] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.794412] [] _spin_lock_irqsave+0x32/0x41
[13440.799649] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.806275] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.812897] [] dst_run_gc+0xc/0x118
[13440.817439] [] run_timer_softirq+0x37/0x16b
[13440.822673] [] dst_run_gc+0x0/0x118
[13440.827221] [] neigh_destroy+0xbe/0x104
[13440.832114] [] dst_destroy+0x4d/0xab
[13440.836751] [] dst_run_gc+0x55/0x118
[13440.841384] [] run_timer_softirq+0x108/0x16b
[13440.846711] [] __do_softirq+0x5a/0xd5
[13440.851427] [] trace_hardirqs_on+0x106/0x141
[13440.856754] [] __do_softirq+0x69/0xd5
[13440.861470] [] do_softirq+0x37/0x4d
[13440.866016] [] smp_apic_timer_interrupt+0x6b/0x77
[13440.871774] [] default_idle+0x3b/0x54
[13440.876491] [] default_idle+0x3b/0x54
[13440.881211] [] apic_timer_interrupt+0x33/0x38
[13440.886624] [] default_idle+0x3b/0x54
[13440.891342] [] default_idle+0x3d/0x54
[13440.896061] [] cpu_idle+0xa2/0xbb
[13440.900436] =======================
[13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
[13768.717353] [] _raw_spin_lock+0xda/0xfd
[13768.722247] [] _spin_lock_irqsave+0x39/0x41
[13768.727486] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13768.734110] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13768.740735] [] dst_run_gc+0xc/0x118
[13768.745276] [] run_timer_softirq+0x37/0x16b
[13768.750517] [] dst_run_gc+0x0/0x118
[13768.755061] [] neigh_destroy+0xbe/0x104
[13768.759955] [] dst_destroy+0x4d/0xab
[13768.764586] [] dst_run_gc+0x55/0x118
[13768.769218] [] run_timer_softirq+0x108/0x16b
[13768.774542] [] __do_softirq+0x5a/0xd5 [13768.779261] [] trace_hardirqs_on+0x106/0x141 [13768.784588] [] __do_softirq+0x69/0xd5 [13768.789308] [] do_softirq+0x37/0x4d [13768.793851] [] smp_apic_timer_interrupt+0x6b/0x77 [13768.799609] [] default_idle+0x3b/0x54 [13768.804326] [] default_idle+0x3b/0x54 [13768.809054] [] apic_timer_interrupt+0x33/0x38 [13768.814471] [] default_idle+0x3b/0x54 [13768.819187] [] default_idle+0x3d/0x54 [13768.823903] [] cpu_idle+0xa2/0xbb [13768.828279] ======================= -- MST From mst at mellanox.co.il Sun Mar 11 08:04:31 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Mar 2007 17:04:31 +0200 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) In-Reply-To: <20070311135051.GA31985@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> Message-ID: <20070311150431.GE31985@mellanox.co.il> After adding some printks, I started getting these: [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! [ 597.041546] turning off the locking correctness validator. 
[ 597.047135] [] save_trace+0x8a/0x8f [ 597.051751] [] mark_lock+0x65/0x3ff [ 597.056366] [] save_trace+0x3e/0x8f [ 597.060980] [] add_lock_to_list+0x62/0x85 [ 597.066116] [] __lock_acquire+0x3f4/0xbb9 [ 597.071252] [] send_mad+0x79/0x103 [ib_sa] [ 597.076474] [] idr_get_new_above_int+0x13c/0x216 [ 597.082225] [] lock_acquire+0x56/0x71 [ 597.087018] [] send_mad+0x79/0x103 [ib_sa] [ 597.092240] [] _spin_lock_irqsave+0x32/0x41 [ 597.097547] [] send_mad+0x79/0x103 [ib_sa] [ 597.102770] [] send_mad+0x79/0x103 [ib_sa] [ 597.107989] [] ib_sa_path_rec_get+0x134/0x172 [ib_sa] [ 597.114166] [] path_rec_start+0x115/0x143 [ib_ipoib] [ 597.120254] [] path_rec_completion+0x0/0x4f4 [ib_ipoib] [ 597.126610] [] path_rec_create+0x77/0x9d [ib_ipoib] [ 597.132617] [] ipoib_start_xmit+0x441/0x57b [ib_ipoib] [ 597.138888] [] _spin_unlock_irqrestore+0x34/0x39 [ 597.144635] [] trace_hardirqs_on+0x106/0x141 [ 597.150035] [] dev_queue_xmit+0x109/0x245 [ 597.155167] [] __mod_timer+0x94/0x9e [ 597.159871] [] dev_hard_start_xmit+0x1be/0x21d [ 597.165438] [] __qdisc_run+0xd7/0x190 [ 597.170226] [] dev_queue_xmit+0x135/0x245 [ 597.175360] [] arp_process+0x2c0/0x512 [ 597.180234] [] mthca_tavor_interrupt+0xf3/0x12b [ib_mthca] [ 597.186855] [] netif_receive_skb+0x1c4/0x1da [ 597.192254] [] trace_hardirqs_on+0x106/0x141 [ 597.197648] [] process_backlog+0x94/0x107 [ 597.202785] [] net_rx_action+0x9a/0x15e [ 597.207743] [] __do_softirq+0x69/0xd5 [ 597.212530] [] do_softirq+0x37/0x4d [ 597.217147] [] do_IRQ+0x5c/0x72 [ 597.221415] [] common_interrupt+0x2e/0x34 [ 597.226549] [] default_idle+0x3b/0x54 [ 597.231337] [] default_idle+0x3d/0x54 [ 597.236124] [] cpu_idle+0xa2/0xbb [ 597.240567] ======================= And sometimes these: [ 404.493572] KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (608) -- MST From mst at mellanox.co.il Sun Mar 11 08:22:13 2007 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 11 Mar 2007 17:22:13 +0200 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) In-Reply-To: <20070311150431.GE31985@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <20070311150431.GE31985@mellanox.co.il> Message-ID: <20070311152213.GH31985@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) > > > After adding some printks, I started getting these: > > [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! > [ 597.041546] turning off the locking correctness validator. I looked at our stack usage a bit. It seems some work is in order. $ make checkstack | grep ib_ 0x00000603 mthca_init_hca [ib_mthca]: 764 0x000014ed mthca_init_hca [ib_mthca]: 764 0x000065ae ipoib_cm_tx_start [ib_ipoib]: 368 0x00006b0b ipoib_cm_tx_start [ib_ipoib]: 368 0x0000135f ib_uverbs_query_device [ib_uverbs]: 348 0x000015f9 ib_uverbs_query_device [ib_uverbs]: 348 0x000005d0 ib_ucm_init_qp_attr [ib_ucm]: 300 0x00000697 ib_ucm_init_qp_attr [ib_ucm]: 300 0x00007f9e ipoib_path_seq_show [ib_ipoib]: 264 0x00008092 ipoib_path_seq_show [ib_ipoib]: 264 0x00005b56 ipoib_cm_rx_handler [ib_ipoib]: 220 0x00005eec ipoib_cm_rx_handler [ib_ipoib]: 220 0x00007934 ipoib_cm_tx_handler [ib_ipoib]: 208 0x00007ce0 ipoib_cm_tx_handler [ib_ipoib]: 208 0x000032fe ib_uverbs_create_qp [ib_uverbs]: 192 0x000036fd ib_uverbs_create_qp [ib_uverbs]: 192 0x000028a9 srp_reset_host [ib_srp]: 192 0x00002a96 srp_reset_host [ib_srp]: 192 0x00001c99 show_sys_image_guid [ib_core]: 188 0x00001d2b show_sys_image_guid [ib_core]: 188 0x000001f9 ib_sa_service_rec_callback [ib_sa]: 180 0x00000234 ib_sa_service_rec_callback [ib_sa]: 180 0x00001b3c path_rec_completion [ib_ipoib]: 180 0x00002020 path_rec_completion [ib_ipoib]: 180 0x000070cf ipoib_cm_handle_rx_wc [ib_ipoib]: 180 0x00007402 ipoib_cm_handle_rx_wc 
[ib_ipoib]: 180 0x000009a7 srp_create_target [ib_srp]: 176 0x0000125f srp_create_target [ib_srp]: 176 0x00000d9d ib_cm_listen [ib_cm]: 172 0x000010b3 ib_cm_listen [ib_cm]: 172 0x00004455 ipoib_mcast_send [ib_ipoib]: 172 0x000048e0 ipoib_mcast_send [ib_ipoib]: 172 0x000015c1 ipoib_start_xmit [ib_ipoib]: 164 0x00001b2d ipoib_start_xmit [ib_ipoib]: 164 0x000056c8 mthca_make_profile [ib_mthca]: 160 0x00006051 mthca_make_profile [ib_mthca]: 160 0x00002abb ipoib_ib_dev_stop [ib_ipoib]: 160 0x00002d19 ipoib_ib_dev_stop [ib_ipoib]: 160 0x0000202b ib_uverbs_query_qp [ib_uverbs]: 156 0x000022c0 ib_uverbs_query_qp [ib_uverbs]: 156 0x00005269 ipoib_init_qp [ib_ipoib]: 152 0x000053bc ipoib_init_qp [ib_ipoib]: 152 0x0000327f ipoib_mcast_join [ib_ipoib]: 144 0x0000349d ipoib_mcast_join [ib_ipoib]: 144 0x00002092 ib_find_send_mad [ib_mad]: 140 0x000023fa ib_find_send_mad [ib_mad]: 140 0x000022cf ib_uverbs_modify_qp [ib_uverbs]: 140 0x000024f2 ib_uverbs_modify_qp [ib_uverbs]: 140 0x0000bc8e mthca_modify_qp [ib_mthca]: 136 0x0000c9cc mthca_modify_qp [ib_mthca]: 136 0x00010cb1 mthca_reg_phys_mr [ib_mthca]: 136 0x0001117a mthca_reg_phys_mr [ib_mthca]: 136 0x000035b4 ipoib_mcast_join_finish [ib_ipoib]: 136 0x00003a33 ipoib_mcast_join_finish [ib_ipoib]: 136 0x00000793 iser_cma_handler [ib_iser]: 132 0x00000bc1 iser_cma_handler [ib_iser]: 132 0x00001e37 srp_queuecommand [ib_srp]: 132 0x0000273b srp_queuecommand [ib_srp]: 132 0x00008a5a mthca_poll_cq [ib_mthca]: 128 0x00009204 mthca_poll_cq [ib_mthca]: 128 0x00003a42 ipoib_mcast_join_complete [ib_ipoib]: 128 0x00003e6e ipoib_mcast_join_complete [ib_ipoib]: 128 0x00004a58 ipoib_mcast_restart_task [ib_ipoib]: 128 0x00004eb8 ipoib_mcast_restart_task [ib_ipoib]: 128 0x000038e6 ib_uverbs_create_ah [ib_uverbs]: 116 0x00003ac4 ib_uverbs_create_ah [ib_uverbs]: 116 0x0000f6a5 mthca_process_mad [ib_mthca]: 116 0x0000fa93 mthca_process_mad [ib_mthca]: 116 0x000011ef mcast_work_handler [ib_sa]: 112 0x000016e6 mcast_work_handler [ib_sa]: 112 
0x00000a20 ib_ucm_send_req [ib_ucm]: 112 0x00000b7c ib_ucm_send_req [ib_ucm]: 112 0x00001697 ib_post_send_mad [ib_mad]: 112 0x00001b05 ib_post_send_mad [ib_mad]: 112 0x0000030e iser_post_send [ib_iser]: 112 0x000003c5 iser_post_send [ib_iser]: 112 0x00001605 ib_uverbs_query_port [ib_uverbs]: 108 0x00001713 ib_uverbs_query_port [ib_uverbs]: 108 0x0000171d ib_uverbs_create_srq [ib_uverbs]: 108 0x00001946 ib_uverbs_create_srq [ib_uverbs]: 108 0x00002409 ib_mad_completion_handler [ib_mad]: 104 0x00002907 ib_mad_completion_handler [ib_mad]: 104 0x00002454 iser_reg_rdma_mem [ib_iser]: 104 0x00002aa8 iser_reg_rdma_mem [ib_iser]: 104 0x0416 rdma_resolve_ip [ib_addr]: 100 0x066f rdma_resolve_ip [ib_addr]: 100 0x00007dcb ipoib_mcg_seq_show [ib_ipoib]: 100 0x00007e65 ipoib_mcg_seq_show [ib_ipoib]: 100 -- MST From a.p.zijlstra at chello.nl Sun Mar 11 08:25:19 2007 From: a.p.zijlstra at chello.nl (Peter Zijlstra) Date: Sun, 11 Mar 2007 16:25:19 +0100 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) In-Reply-To: <20070311135051.GA31985@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> Message-ID: <1173626719.5182.10.camel@lappy> On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: > > Quoting Roland Dreier : > > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! > > > > >Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24 > > >Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 > > > > It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor(). > > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING > > turned on, and then rerun this test. There's a good chance that this would > > diagnose the deadlock. (I don't have good access to my test machines right now, or > > else I would do it myself) > > OK, I did that. 
But I get > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator. > and I am not sure what triggers this, or how to fix it to have the > validator actually do its job. It usually indicates a spinlock that is not properly initialized, for example __SPIN_LOCK_UNLOCKED() used in a non-static context; use spin_lock_init() in these cases. However, looking at the code, ipoib_neigh_destructor() only uses &priv->lock, and that seems to get properly initialized in ipoib_setup() using spin_lock_init(). So either there are other sites that instantiate those objects and forget about the lock init, or the object is corrupted (use after free?). > Ingo, what key does the message refer to? > > The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c line > 829. > > Full message below: > > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator.
> [13440.778008] [] __lock_acquire+0xae4/0xbb9 > [13440.783078] [] lock_acquire+0x56/0x71 > [13440.787784] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.794412] [] _spin_lock_irqsave+0x32/0x41 > [13440.799649] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.806275] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.812897] [] dst_run_gc+0xc/0x118 > [13440.817439] [] run_timer_softirq+0x37/0x16b > [13440.822673] [] dst_run_gc+0x0/0x118 > [13440.827221] [] neigh_destroy+0xbe/0x104 > [13440.832114] [] dst_destroy+0x4d/0xab > [13440.836751] [] dst_run_gc+0x55/0x118 > [13440.841384] [] run_timer_softirq+0x108/0x16b > [13440.846711] [] __do_softirq+0x5a/0xd5 > [13440.851427] [] trace_hardirqs_on+0x106/0x141 > [13440.856754] [] __do_softirq+0x69/0xd5 > [13440.861470] [] do_softirq+0x37/0x4d > [13440.866016] [] smp_apic_timer_interrupt+0x6b/0x77 > [13440.871774] [] default_idle+0x3b/0x54 > [13440.876491] [] default_idle+0x3b/0x54 > [13440.881211] [] apic_timer_interrupt+0x33/0x38 > [13440.886624] [] default_idle+0x3b/0x54 > [13440.891342] [] default_idle+0x3d/0x54 > [13440.896061] [] cpu_idle+0xa2/0xbb > [13440.900436] ======================= > [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880 > [13768.717353] [] _raw_spin_lock+0xda/0xfd > [13768.722247] [] _spin_lock_irqsave+0x39/0x41 > [13768.727486] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.734110] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.740735] [] dst_run_gc+0xc/0x118 > [13768.745276] [] run_timer_softirq+0x37/0x16b > [13768.750517] [] dst_run_gc+0x0/0x118 > [13768.755061] [] neigh_destroy+0xbe/0x104 > [13768.759955] [] dst_destroy+0x4d/0xab > [13768.764586] [] dst_run_gc+0x55/0x118 > [13768.769218] [] run_timer_softirq+0x108/0x16b > [13768.774542] [] __do_softirq+0x5a/0xd5 > [13768.779261] [] trace_hardirqs_on+0x106/0x141 > [13768.784588] [] __do_softirq+0x69/0xd5 > [13768.789308] [] do_softirq+0x37/0x4d > [13768.793851] [] 
smp_apic_timer_interrupt+0x6b/0x77 > [13768.799609] [] default_idle+0x3b/0x54 > [13768.804326] [] default_idle+0x3b/0x54 > [13768.809054] [] apic_timer_interrupt+0x33/0x38 > [13768.814471] [] default_idle+0x3b/0x54 > [13768.819187] [] default_idle+0x3d/0x54 > [13768.823903] [] cpu_idle+0xa2/0xbb > [13768.828279] ======================= > > > -- > MST > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From monis at voltaire.com Sun Mar 11 08:27:57 2007 From: monis at voltaire.com (Moni Shoua) Date: Sun, 11 Mar 2007 17:27:57 +0200 Subject: [ofa-general] [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB Message-ID: <45F41FFD.102@voltaire.com> Hi, The post follows the discussions in http://openib.org/pipermail/openib-general/2007-February/032598.html (version 1) and http://openib.org/pipermail/openib-general/2007-February/033434.html (version 2) I got there some comments from Michael Tsirkin that helped me find bugs in the bonding code. I decided to stay with version #1 since IPoIB cleans up its paths and MC lists when a device is stopped. Therefore I assume that hotplug is safe and that using bonding with IPoIB doesn't break it. With the current solution it is still unsafe to unload ib_ipoib before bonding but I tend to agree with Michael and my opinion now is that this should be fixed in the upper layer (i.e. bonding). ------------------------------------------------------- IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. 
On the tx flow the bonding code gets an skb such that skb->dev points to the master device; it changes this skb to point to the slave device and calls the slave's hard_start_xmit function. Combining these two flows, there is a hole if some code in ipoib (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev is an ipoib device, so that, for example, netdev_priv(n->dev) would be of type struct ipoib_dev_priv. To fix it, this patch adds a dev field to struct ipoib_neigh, which is used instead of the struct neighbour dev one. Signed-off-by: Moni Shoua Signed-off-by: Or Gerlitz --- ipoib.h | 4 +++- ipoib_main.c | 26 +++++++++++++------------- ipoib_multicast.c | 2 +- 3 files changed, 17 insertions(+), 15 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-25 11:05:32.000000000 +0200 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 19:32:55.000000000 +0200 @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_head list; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-01-25 11:05:32.000000000 +0200 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 19:32:55.000000000 +0200 @@ -248,7 +248,6 @@ static void path_free(struct net_device struct ipoib_neigh *neigh, *tn; struct sk_buff *skb; unsigned
long flags; - while ((skb = __skb_dequeue(&path->queue))) dev_kfree_skb_irq(skb); @@ -490,7 +489,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -769,32 +768,32 @@ static void ipoib_set_mcast_list(struct static void ipoib_neigh_destructor(struct neighbour *n) { struct ipoib_neigh *neigh; - struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct ipoib_dev_priv *priv; unsigned long flags; struct ipoib_ah *ah = NULL; - ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", - IPOIB_QPN(n->ha), - IPOIB_GID_RAW_ARG(n->ha + 4)); - - spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); if (neigh) { + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + IPOIB_QPN(n->ha), + IPOIB_GID_RAW_ARG(n->ha + 4)); + + spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); ipoib_neigh_free(n->dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); } - - spin_unlock_irqrestore(&priv->lock, flags); - if (ah) ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -803,6 +802,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour; + neigh->dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-01-25 11:05:32.000000000 +0200 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-04 12:21:46.000000000 +0200 @@ -774,7 
+774,7 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (neigh) { kref_get(&mcast->ah->ref); From mst at mellanox.co.il Sun Mar 11 08:40:42 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Mar 2007 17:40:42 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F41FFD.102@voltaire.com> References: <45F41FFD.102@voltaire.com> Message-ID: <20070311154042.GJ31985@mellanox.co.il> > With the current solution it is still unsafe to unload ib_ipoib before bonding but > I tend to agree with Michael and my opinion now is that this should be fixed in the > upper layer (i.e. bonding). This looks simple. Will we get to see the patch to core bonding code as well soon? I guess bonding support in OFED will need to be patched somehow? -- MST From mst at mellanox.co.il Sun Mar 11 09:12:31 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Mar 2007 18:12:31 +0200 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <1173626719.5182.10.camel@lappy> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <1173626719.5182.10.camel@lappy> Message-ID: <20070311161231.GC13817@mellanox.co.il> Quoting Peter Zijlstra : Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) > On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: > > > Quoting Roland Dreier : > > > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! 
> > > > > > >Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24 > > > >Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 > > > > > > It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor(). > > > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING > > > turned on, and then rerun this test. There's a good chance that this would > > > diagnose the deadlock. (I don't have good access to my test machines right now, or > > > else I would do it myself) > > > > OK, I did that. But I get > > [13440.761857] INFO: trying to register non-static key. > > [13440.766903] the code is fine but needs lockdep annotation. > > [13440.772455] turning off the locking correctness validator. > > and I am not sure what triggers this, or how to fix it to have the > > validator actually do its job. > > It usually indicates a spinlock is not properly initialized. Like > __SPIN_LOCK_UNLOCKED() used in a non-static context, use > spin_lock_init() in these cases. > > However looking at the code, ipoib_neight_destructor only uses > &priv->lock, and that seems to get properly initialized in ipoib_setup() > using spin_lock_init(). > > So either there are other sites that instanciate those objects and > forget about the lock init, or the object is corrupted (use after free?) OK, thanks for the hint. 
So I added this: diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index f9dbc6f..2eea467 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -821,8 +821,15 @@ static void ipoib_neigh_destructor(struct neighbour *n) unsigned long flags; struct ipoib_ah *ah = NULL; + if (n->dev->type != ARPHRD_INFINIBAND) { + printk(KERN_ERR "ipoib_neigh_destructor lock %p wrong type %d !!!!!!!!!!\n", + &priv->lock, n->dev->type); + BUG_ON(n->dev->type != ARPHRD_INFINIBAND); + return; + } + ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n", IPOIB_QPN(n->ha), IPOIB_GID_RAW_ARG(n->ha + 4)); And sure enough it triggers: [ 858.503010] ipoib_neigh_destructor lock c0687880 wrong type 772 !!!!!!!!!! [ 858.510036] ------------[ cut here ]------------ [ 858.514723] kernel BUG at drivers/infiniband/ulp/ipoib/ipoib_main.c:827! [ 858.521486] invalid opcode: 0000 [#1] [ 858.525212] SMP [ 858.527173] Modules linked in: rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ibv [ 858.538736] CPU: 0 [ 858.538737] EIP: 0060:[] Not tainted VLI [ 858.538738] EFLAGS: 00010206 (2.6.21-rc3-i686-dbg #4) [ 858.551755] EIP is at ipoib_neigh_destructor+0x40/0x178 [ib_ipoib] [ 858.557996] eax: c0687300 ebx: f240e880 ecx: c0223114 edx: c064f280 [ 858.564851] esi: f240e880 edi: f240e880 ebp: c0687880 esp: c06c7e9c [ 858.571702] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 [ 858.577602] Process swapper (pid: 0, ti=c06c6000 task=c064f280 task.ti=c06c6000) [ 858.584883] Stack: f89a37be c0687880 00000304 c022af6e c064f280 00000000 00000000 0000 [ 858.593573] 00000000 c06a2554 00000000 c064f280 00000001 00000000 c064f280 0000 [ 858.602259] c0860be0 c2a1fba0 00000246 c06a2554 f240e880 00000000 f240e880 c04a [ 858.610946] Call Trace: [ 858.613723] [] run_timer_softirq+0x37/0x16b [ 858.618959] [] dst_run_gc+0x0/0x118 [ 858.623498] [] neigh_destroy+0xbe/0x104 [ 858.628382] [] dst_destroy+0x4d/0xab [ 
858.632998] [] dst_run_gc+0x55/0x118 [ 858.637620] [] run_timer_softirq+0x108/0x16b [ 858.642934] [] __do_softirq+0x5a/0xd5 [ 858.647648] [] trace_hardirqs_on+0x106/0x141 [ 858.652970] [] __do_softirq+0x69/0xd5 [ 858.657677] [] do_softirq+0x37/0x4d [ 858.662210] [] smp_apic_timer_interrupt+0x6b/0x77 [ 858.667965] [] default_idle+0x3b/0x54 [ 858.672681] [] default_idle+0x3b/0x54 [ 858.677391] [] apic_timer_interrupt+0x33/0x38 [ 858.682796] [] default_idle+0x3b/0x54 [ 858.687505] [] default_idle+0x3d/0x54 [ 858.692211] [] cpu_idle+0xa2/0xbb [ 858.696569] [] start_kernel+0x40b/0x413 [ 858.701453] [] unknown_bootoption+0x0/0x205 [ 858.706678] ======================= [ 858.710321] Code: 66 83 f8 20 74 29 0f b7 c0 89 44 24 08 89 6c 24 04 c7 04 24 be 37 9a [ 858.730997] EIP: [] ipoib_neigh_destructor+0x40/0x178 [ib_ipoib] SS:ESP 0068c [ 858.740271] Kernel panic - not syncing: Fatal exception in interrupt Either something is corrupting the neighbour dev pointer, or somehow a neighbour not related to InfiniBand is assigned ipoib_neigh_destructor. -- MST From mingo at elte.hu Sun Mar 11 11:48:19 2007 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 11 Mar 2007 19:48:19 +0100 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <20070311161231.GC13817@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <1173626719.5182.10.camel@lappy> <20070311161231.GC13817@mellanox.co.il> Message-ID: <20070311184819.GA2567@elte.hu> * Michael S. Tsirkin wrote: > > So either there are other sites that instantiate those objects and > > forget about the lock init, or the object is corrupted (use after free?) > > OK, thanks for the hint. So I added this: > And sure enough it triggers: > > [ 858.503010] ipoib_neigh_destructor lock c0687880 wrong type 772 !!!!!!!!!! Could you turn on CONFIG_SLAB_DEBUG as well?
That should catch certain types of use-after-free accesses, and lockdep will also warn if a still-locked object is freed. Ingo From mingo at elte.hu Sun Mar 11 12:19:20 2007 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 11 Mar 2007 20:19:20 +0100 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) In-Reply-To: <20070311150431.GE31985@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <20070311150431.GE31985@mellanox.co.il> Message-ID: <20070311191920.GA7049@elte.hu> * Michael S. Tsirkin wrote: > After adding some printks, I started getting these: > > [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! This should go away if you double the size of MAX_STACK_TRACE_ENTRIES in kernel/lockdep_internals.h (keep it a power of two). If it doesn't go away then it might signal some sort of leak. Ingo From xma at us.ibm.com Sun Mar 11 15:27:29 2007 From: xma at us.ibm.com (Shirley Ma) Date: Sun, 11 Mar 2007 15:27:29 -0700 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <20070311161231.GC13817@mellanox.co.il> Message-ID: >Either something is corrupting neighbour dev pointer, or somehow a neighbour not related to infiniband is assigned ipoib_neigh_destructor. Is it possible that the original dev neighbour is gone and a new neighbour was created at the same location? Thanks Shirley Ma
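A side note on the diagnostic output earlier in this thread: the printk reported "wrong type 772", while the added check compares the device type against ARPHRD_INFINIBAND. The sketch below decodes such values in user space. It assumes the constants as defined in linux/if_arp.h (ARPHRD_INFINIBAND is 32, ARPHRD_LOOPBACK is 772); the SKETCH_ macro names and the helpers arphrd_name() and is_ipoib_type() are made up for illustration and are not part of any kernel or library API.

```c
/* Values as defined in linux/if_arp.h, repeated under SKETCH_ names so
 * the example builds without kernel headers. Only the device types
 * relevant to this thread are listed. */
#define SKETCH_ARPHRD_ETHER      1
#define SKETCH_ARPHRD_INFINIBAND 32
#define SKETCH_ARPHRD_LOOPBACK   772

/* Hypothetical helper: map a net_device type value to a readable name. */
static const char *arphrd_name(int type)
{
	switch (type) {
	case SKETCH_ARPHRD_ETHER:
		return "ARPHRD_ETHER";
	case SKETCH_ARPHRD_INFINIBAND:
		return "ARPHRD_INFINIBAND";
	case SKETCH_ARPHRD_LOOPBACK:
		return "ARPHRD_LOOPBACK";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: the same comparison as the debug check that was
 * added to ipoib_neigh_destructor(), expressed as a predicate. */
static int is_ipoib_type(int type)
{
	return type == SKETCH_ARPHRD_INFINIBAND;
}
```

Under these assumptions arphrd_name(772) yields "ARPHRD_LOOPBACK" (772 is 0x304, which appears to match the 00000304 word in the stack dump). That would be consistent with the second possibility raised above, a neighbour whose device is not an IPoIB device reaching ipoib_neigh_destructor(), though it does not by itself rule out corruption.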
From boris at mellanox.com Sun Mar 11 16:43:20 2007 From: boris at mellanox.com (Boris Shpolyansky) Date: Sun, 11 Mar 2007 16:43:20 -0700 Subject: [ofa-general] RE: uDAPL question In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1D522CD@mtiexch01.mti.com> Message-ID: <1E3DCD1C63492545881FACB6063A57C1D522D1@mtiexch01.mti.com> On the other hand, after reviewing the source code, it seems that DAT_CONNECTION_EVENT_BROKEN is returned when a connection fails to be established, so it looks more like a CM issue. Any suggestions on how to debug this one? Thanks, Boris. ________________________________ From: Boris Shpolyansky Sent: Friday, March 09, 2007 8:40 PM To: 'general at lists.openfabrics.org' Subject: uDAPL question Hi, I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL) using the OFED-1.1 stack. I'm consistently getting the following error: [root at ibd005 ~]# ./runjob_I_MPI.boris 2 Task 0 of 2 tasks started on host ibd005.ibd.mti.com clock_resolution = 1.00e-06 s Task 1 of 2 tasks started on host ibd006.ibd.mti.com [0:ibd005] unexpected DAPL event 4006 from 1:ibd006 [1:ibd006] unexpected DAPL event 4006 from 0:ibd005 rank 0 in job 14 ibd005_36193 caused collective abort of all ranks exit status of rank 0: return code 254 I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails. I'm using the standard dat.conf file: OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" "" Appreciate your help, Boris Shpolyansky
From tziporet at mellanox.co.il Mon Mar 12 01:18:08 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 12 Mar 2007 10:18:08 +0200 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: <45EFCCEC.4010205@gmail.com> References: <45EFCCEC.4010205@gmail.com> Message-ID: <45F50CC0.2000506@mellanox.co.il> Moni Shoua wrote: > > ib-bonding is based on standard Linux bonding with some required changes to make it work with IPoIB. > What is the status of accepting the specific IB changes into the Linux kernel? Tziporet From dotanb at dev.mellanox.co.il Mon Mar 12 01:30:29 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 12 Mar 2007 10:30:29 +0200 Subject: [ofa-general] [PATCH, RFC] libibverbs: Add hooks for rereg_mr, memory windows In-Reply-To: References: <1334.85.65.223.188.1172833964.squirrel@dev.mellanox.co.il> Message-ID: <45F50FA5.3070004@dev.mellanox.co.il> Roland Dreier wrote: > OK, I pushed out the patch below. Please let me know if you see any > problems caused by it, or if you think there may be a problem handling > rereg MR and/or MWs with this ABI. > > Dotan, do you have any further comments about the completion channel > closing issue? I would really like to freeze the libibverbs ABI as > soon as possible. > Later today I plan to send a patch for this issue (a user-level-only change). If you think that this fix is good enough and that no kernel involvement is needed, then we will keep this patch without any kernel-level changes. Thanks, Dotan From mst at mellanox.co.il Mon Mar 12 01:35:23 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 10:35:23 +0200 Subject: [ofa-general] Re: Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> Message-ID: <20070312083523.GB4928@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: Re: Is ibv_get_async_event() a blocking call ? > > > Back to my original question, if I don't call > > ibv_get_async_event() for a long time, and there are a lot of events > > generated during the time, do I lose any events when I eventually call > > ibv_get_async_event() ? > > > > (another way, how many events can you queue ) ? > > Actually this is not handled very well right now. There is no limit > on the length of the queue so you can eventually use up all the memory > in the system if you never pick up async events. Should we add a size parameter for event channels? And we might need to add an "event channel overrun" flag as well. If we want to address the problem in this way, we need to do this before libibverbs 1.1 freezes, I think. -- MST From mst at mellanox.co.il Mon Mar 12 01:55:15 2007 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 12 Mar 2007 10:55:15 +0200 Subject: [ofa-general] [Bug 418] IPoIB CM causes large message IPv4 multicast to fail (was OFED 1.2 beta blocking bugs) In-Reply-To: <45F1AD86.4050808@ichips.intel.com> References: <000201c761ce$71f2e710$ff0da8c0@amr.corp.intel.com> <45F094B7.7080406@ichips.intel.com> <1173437058.465.159160.camel@hal.voltaire.com> <45F195C1.8000402@ichips.intel.com> <45F1AD86.4050808@ichips.intel.com> Message-ID: <20070312085515.GA6663@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] bug 418: was OFED 1.2 beta blocking bugs > > > ib0: failed send event (status=1, wrid=35 vend_err 69) > > I believe that this is causing the QP to transition into the error state. > > > ib_mthca 0000:08:00.0: modify QP 3->3 returned status 10. > > The mthca status of 0x10 indicates a bad QP state. The transition from 3->3 is > RTS to RTS, but the QP is not in the RTS state, which makes sense given the > previous error. The other receive side errors in the bug report are a fallout > from not recovering from the send error. Errors on a UD QP typically indicate a software problem. It seems we are posting packets that exceed the MTU size. But I do not see this problem here at the lab. How can this problem be reproduced? > I don't know if this causes any problems, but at first glance it appears that > the IPoIB CM code begins listening for connection requests before the code has > had a chance to join the IPoIB broadcast group. This allows a connection to > form before the broadcast traffic is ready. Someone more familiar with the code > than I am will need to determine if this can lead to any undesirable race > conditions. I don't see why this is a problem - I don't need to be a member of a broadcast group to get incoming packets.
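For reference, the numeric states in the quoted "modify QP 3->3" log line can be decoded with a small user-space helper. This is only a sketch: the values follow the ordering of the QP state enumeration used by the verbs interface (RESET = 0 through ERR = 6), and the sketch_qp_state type and qp_state_name() helper are invented here for illustration rather than taken from the mthca driver.

```c
/* QP states in the order used by the verbs QP state enumeration.
 * The SKETCH_ prefix marks these as local, illustrative names. */
enum sketch_qp_state {
	SKETCH_QPS_RESET = 0,
	SKETCH_QPS_INIT,
	SKETCH_QPS_RTR,   /* ready to receive */
	SKETCH_QPS_RTS,   /* ready to send */
	SKETCH_QPS_SQD,   /* send queue drained */
	SKETCH_QPS_SQE,   /* send queue error */
	SKETCH_QPS_ERR
};

/* Hypothetical helper: map a numeric QP state, as printed in the driver
 * log line, to a short name. */
static const char *qp_state_name(int state)
{
	static const char *const names[] = {
		"RESET", "INIT", "RTR", "RTS", "SQD", "SQE", "ERR"
	};

	if (state < SKETCH_QPS_RESET || state > SKETCH_QPS_ERR)
		return "unknown";
	return names[state];
}
```

With this helper, qp_state_name(3) gives "RTS" for both sides of the logged transition, matching the "RTS to RTS" reading in the quoted text; a QP that has already moved to the error state would correspond to value 6.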
-- MST From HNGUYEN at de.ibm.com Mon Mar 12 02:03:17 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 12 Mar 2007 10:03:17 +0100 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <20070311130103.GC28400@mellanox.co.il> Message-ID: Hi, > > > I'm considering pulling git from linus 2.6.21-rc3 - this would > > > make development easier by cutting down the number of patches > > > we have to apply. > > > 3. ehca-xxx - these all seem to be applied > > Mostly, except for "ehca: Fix sync between completion handler and destroy > > cq", > > which is queued for rc4. > Is that ehca_3_fix_race_condition_locking_issues.patch? > Seems to be upstream already, just not in rc3. No, it's another previous one. According Vladimir it is: kernel_patches/fixes/ehca_6_fix_mismatched_sync.patch See also: http://lists.openfabrics.org/pipermail/ewg/2007-March/002853.html Regards Nam From vlad at lists.openfabrics.org Mon Mar 12 02:15:08 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 12 Mar 2007 02:15:08 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070312-0200 daily build status Message-ID: <20070312091508.7A491E603C6@openfabrics.org> This email was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on 
powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From dotanb at dev.mellanox.co.il Mon Mar 12 02:38:14 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 12 Mar 2007 11:38:14 +0200 Subject: [ofa-general] [PATCH - libibverbs] Added reference count to completion event channels Message-ID: <1173692295.17886.1.camel@mtldesk014.lab.mtl.com> Added reference count to completion event channels. 
Signed-off-by: Dotan Barak --- Index: gen2_devel_user/src/userspace/libibverbs/include/infiniband/verbs.h =================================================================== --- gen2_devel_user.orig/src/userspace/libibverbs/include/infiniband/verbs.h 2007-02-26 16:01:56.000000000 +0200 +++ gen2_devel_user/src/userspace/libibverbs/include/infiniband/verbs.h 2007-03-04 10:44:34.696598288 +0200 @@ -546,11 +546,14 @@ struct ibv_qp { }; struct ibv_comp_channel { + pthread_mutex_t mutex; + int refcnt; int fd; }; struct ibv_cq { struct ibv_context *context; + struct ibv_comp_channel *channel; void *cq_context; uint32_t handle; int cqe; Index: gen2_devel_user/src/userspace/libibverbs/src/verbs.c =================================================================== --- gen2_devel_user.orig/src/userspace/libibverbs/src/verbs.c 2007-02-26 16:01:56.000000000 +0200 +++ gen2_devel_user/src/userspace/libibverbs/src/verbs.c 2007-03-04 10:42:41.073871568 +0200 @@ -226,7 +226,9 @@ struct ibv_comp_channel *ibv_create_comp return NULL; } - channel->fd = resp.fd; + channel->refcnt = 0; + channel->fd = resp.fd; + pthread_mutex_init(&channel->mutex, NULL); return channel; } @@ -243,6 +245,12 @@ int ibv_destroy_comp_channel(struct ibv_ if (abi_ver <= 2) return ibv_destroy_comp_channel_v2(channel); + pthread_mutex_lock(&channel->mutex); + if (channel->refcnt) { + pthread_mutex_unlock(&channel->mutex); + return EBUSY; + } + pthread_mutex_unlock(&channel->mutex); close(channel->fd); free(channel); @@ -260,8 +268,14 @@ struct ibv_cq *__ibv_create_cq(struct ib cq->cq_context = cq_context; cq->comp_events_completed = 0; cq->async_events_completed = 0; + cq->channel = channel; pthread_mutex_init(&cq->mutex, NULL); pthread_cond_init(&cq->cond, NULL); + if (channel) { + pthread_mutex_lock(&channel->mutex); + channel->refcnt ++; + pthread_mutex_unlock(&channel->mutex); + } } return cq; @@ -279,7 +293,17 @@ default_symver(__ibv_resize_cq, ibv_resi int __ibv_destroy_cq(struct ibv_cq *cq) { - return 
cq->context->ops.destroy_cq(cq); + struct ibv_comp_channel *channel = cq->channel; + int ret; + + ret = cq->context->ops.destroy_cq(cq); + if (!ret && channel) { + pthread_mutex_lock(&channel->mutex); + channel->refcnt --; + pthread_mutex_unlock(&channel->mutex); + } + + return ret; } default_symver(__ibv_destroy_cq, ibv_destroy_cq); From dotanb at dev.mellanox.co.il Mon Mar 12 03:00:43 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 12 Mar 2007 12:00:43 +0200 Subject: [ofa-general] [PATCH V2 - libibverbs] Added reference count to completion event channels Message-ID: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> Added reference count to completion event channels. Signed-off-by: Dotan Barak --- Index: gen2_devel_user/src/userspace/libibverbs/include/infiniband/verbs.h =================================================================== --- gen2_devel_user.orig/src/userspace/libibverbs/include/infiniband/verbs.h 2007-02-26 16:01:56.000000000 +0200 +++ gen2_devel_user/src/userspace/libibverbs/include/infiniband/verbs.h 2007-03-04 10:44:34.696598288 +0200 @@ -546,11 +546,14 @@ struct ibv_qp { }; struct ibv_comp_channel { + pthread_mutex_t mutex; + int refcnt; int fd; }; struct ibv_cq { struct ibv_context *context; + struct ibv_comp_channel *channel; void *cq_context; uint32_t handle; int cqe; Index: gen2_devel_user/src/userspace/libibverbs/src/verbs.c =================================================================== --- gen2_devel_user.orig/src/userspace/libibverbs/src/verbs.c 2007-02-26 16:01:56.000000000 +0200 +++ gen2_devel_user/src/userspace/libibverbs/src/verbs.c 2007-03-04 10:42:41.073871568 +0200 @@ -226,7 +226,9 @@ struct ibv_comp_channel *ibv_create_comp return NULL; } - channel->fd = resp.fd; + channel->refcnt = 0; + channel->fd = resp.fd; + pthread_mutex_init(&channel->mutex, NULL); return channel; } @@ -243,6 +245,12 @@ int ibv_destroy_comp_channel(struct ibv_ if (abi_ver <= 2) return ibv_destroy_comp_channel_v2(channel); + 
pthread_mutex_lock(&channel->mutex); + if (channel->refcnt) { + pthread_mutex_unlock(&channel->mutex); + return EBUSY; + } + pthread_mutex_unlock(&channel->mutex); close(channel->fd); free(channel); @@ -260,8 +268,14 @@ struct ibv_cq *__ibv_create_cq(struct ib cq->cq_context = cq_context; cq->comp_events_completed = 0; cq->async_events_completed = 0; + cq->channel = channel; pthread_mutex_init(&cq->mutex, NULL); pthread_cond_init(&cq->cond, NULL); + if (channel) { + pthread_mutex_lock(&channel->mutex); + channel->refcnt++; + pthread_mutex_unlock(&channel->mutex); + } } return cq; @@ -279,7 +293,17 @@ default_symver(__ibv_resize_cq, ibv_resi int __ibv_destroy_cq(struct ibv_cq *cq) { - return cq->context->ops.destroy_cq(cq); + struct ibv_comp_channel *channel = cq->channel; + int ret; + + ret = cq->context->ops.destroy_cq(cq); + if (!ret && channel) { + pthread_mutex_lock(&channel->mutex); + channel->refcnt--; + pthread_mutex_unlock(&channel->mutex); + } + + return ret; } default_symver(__ibv_destroy_cq, ibv_destroy_cq); From mst at mellanox.co.il Mon Mar 12 06:29:30 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 15:29:30 +0200 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <20070311184819.GA2567@elte.hu> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <1173626719.5182.10.camel@lappy> <20070311161231.GC13817@mellanox.co.il> <20070311184819.GA2567@elte.hu> Message-ID: <20070312132826.GB20549@mellanox.co.il> > Quoting Ingo Molnar : > Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) > > > * Michael S. Tsirkin wrote: > > > > So either there are other sites that instanciate those objects and > > > forget about the lock init, or the object is corrupted (use after free?) > > > > OK, thanks for the hint. 
So I added this: > > > And sure enough it triggers: > > > > [ 858.503010] ipoib_neigh_destructor lock c0687880 wrong type 772 !!!!!!!!!! > > could you turn on CONFIG_SLAB_DEBUG as well? > > that should catch certain types of use-after-free accesses, and lockdep > will also warn if a still locked object is freed. Hmm, no, this does not look like use-after-free. I enabled CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was not overwritten by slab debugger. -- MST From mingo at elte.hu Mon Mar 12 06:51:12 2007 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 12 Mar 2007 14:51:12 +0100 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <20070312132826.GB20549@mellanox.co.il> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <1173626719.5182.10.camel@lappy> <20070311161231.GC13817@mellanox.co.il> <20070311184819.GA2567@elte.hu> <20070312132826.GB20549@mellanox.co.il> Message-ID: <20070312135112.GA18158@elte.hu> * Michael S. Tsirkin wrote: > > could you turn on CONFIG_SLAB_DEBUG as well? > > > > that should catch certain types of use-after-free accesses, and > > lockdep will also warn if a still locked object is freed. > > Hmm, no, this does not look like use-after-free. I enabled > CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was > not overwritten by slab debugger. that's still not conclusive - the memory might not have been allocated by slab again to detect it. Your magic-number check definitely shows some sort of corruption going on, right? 
Ingo From monisonlists at gmail.com Mon Mar 12 06:53:50 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Mon, 12 Mar 2007 15:53:50 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070311154042.GJ31985@mellanox.co.il> References: <45F41FFD.102@voltaire.com> <20070311154042.GJ31985@mellanox.co.il> Message-ID: <45F55B6E.7010004@gmail.com> Michael S. Tsirkin wrote: >> With the current solution it is still unsafe to unload ib_ipoib before bonding but >> I tend to agree with Michael and my opinion now is that this should be fixed in the >> upper layer (i.e. bonding). > > This looks simple. Will we get to see the patch to core bonding code as well > soon? I guess bonding support in OFED will need to be patched somehow? > Sure. I am working on it now (making ib_ipoib safer to remove before bonding does) and I would be happy to share when I'm done. However, I think that this is a separate issue and should not prevent from this patch to get in. Don't you agree? BTW: It is often claimed to module unload is unsafe by definition and is not a production issue. Doesn't this claim make the unload issue "less critical"? From mst at mellanox.co.il Mon Mar 12 07:13:09 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 16:13:09 +0200 Subject: [ofa-general] Re: Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F55B6E.7010004@gmail.com> References: <45F41FFD.102@voltaire.com> <20070311154042.GJ31985@mellanox.co.il> <45F55B6E.7010004@gmail.com> Message-ID: <20070312141309.GB21643@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >> With the current solution it is still unsafe to unload ib_ipoib before bonding but > >> I tend to agree with Michael and my opinion now is that this should be fixed in the > >> upper layer (i.e. bonding). > > > > This looks simple. 
Will we get to see the patch to core bonding code as well > > soon? I guess bonding support in OFED will need to be patched somehow? > > > Sure. I am working on it now (making ib_ipoib safer to remove before bonding > does) and I would be happy to share when I'm > done. However, I think that this is a separate issue and should not prevent > from this patch to get in. Don't you agree? There's no rush I guess - Roland's on vacation so we have time to review how everything works together. > BTW: It is often claimed to module unload is unsafe by definition and is not > a production issue. Doesn't this claim make the unload issue "less critical"? I know our users depend on module unload being stable. Module unload remains the simplest way to test hotplug, so bugs there might hide real issues. Where does the claim that module unload is unsafe by definition come from? Weren't the races solved in 2.6 with the new in-kernel loader? -- MST From mst at mellanox.co.il Mon Mar 12 07:20:13 2007 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 16:20:13 +0200 Subject: [ofa-general] Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) In-Reply-To: <20070312135112.GA18158@elte.hu> References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il> <1173626719.5182.10.camel@lappy> <20070311161231.GC13817@mellanox.co.il> <20070311184819.GA2567@elte.hu> <20070312132826.GB20549@mellanox.co.il> <20070312135112.GA18158@elte.hu> Message-ID: <20070312142013.GC21643@mellanox.co.il> > Quoting Ingo Molnar : > Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: softlockup detected on CPU#0!) > > > * Michael S. Tsirkin wrote: > > > > could you turn on CONFIG_SLAB_DEBUG as well? > > > > > > that should catch certain types of use-after-free accesses, and > > > lockdep will also warn if a still locked object is freed. > > > > Hmm, no, this does not look like use-after-free. 
I enabled > > CONFIG_SLAB_DEBUG, and I still see the same message, so the memory was > > not overwritten by slab debugger. > > that's still not conclusive - the memory might not have been allocated > by slab again to detect it. Your magic-number check definitely shows > some sort of corruption going on, right? Not necessarily in such a direct way. I currently think we are somehow getting neighbours where neigh->dev points to a loopback device - that's type 772, and this seems to make sense. I printed out the device name and sure enough it is "lo". Is it true that sticking the following static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) { parms->neigh_destructor = ipoib_neigh_destructor; return 0; } in dev->neigh_setup, as ipoib does, guarantees that neighbour->dev will point to the current device for any neighbour which ipoib_neigh_destructor gets? That's the assumption IPoIB makes, and it seems broken in this instance. How could that be? -- MST From monisonlists at gmail.com Mon Mar 12 07:48:40 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Mon, 12 Mar 2007 16:48:40 +0200 Subject: [ofa-general] infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: <45F50CC0.2000506@mellanox.co.il> References: <45EFCCEC.4010205@gmail.com> <45F50CC0.2000506@mellanox.co.il> Message-ID: <45F56848.5020008@gmail.com> Tziporet Koren wrote: > Moni Shoua wrote: >> >> ib-bonding is based on standard Linux bonding with some required >> changes to make it work with IPoIB. >> > > What is the status of accepting the specific IB changes to Linux kernel? > > Tziporet > I've sent a new patch yesterday with a comment about module unload being unsafe under specific scenarios. Michael's opinion is that we should fix this issue first but I think that there is a place to push the current patch now and wait for the other fix to come later (which is probably not in IB or at least not just IB). 
From mst at dev.mellanox.co.il Mon Mar 12 08:03:55 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 17:03:55 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F41FFD.102@voltaire.com> References: <45F41FFD.102@voltaire.com> Message-ID: <20070312150355.GB28157@mellanox.co.il> > Quoting Moni Shoua : > To fix it, this patch adds a dev field to struct ipoib_neigh which is used > instead of the struct neighbour dev one. > > Signed-off-by: Moni Shoua > Signed-off-by: Or Gerlitz > --- > > ipoib.h | 4 +++- > ipoib_main.c | 26 +++++++++++++------------- > ipoib_multicast.c | 2 +- > 3 files changed, 17 insertions(+), 15 deletions(-) > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h > =================================================================== > --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-25 11:05:32.000000000 +0200 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-04 19:32:55.000000000 +0200 > @@ -216,6 +216,7 @@ struct ipoib_neigh { > struct sk_buff_head queue; > > struct neighbour *neighbour; > + struct net_device *dev; > > struct list_head list; > }; > @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip > INFINIBAND_ALEN, sizeof(void *)); > } > > -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); > +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, > + struct net_device *dev); > void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); > > extern struct workqueue_struct *ipoib_workqueue; > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c > =================================================================== > --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-01-25 11:05:32.000000000 +0200 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-04 19:32:55.000000000 +0200 > @@ -248,7 +248,6 @@ static void path_free(struct net_device > struct ipoib_neigh 
*neigh, *tn; > struct sk_buff *skb; > unsigned long flags; > - > while ((skb = __skb_dequeue(&path->queue))) > dev_kfree_skb_irq(skb); > > @@ -490,7 +489,7 @@ static void neigh_add_path(struct sk_buf > struct ipoib_path *path; > struct ipoib_neigh *neigh; > > - neigh = ipoib_neigh_alloc(skb->dst->neighbour); > + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); > if (!neigh) { > ++priv->stats.tx_dropped; > dev_kfree_skb_any(skb); > @@ -769,32 +768,32 @@ static void ipoib_set_mcast_list(struct > static void ipoib_neigh_destructor(struct neighbour *n) > { > struct ipoib_neigh *neigh; > - struct ipoib_dev_priv *priv = netdev_priv(n->dev); > + struct ipoib_dev_priv *priv; > unsigned long flags; > struct ipoib_ah *ah = NULL; > > - ipoib_dbg(priv, > - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > - IPOIB_QPN(n->ha), > - IPOIB_GID_RAW_ARG(n->ha + 4)); > - > - spin_lock_irqsave(&priv->lock, flags); > > neigh = *to_ipoib_neigh(n); > if (neigh) { > + priv = netdev_priv(neigh->dev); > + ipoib_dbg(priv, > + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > + IPOIB_QPN(n->ha), > + IPOIB_GID_RAW_ARG(n->ha + 4)); > + > + spin_lock_irqsave(&priv->lock, flags); > if (neigh->ah) > ah = neigh->ah; > list_del(&neigh->list); > ipoib_neigh_free(n->dev, neigh); > + spin_unlock_irqrestore(&priv->lock, flags); > } > - > - spin_unlock_irqrestore(&priv->lock, flags); > - > if (ah) > ipoib_put_ah(ah); > } > > -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) > +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, > + struct net_device *dev) > { > struct ipoib_neigh *neigh; > > @@ -803,6 +802,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st > return NULL; > > neigh->neighbour = neighbour; > + neigh->dev = dev; > *to_ipoib_neigh(neighbour) = neigh; > skb_queue_head_init(&neigh->queue); > > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > =================================================================== > --- 
linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-01-25 11:05:32.000000000 +0200 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-04 12:21:46.000000000 +0200 > @@ -774,7 +774,7 @@ out: > if (skb->dst && > skb->dst->neighbour && > !*to_ipoib_neigh(skb->dst->neighbour)) { > - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); > + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); > > if (neigh) { > kref_get(&mcast->ah->ref); I'm re-reading this, and old archives. Was the following problem addressed? # It seems that in this design, if multiple ipoib interfaces are present, we might # get an skb such that skb->dev will be different from the new dev field in struct # ipoib_neigh. # # ipoib_neigh ah field includes struct ib_ah *. # This selects important parameters which depend on both packet source and # destination interfaces. # # It seems that the result will be that the packet will be sent on a wrong interface. # Right? # # I think the right thing might be to compare ipoib_neigh dev and # skb->dev, and destroy ipoib_neigh if these do not match. # # However, this will affect performance negatively if this happens a lot. # Need to understand the usage model for the bonding driver and whether # there is some "locality" here. -- MST From mst at dev.mellanox.co.il Mon Mar 12 07:57:16 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 12 Mar 2007 16:57:16 +0200 Subject: [ofa-general] Re: infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: <45F56848.5020008@gmail.com> References: <45EFCCEC.4010205@gmail.com> <45F50CC0.2000506@mellanox.co.il> <45F56848.5020008@gmail.com> Message-ID: <20070312145716.GA28157@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: infiniband bonding/merging/aggregation with SDP and/or VERBS > > Tziporet Koren wrote: > > Moni Shoua wrote: > > > > > > ib-bonding is based on standard Linux bonding with some required > > > changes to make it work with IPoIB. > > > > What is the status of accepting the specific IB changes to Linux kernel? > > I've sent a new patch yesterday with a comment about module unload being > unsafe under specific scenarios. > Michael's opinion is that we should fix this issue first but I think that > there is a place to push the current patch now and wait for the other fix to > come later (which is probably not in IB or at least not just IB). I don't think this summarizes my opinion correctly. I just would like to see all the patches together - I have a feeling caching the device in ipoib->dev is problematic and module unloading issues are just a symptom pointing to deeper issues. -- MST From monisonlists at gmail.com Mon Mar 12 08:13:04 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Mon, 12 Mar 2007 17:13:04 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070312141309.GB21643@mellanox.co.il> References: <45F41FFD.102@voltaire.com> <20070311154042.GJ31985@mellanox.co.il> <45F55B6E.7010004@gmail.com> <20070312141309.GB21643@mellanox.co.il> Message-ID: <45F56E00.9000701@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: Re: Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB >> >> Michael S. 
Tsirkin wrote: >>>> With the current solution it is still unsafe to unload ib_ipoib before bonding but >>>> I tend to agree with Michael and my opinion now is that this should be fixed in the >>>> upper layer (i.e. bonding). >>> This looks simple. Will we get to see the patch to core bonding code as well >>> soon? I guess bonding support in OFED will need to be patched somehow? >>> >> Sure. I am working on it now (making ib_ipoib safer to remove before bonding >> does) and I would be happy to share when I'm >> done. However, I think that this is a separate issue and should not prevent >> from this patch to get in. Don't you agree? > > There's no rush I guess - Roland's on vacation so we have time to > review how everything works together. > >> BTW: It is often claimed to module unload is unsafe by definition and is not >> a production issue. Doesn't this claim make the unload issue "less critical"? > > I know our users depend on module unload being stable. > Module unload remains the simplest way to test hotplug, > so bugs there might hide real issues. > ib_ipoib can still be unloaded even when bonding is used (it is just that bonding should be removed first). About hotplug: it is still working well and correct me if I'm wrong but my way to test hotplug is to unload ib_mthca which is not affected by the presence of bonding. As I said, I would be happy to share the work on bonding and get reviews about it but I still think that this patch alone can be a first step. > Where does the claim that module unload is unsafe by definition > come from? Weren't the races solved in 2.6 with the new in-kernel > loader? > Well, what happens with bonding and IPoIB speaks for itself, doesn't it? If I can unload a module that is being referenced from the outside then I am not protected by the kernel from doing something wrong. 
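The question in this exchange is what should stop a module from being unloaded while something outside still refers to it. In the kernel the usual tools for that are try_module_get() and module_put(). The following is only a userspace model of the idea, a minimal single-threaded sketch with no locking; the toy_* names are hypothetical, and the choice of -EBUSY as the refusal code is an assumption of the model rather than a statement about what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a module's pin count; not kernel code. */
struct toy_module {
    int  refcnt;  /* number of outside users currently pinning the module */
    bool going;   /* set once an unload has been committed */
};

/* Analogue of try_module_get(): pin the module unless it is going away. */
bool toy_module_try_get(struct toy_module *m)
{
    if (m->going)
        return false;
    m->refcnt++;
    return true;
}

/* Analogue of module_put(): drop one pin taken earlier. */
void toy_module_put(struct toy_module *m)
{
    /* Dropping a pin that was never taken is a caller bug. */
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Unload succeeds only when nobody holds a pin; otherwise it is refused. */
int toy_module_unload(struct toy_module *m)
{
    if (m->refcnt > 0)
        return -EBUSY;
    m->going = true;
    return 0;
}
```

In terms of this thread, the upper layer (bonding) would take a pin on the lower one (ib_ipoib) for as long as it holds a reference to its devices, so the unload attempt is refused instead of leaving dangling pointers. Whether bonding or ipoib should own that pin is exactly what is being debated here, and the sketch takes no position on it.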
From robert.j.woodruff at intel.com Mon Mar 12 08:23:31 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 12 Mar 2007 08:23:31 -0700 Subject: [ofa-general] uDAPL question In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1D522CD@mtiexch01.mti.com> Message-ID: This is a known problem and should be fixed by now, There was a bad patch that somehow got into OFED that was not in Sean main tree. Assuming this bad patch has been removed, the problem should be fixed. woody ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris Shpolyansky Sent: Friday, March 09, 2007 8:40 PM To: general at lists.openfabrics.org Subject: [ofa-general] uDAPL question Hi, I'm trying to get simple Intel MPI benchmark running over IB (uDAPL) using OFED-1.1 stack. I'm consistently getting the following error: [root at ibd005 ~]# ./runjob_I_MPI.boris 2 Task 0 of 2 tasks started on host ibd005.ibd.mti.com clock_resolution = 1.00e-06 s Task 1 of 2 tasks started on host ibd006.ibd.mti.com [0:ibd005] unexpected DAPL event 4006 from 1:ibd006 [1:ibd006] unexpected DAPL event 4006 from 0:ibd005 rank 0 in job 14 ibd005_36193 caused collective abort of all ranks exit status of rank 0: return code 254 I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails. I'm using standard dat.conf file: OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" "" Appreciate your help, Boris Shpolyansky From boris at mellanox.com Mon Mar 12 08:28:28 2007 From: boris at mellanox.com (Boris Shpolyansky) Date: Mon, 12 Mar 2007 08:28:28 -0700 Subject: [ofa-general] uDAPL question In-Reply-To: Message-ID: <1E3DCD1C63492545881FACB6063A57C1D522D5@mtiexch01.mti.com> Hi Woody, Thanks for your help. I guess the problem is in the CM - is it ? 
Can you point me to relevant communication/bug reports that explain the fix for this issue ? Would Sean be the right person to ask regarding what exact patch should be added/removed ? I would prefer to stick to OFED-1.1 code with minimal changes - if possible - to avoid compatibility issues. Thanks, Boris -----Original Message----- From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com] Sent: Monday, March 12, 2007 8:24 AM To: Boris Shpolyansky; general at lists.openfabrics.org; Hefty, Sean Subject: RE: [ofa-general] uDAPL question This is a known problem and should be fixed by now, There was a bad patch that somehow got into OFED that was not in Sean main tree. Assuming this bad patch has been removed, the problem should be fixed. woody ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris Shpolyansky Sent: Friday, March 09, 2007 8:40 PM To: general at lists.openfabrics.org Subject: [ofa-general] uDAPL question Hi, I'm trying to get simple Intel MPI benchmark running over IB (uDAPL) using OFED-1.1 stack. I'm consistently getting the following error: [root at ibd005 ~]# ./runjob_I_MPI.boris 2 Task 0 of 2 tasks started on host ibd005.ibd.mti.com clock_resolution = 1.00e-06 s Task 1 of 2 tasks started on host ibd006.ibd.mti.com [0:ibd005] unexpected DAPL event 4006 from 1:ibd006 [1:ibd006] unexpected DAPL event 4006 from 0:ibd005 rank 0 in job 14 ibd005_36193 caused collective abort of all ranks exit status of rank 0: return code 254 I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails. 
I'm using standard dat.conf file: OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" "" Appreciate your help, Boris Shpolyansky From mst at dev.mellanox.co.il Mon Mar 12 08:32:51 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 17:32:51 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F56E00.9000701@gmail.com> References: <45F41FFD.102@voltaire.com> <20070311154042.GJ31985@mellanox.co.il> <45F55B6E.7010004@gmail.com> <20070312141309.GB21643@mellanox.co.il> <45F56E00.9000701@gmail.com> Message-ID: <20070312153251.GC28157@mellanox.co.il> > Quoting Moni Shoua : > > > Where does the claim that module unload is unsafe by definition > > come from? Weren't the races solved in 2.6 with the new in-kernel > > loader? > > Well, what happens with bonding and IPoIB speaks for itself, doesn't it? Not to my eyes. > If I can unload a module that is being referenced from the outside This just means there's a bug in the specific module(s). In this case it is either ipoib or bonding module (or both). > then I am not protected by the kernel from doing something wrong. Yes, this is really fundamental in kernel programming. That's what makes it interesting. -- MST From mst at dev.mellanox.co.il Mon Mar 12 08:47:34 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 17:47:34 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F41FFD.102@voltaire.com> References: <45F41FFD.102@voltaire.com> Message-ID: <20070312154734.GD28157@mellanox.co.il> Another question. 
> @@ -769,32 +768,32 @@ static void ipoib_set_mcast_list(struct > static void ipoib_neigh_destructor(struct neighbour *n) > { > struct ipoib_neigh *neigh; > - struct ipoib_dev_priv *priv = netdev_priv(n->dev); > + struct ipoib_dev_priv *priv; > unsigned long flags; > struct ipoib_ah *ah = NULL; > > - ipoib_dbg(priv, > - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > - IPOIB_QPN(n->ha), > - IPOIB_GID_RAW_ARG(n->ha + 4)); > - > - spin_lock_irqsave(&priv->lock, flags); > > neigh = *to_ipoib_neigh(n); > if (neigh) { > + priv = netdev_priv(neigh->dev); > + ipoib_dbg(priv, > + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > + IPOIB_QPN(n->ha), > + IPOIB_GID_RAW_ARG(n->ha + 4)); > + > + spin_lock_irqsave(&priv->lock, flags); > if (neigh->ah) > ah = neigh->ah; > list_del(&neigh->list); > ipoib_neigh_free(n->dev, neigh); > + spin_unlock_irqrestore(&priv->lock, flags); > } > - > - spin_unlock_irqrestore(&priv->lock, flags); > - > if (ah) > ipoib_put_ah(ah); > } Using to_ipoib_neigh outside priv->lock looks problematic. Can you convince me this does not introduce new races? -- MST From halr at voltaire.com Mon Mar 12 09:47:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:47:43 -0500 Subject: [ofa-general] Re: [PATCH - libibumad] Added release_ca in error flow to prevent resource leak In-Reply-To: <1173620535.11125.3.camel@mtldesk014.lab.mtl.com> References: <1173620535.11125.3.camel@mtldesk014.lab.mtl.com> Message-ID: <1173718062.5995.7397.camel@hal.voltaire.com> On Sun, 2007-03-11 at 08:42, Dotan Barak wrote: > Added release_ca in error flow to prevent resource leak. > > Signed-off-by: Dotan Barak Thanks. Applied (to both trunk and ofed_1_2). 
-- Hal From halr at voltaire.com Mon Mar 12 09:47:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:47:55 -0500 Subject: [ofa-general] Re: [PATCH 0/4] opensm: more routing optimizations In-Reply-To: <11733615061517-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> Message-ID: <1173718068.5995.7401.camel@hal.voltaire.com> On Thu, 2007-03-08 at 08:45, Sasha Khapyorsky wrote: > Mostly it implements the "switch only" optimization idea and affects > the min hops matrices building phase (for both up/down and default > builders). > > The main trick is to keep the min hop tables _ONLY_ for switches base > LIDs and don't bother with CAs, routers LIDs and any secondary LIDs in > case when LMC > 0 - this saves a lot of memory and cpu cycles needed > for calculation and storing the huge whole fabric matrices. > > For CA and router ports we will refer its neighbor switch's min hop > vectors. And for LMC > 0 case we will use base LID's min hop vectors > for any secondary LIDs (for CAs and routers it will neighbor switch's > base LID). > > Preliminary testing shows 3-4x speedup in the min-hop generation phase > and yet another 2x for up/down. More nice work! Thanks, Sasha. -- Hal > Sasha From halr at voltaire.com Mon Mar 12 09:50:39 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:50:39 -0500 Subject: [ofa-general] Re: [PATCH 1/4] opensm: use only switches min hop vectors In-Reply-To: <11733615092932-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> <11733615092932-git-send-email-sashak@voltaire.com> Message-ID: <1173718071.5995.7403.camel@hal.voltaire.com> On Thu, 2007-03-08 at 08:45, Sasha Khapyorsky wrote: > Use only switch base LIDs min hop vectors in the best port lookup > routines - osm_switch_recommend*_path(). > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only (at least for now). 
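The lookup rule from the series cover letter (min hop vectors kept only for switch base LIDs; CA, router, and secondary LIDs resolved through a switch base LID) could be sketched as below. This is a simplified illustration, not OpenSM code: all demo_* names and the flat arrays are hypothetical, and adding one hop for the final switch-to-CA link is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_LID 16
#define DEMO_NO_PATH 0xFF

/* Hypothetical, simplified fabric view as seen from one switch. */
struct demo_fabric {
	uint16_t base_lid[DEMO_MAX_LID];    /* any LID -> base LID of its port */
	uint8_t  is_switch[DEMO_MAX_LID];   /* indexed by base LID */
	uint16_t neighbor_sw[DEMO_MAX_LID]; /* CA/router base LID -> neighbor switch base LID */
	uint8_t  sw_hops[DEMO_MAX_LID];     /* min hops, stored for switch base LIDs only */
};

/* Resolve any destination LID to a hop count using only switch entries. */
uint8_t demo_least_hops(const struct demo_fabric *f, uint16_t lid)
{
	uint16_t base;
	uint8_t hops;

	assert(lid < DEMO_MAX_LID);
	base = f->base_lid[lid];	/* secondary LIDs (LMC > 0) fold to the base LID */
	if (f->is_switch[base])
		return f->sw_hops[base];

	/* CA or router: borrow the neighbor switch's entry. */
	hops = f->sw_hops[f->neighbor_sw[base]];
	if (hops == DEMO_NO_PATH)
		return DEMO_NO_PATH;
	return (uint8_t)(hops + 1);	/* assumed extra hop to reach the end port */
}
```

Only sw_hops carries routing state here, which is where the memory saving described in the cover letter would come from; the other arrays stand in for topology data a subnet manager already holds.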
-- Hal From halr at voltaire.com Mon Mar 12 09:50:47 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:50:47 -0500 Subject: [ofa-general] Re: [PATCH 3/4] opensm: build min hop tables only for switches base LIDs In-Reply-To: <11733615163275-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> <11733615163275-git-send-email-sashak@voltaire.com> Message-ID: <1173718076.5995.7407.camel@hal.voltaire.com> On Thu, 2007-03-08 at 08:45, Sasha Khapyorsky wrote: > up/down and default min hop builder will calculate min hop matrices > only for switches base LIDs - don't bother with CA or router ports and > LMC. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only (at least for now). -- Hal From halr at voltaire.com Mon Mar 12 09:50:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:50:43 -0500 Subject: [ofa-general] Re: [PATCH 2/4] opensm: mcast_mgr uses osm_switch_get_port_least_hops() In-Reply-To: <11733615123337-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> <11733615123337-git-send-email-sashak@voltaire.com> Message-ID: <1173718073.5995.7405.camel@hal.voltaire.com> On Thu, 2007-03-08 at 08:45, Sasha Khapyorsky wrote: > Instead of the direct accessing min hop tables mcast_mgr uses function > osm_switch_get_port_least_hops() which will evaluate only switches' base > LID min hop vectors. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only (at least for now). 
-- Hal From halr at voltaire.com Mon Mar 12 09:50:50 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 11:50:50 -0500 Subject: [ofa-general] Re: [PATCH 4/4] opensm: dump functions adoption In-Reply-To: <11733615191040-git-send-email-sashak@voltaire.com> References: <11733615061517-git-send-email-sashak@voltaire.com> <11733615191040-git-send-email-sashak@voltaire.com> Message-ID: <1173718078.5995.7409.camel@hal.voltaire.com> On Thu, 2007-03-08 at 08:45, Sasha Khapyorsky wrote: > This adopts routing dump functions to work properly with reduced min hop > tables. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only (at least for now). -- Hal From mshefty at ichips.intel.com Mon Mar 12 09:11:37 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Mar 2007 09:11:37 -0700 Subject: [ofa-general] RFC: pull from 2.6.21 In-Reply-To: <20070311091253.GF6858@mellanox.co.il> References: <20070311091253.GF6858@mellanox.co.il> Message-ID: <45F57BB9.3020209@ichips.intel.com> > 1. merged_sean_rdma_dev_ofed_1_2.patch - I think all multicast bits > are merged in 2.6.21-rc3 so we only have to take code from local_sa > branch now. Right? Correct - though I would need to update my branches to 2.6.21-rc3 first, which I will do today.
- Sean From bugzilla-daemon at lists.openfabrics.org Mon Mar 12 10:24:55 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 12 Mar 2007 10:24:55 -0700 (PDT) Subject: [ofa-general] [Bug 447] New: ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=447 Summary: ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet Product: OpenFabrics Linux Version: 1.1 Platform: X86-64 OS/Version: RHEL 4 Status: NEW Severity: blocker Priority: P1 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: DarylGrunau at gmail.com We're experiencing kernel panics of the following ilk on our I/O nodes used as routers between our IB fabric and 10GE network (providing service to a Panasas filesystem). The panic can be triggered by simply mounting the Panasas filesystem via the I/O node - some time later (as soon as 1 minute, and sometimes overnight/weekend) the node panics. Using the compute-node-mounted filesystem accelerates the timetable. 
Kernel BUG at dev:1121 invalid operand: 0000 [1] SMP CPU 7 Modules linked in: myri10ge(U) ib_ipoib ib_mthca ib_uverbs ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core bluesmoke_k8 bluesmoke_mc perfctr ipmi_devintf ipmi_si ipmi_msghandler bnx2 ext3 jbd nfs lockd nfs_acl sunrpc Pid: 0, comm: swapper Not tainted 2.6.9-34.ELsmp.lanl RIP: 0010:[] {__skb_linearize+62} RSP: 0018:00000102270efcf8 EFLAGS: 00010203 RAX: 0000000000000001 RBX: 000000000000001c RCX: 000001061fef7680 RDX: 00000000ffffffdc RSI: 0000000000000220 RDI: 000001061fef7600 RBP: 000001021fedabc0 R08: 0000000000000000 R09: 000000000000003c R10: 0000000000000000 R11: 0000000000000000 R12: 000001021f459a80 R13: 0000000000000000 R14: 000001081d741000 R15: 0000000000000000 FS: 0000002a95ac76e0(0000) GS:ffffffff804d8600(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000429ff0 CR3: 00000000dfcae000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo 000001081ff38000, task 0000010220035030) Stack: 000001081d741000 000001081d741000 00000000fffffff4 000001021f459a80 0000000000000000 ffffffff802ab133 000001021fedabc0 000001021f459ac0 000001021fedabc0 ffffffff802b01c8 Call Trace: {dev_queue_xmit+93} {neigh_resolve_output+578} {neigh_update+626} {arp_process+1257} {:ib_ipoib:ipoib_ib_completion+936} {netif_receive_skb+590} {process_backlog+136} {net_rx_action+129} {__do_softirq+88} {do_softirq+49} {do_IRQ+328} {ret_from_intr+0} {default_idle+0} {default_idle+32} {cpu_idle+26} Code: 0f 0b 41 ee 31 80 ff ff ff ff 61 04 85 d2 b8 00 00 00 00 0f RIP {__skb_linearize+62} RSP <00000102270efcf8> <0>Kernel panic - not syncing: Oops ---------------- Our HCA hardware/firmware is: -bash-3.00# ./ibv_devinfo hca_id: mthca0 fw_ver: 5.1.937 node_guid: 0002:c902:0023:85cc sys_image_guid: 0002:c902:0023:85cf vendor_id: 0x02c9 vendor_part_id: 25218 hw_ver: 0xA0 board_id: MT_0370110001 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 1 port_lid: 69 
port_lmc: 0x00 port: 2 state: PORT_DOWN (1) max_mtu: 2048 (4) active_mtu: 512 (2) sm_lid: 0 port_lid: 0 port_lmc: 0x00 -bash-3.00# lspci -vvv [[ snip ]] 41:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev a0) Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
Message-ID: Appologies for sending multiple copies. As far as I am aware, I sent a plain text only with 1 attachment. The reason for the attachment is because sometimes the mailer munges the white spaces causing problems when the patch is applied. I am not sure how you saw 3 of them. Pradeep pradeep at us.ibm.com "Michael S. Tsirkin" wrote on 03/10/2007 11:13:00 PM: > > Quoting Pradeep Satyanarayana : > > Subject: IPOIB CM (NOSRQ) patch for review > > > > > > Here is a first version of the IPOIB_CM_NOSRQ patch for review. > > Please, avoid sending multiple copies of the patch. > This messages had 3 of them: > - plain text > - HTML version > - attachment > > And the multipart is set in such a way that one sees the HTML > part by default whic is most likely not the intended effect. > > Just send a plain text message with an attachment inline, please. > > -- > MST -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Mon Mar 12 11:27:20 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Mar 2007 13:27:20 -0500 Subject: [ofa-general] RFC: pull from 2.6.21 In-Reply-To: <20070311091253.GF6858@mellanox.co.il> References: <20070311091253.GF6858@mellanox.co.il> Message-ID: <1173724040.7183.64.camel@stevo-desktop> For the chelsio drivers, you're gonna have problems since they are now in 2.6.21. So pulling will result in a collision since you committed these directly into ofed_1_2 (no common ancestor for the merge).
However if you want to do this, I'm all for it so that the code bases are aligned exactly as of 2.6.21. The code isn't very different except that I did more clean up work for 2.6.21 (like getting rid of the core directory). If you do this, then are you comitting to do a final pull when 2.6.21 goes gold? Steve. On Sun, 2007-03-11 at 11:12 +0200, Michael S. Tsirkin wrote: > Hi! > I'm considering pulling git from linus 2.6.21-rc3 - this would > make development easier by cutting down the number of patches > we have to apply. > > 2 things I'm still checking and I'd like to get confirmation on: > > 1. merged_sean_rdma_dev_ofed_1_2.patch - I think all multicast bits > are merged in 2.6.21-rc3 so we only have to take code from local_sa > branch now. Right? > > 2. ipath-xxx - except ipath-26-wc-qp.patch, these aren't upstream > so they still have to be applied. Correct? > > 3. ehca-xxx - these all seem to be applied > From mshefty at ichips.intel.com Mon Mar 12 12:01:31 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Mar 2007 12:01:31 -0700 Subject: [ofa-general] Re: [PATCH]] ucma backport to 2.6.19 In-Reply-To: <20070311064332.GN17114@mellanox.co.il> References: <20070308192530.GD17114@mellanox.co.il> <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com> <20070311064332.GN17114@mellanox.co.il> Message-ID: <45F5A38B.50005@ichips.intel.com> > With the cross-build setup, its actually quite easy to test > a patch on all kernels. But oh well. I guess I'll do it for now. How can I do this? From pradeep at us.ibm.com Mon Mar 12 12:13:36 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 12 Mar 2007 12:13:36 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ) patch for review In-Reply-To: Message-ID: Roland Dreier wrote on 03/09/2007 03:10:44 PM: > > +EXTRA_CFLAGS += -DIPOIB_CM_NOSRQ > > This type of compile-time selection is obviously unacceptable for > anything that we actually merge upstream. 
What is needed is for the > IPoIB driver to decide at runtime what to do, so that on a system with > multiple different types of HCAs installed, IPoIB CM uses SRQs on the > HCAs that support SRQs, and does not use SRQs on the HCAs that don't. I dug through the spec and found that ib_query_device() tells one if the HCA supports SRQ or not. Is that what you had in mind? > > Not to mention the fact that basically mixing together two different > implementations with a liberal sprinkling of #ifdef IPOIB_CM_NOSRQ > makes the code basically unreadable and unmaintainable. One way to alleviate this problem would be to duplicate mainly the receive side functions and name them something like xyz_nosrq(). However, there will still be many instances of if(SRQ) xyz(); else xyz_nosrq(); Is that a better solution than the #ifdef IPOIB_CM_NOSRQ? On the other hand, this will add duplicate code and may pose some maintainability issues in the future. I would like to understand as to which one is the preferred approach. > > > + /* This is missing in Michael's code -Do we need this */ > > seems like it would be easy to answer this question -- just try it > without the change. And I think the answer is no, there's no reason > to move QPs that are not used to send data to the RTS state. Yes, you are right. I should have dropped this before submitting the patch. > > > void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) > > > +#ifdef IPOIB_CM_NOSRQ > > + spin_lock_irqsave(&priv->lock, flags); > > + list_for_each_entry(rx_ptr, &priv->cm.passive_ids, list) > > + if(qp_num == rx_ptr->qp->qp_num) { > > + found = 1; > > + break; > > + } > > + spin_unlock_irqrestore(&priv->lock, flags); > > This seems crazy -- you do a linear search through a list of QPs > (which potentially has 100s of entries) for every receive completion! > Just the spinlock alone is something we would want to avoid in the hot > performance path. I envisaged the NOSRQ case for small clusters only. 
Othewise, this may be a memory hog and affect (other) application performance. Pradeep pradeep at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Mar 12 13:36:10 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 15:36:10 -0500 Subject: [ofa-general] [PATCH][MINOR] OpenSM/libvendor/osm_vendor_ibumad.c: In osm_vendor_send, simplify redundant code Message-ID: <1173731769.5995.22172.camel@hal.voltaire.com> OpenSM/libvendor/osm_vendor_ibumad.c: In osm_vendor_send, simplify redundant code Signed-off-by: Hal Rosenstock diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c index 24b5e11..8661731 100644 --- a/osm/libvendor/osm_vendor_ibumad.c +++ b/osm/libvendor/osm_vendor_ibumad.c @@ -1075,13 +1075,14 @@ osm_vendor_send( umad_set_grh(p_vw->umad, 0); goto Resp; } + /* GSI classes */ + umad_set_addr_net(p_vw->umad, p_mad_addr->dest_lid, + p_mad_addr->addr_type.gsi.remote_qp, + p_mad_addr->addr_type.gsi.service_level, + IB_QP1_WELL_KNOWN_Q_KEY); + umad_set_grh(p_vw->umad, 0); /* FIXME: GRH support */ + umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey); if (ib_class_is_rmpp(p_mad->mgmt_class)) { /* RMPP GSI classes FIXME: no GRH */ - umad_set_addr_net(p_vw->umad, p_mad_addr->dest_lid, - p_mad_addr->addr_type.gsi.remote_qp, - p_mad_addr->addr_type.gsi.service_level, - IB_QP1_WELL_KNOWN_Q_KEY); - umad_set_grh(p_vw->umad, 0); /* FIXME: GRH support */ - umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey); if (!ib_rmpp_is_flag_set((ib_rmpp_mad_t *)p_sa, IB_RMPP_FLAG_ACTIVE)) { /* Clear RMPP header when RMPP not ACTIVE */ @@ -1108,13 +1109,6 @@ osm_vendor_send( p_sa->paylen_newwin = cl_ntoh32(paylen); } #endif - } else { /* non RMPP GSI classes FIXME: no GRH */ - umad_set_addr_net(p_vw->umad, p_mad_addr->dest_lid, - p_mad_addr->addr_type.gsi.remote_qp, - p_mad_addr->addr_type.gsi.service_level, - IB_QP1_WELL_KNOWN_Q_KEY); - umad_set_grh(p_vw->umad, 0); 
/* FIXME: GRH support */ - umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey); } Resp: From mst at dev.mellanox.co.il Mon Mar 12 13:24:18 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 22:24:18 +0200 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <1173724040.7183.64.camel@stevo-desktop> References: <20070311091253.GF6858@mellanox.co.il> <1173724040.7183.64.camel@stevo-desktop> Message-ID: <20070312202418.GC8995@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: RFC: pull from 2.6.21 > > For the chelsio drivers, you're gonna have problems since they are now > in 2.6.21. So pulling will result in a collision since you committed > these directly into ofed_1_2 (no common ancestor for the merge). Yes we'll have ot replace these. > However if you want to do this, I'm all for it so that the code bases > are aligned exactly as of 2.6.21. The code isn't very different except > that I did more clean up work for 2.6.21 (like getting rid of the core > directory). Will there be problems with backports? Are you OK with fixing them if necessary? > If you do this, then are you comitting to do a final pull when 2.6.21 > goes gold? Depends on the time scale, but basically yes. -- MST From sean.hefty at intel.com Mon Mar 12 13:30:04 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 12 Mar 2007 13:30:04 -0700 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <20070312202418.GC8995@mellanox.co.il> Message-ID: <000401c764e5$37fa56f0$7acc180a@amr.corp.intel.com> >Will there be problems with backports? >Are you OK with fixing them if necessary? I expect that there will be minor backport or other conflicts updating to 2.6.21, but I would rather see OFED sync up with the upstream kernel myself. I heard that OFED voted this morning not to update to 2.6.21. Is this correct, or is updating to 2.6.21 still a possibility? 
- Sean From mst at dev.mellanox.co.il Mon Mar 12 13:31:24 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 22:31:24 +0200 Subject: [ofa-general] Re: [PATCH]] ucma backport to 2.6.19 In-Reply-To: <45F5A38B.50005@ichips.intel.com> References: <20070308192530.GD17114@mellanox.co.il> <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com> <20070311064332.GN17114@mellanox.co.il> <45F5A38B.50005@ichips.intel.com> Message-ID: <20070312203124.GD8995@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH]] ucma backport to 2.6.19 > > >With the cross-build setup, its actually quite easy to test > >a patch on all kernels. But oh well. I guess I'll do it for now. > > How can I do this? Here's a mail from Vlad: On ssh.openfabrics.org: Run env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \ CHECK_LOCAL=yes \ CHECK_KERNEL_ORG=yes \ CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh git_url= is a clone of ofed kernel tree. change other parameters as needed - look them up inside the script /home/vlad/scripts/build_ofa_kernel.sh. you can also upload more kernels in you home directory and we can add them to build. -- MST From swise at opengridcomputing.com Mon Mar 12 13:39:48 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 12 Mar 2007 15:39:48 -0500 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <20070312202418.GC8995@mellanox.co.il> References: <20070311091253.GF6858@mellanox.co.il> <1173724040.7183.64.camel@stevo-desktop> <20070312202418.GC8995@mellanox.co.il> Message-ID: <1173731988.7183.96.camel@stevo-desktop> On Mon, 2007-03-12 at 22:24 +0200, Michael S. Tsirkin wrote: > > Quoting Steve Wise : > > Subject: Re: RFC: pull from 2.6.21 > > > > For the chelsio drivers, you're gonna have problems since they are now > > in 2.6.21. So pulling will result in a collision since you committed > > these directly into ofed_1_2 (no common ancestor for the merge). > > Yes we'll have ot replace these. 
> > > However if you want to do this, I'm all for it so that the code bases > > are aligned exactly as of 2.6.21. The code isn't very different except > > that I did more clean up work for 2.6.21 (like getting rid of the core > > directory). > > Will there be problems with backports? There were only one or two commits to drivers/net/cxgb3 that didn't get pulled into ofed_1_2. So the backport pain should be minor. > Are you OK with fixing them if necessary? > Yes I will fix them, assuming they're easy :-) > > If you do this, then are you comitting to do a final pull when 2.6.21 > > goes gold? > > Depends on the time scale, but basically yes. > From bugzilla-daemon at lists.openfabrics.org Mon Mar 12 13:46:23 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Mon, 12 Mar 2007 13:46:23 -0700 (PDT) Subject: [ofa-general] [Bug 447] ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet In-Reply-To: Message-ID: <20070312204623.D600EE60804@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=447 ------- Comment #1 from mst at mellanox.co.il 2007-03-12 13:46 ------- Subject: New: ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet This is for OFED 1.1, isn't it? Did you apply the patches listed at the support page (some of these are for memory corruption issues)? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at dev.mellanox.co.il Mon Mar 12 14:36:01 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Mar 2007 23:36:01 +0200 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) patch for review In-Reply-To: References: <20070311071300.GQ17114@mellanox.co.il> Message-ID: <20070312213600.GE8995@mellanox.co.il> > Quoting Pradeep Satyanarayana : > Subject: Re: IPOIB CM (NOSRQ) patch for review > > > Appologies for sending multiple copies. 
As far as I am aware, I sent a plain > text only with 1 attachment. The reason for the > attachment is because sometimes the mailer munges the white spaces causing > problems when the patch is applied. Can't you fix the mailer? Or switch mail clients - it's not like there's a shortage of these. > I am not sure how you saw 3 of them. That's what I'm saying - you are sending multipart/alternate which means a copy in HTML and a copy in plain text format. Just turning the HTML thing off in your client will usually fix this. -- MST From robert.j.woodruff at intel.com Mon Mar 12 14:36:41 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 12 Mar 2007 14:36:41 -0700 Subject: [ofa-general] uDAPL question In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1D522D5@mtiexch01.mti.com> Message-ID: Ah, the bug I was referring to was in OFED 1.2 alpha, this must be something else, Arlin can look and the debug output you just sent that should provide a better clues as to what is happening. -----Original Message----- From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Monday, March 12, 2007 8:28 AM To: Woodruff, Robert J; general at lists.openfabrics.org; Hefty, Sean Subject: RE: [ofa-general] uDAPL question Hi Woody, Thanks for your help. I guess the problem is in the CM - is it ? Can you point me to relevant communication/bug reports that explain the fix for this issue ? Would Sean be the right person to ask regarding what exact patch should be added/removed ? I would prefer to stick to OFED-1.1 code with minimal changes - if possible - to avoid compatibility issues. Thanks, Boris -----Original Message----- From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com] Sent: Monday, March 12, 2007 8:24 AM To: Boris Shpolyansky; general at lists.openfabrics.org; Hefty, Sean Subject: RE: [ofa-general] uDAPL question This is a known problem and should be fixed by now, There was a bad patch that somehow got into OFED that was not in Sean main tree. 
Assuming this bad patch has been removed, the problem should be fixed. woody ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Boris Shpolyansky Sent: Friday, March 09, 2007 8:40 PM To: general at lists.openfabrics.org Subject: [ofa-general] uDAPL question Hi, I'm trying to get simple Intel MPI benchmark running over IB (uDAPL) using OFED-1.1 stack. I'm consistently getting the following error: [root at ibd005 ~]# ./runjob_I_MPI.boris 2 Task 0 of 2 tasks started on host ibd005.ibd.mti.com clock_resolution = 1.00e-06 s Task 1 of 2 tasks started on host ibd006.ibd.mti.com [0:ibd005] unexpected DAPL event 4006 from 1:ibd006 [1:ibd006] unexpected DAPL event 4006 from 0:ibd005 rank 0 in job 14 ibd005_36193 caused collective abort of all ranks exit status of rank 0: return code 254 I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails. I'm using standard dat.conf file: OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" "" Appreciate your help, Boris Shpolyansky From or.gerlitz at gmail.com Mon Mar 12 14:41:59 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 12 Mar 2007 23:41:59 +0200 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <000401c764e5$37fa56f0$7acc180a@amr.corp.intel.com> References: <20070312202418.GC8995@mellanox.co.il> <000401c764e5$37fa56f0$7acc180a@amr.corp.intel.com> Message-ID: <15ddcffd0703121441q5096a323h942acc90d78f4a6b@mail.gmail.com> On 3/12/07, Sean Hefty wrote: > >Will there be problems with backports? > >Are you OK with fixing them if necessary? > > I expect that there will be minor backport or other conflicts updating to > 2.6.21, but I would rather see OFED sync up with the upstream kernel myself. 
> > I heard that OFED voted this morning not to update to 2.6.21. Is this correct, > or is updating to 2.6.21 still a possibility? What does it means that "OFED voted for this or that matter" who is excatly the voting body? do you mean to the "openfabrics board"? i understand it is populated with marketing people from quite bunch of companies (say no more). Or. From sean.hefty at intel.com Mon Mar 12 14:50:16 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 12 Mar 2007 14:50:16 -0700 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <15ddcffd0703121441q5096a323h942acc90d78f4a6b@mail.gmail.com> Message-ID: <000501c764f0$6bb10380$7acc180a@amr.corp.intel.com> >What does it means that "OFED voted for this or that matter" who is >excatly the voting body? do you mean to the "openfabrics board"? i >understand it is populated with marketing people from quite bunch of >companies (say no more). How about the EWG decided not to update to 2.6.21 for the OFED 1.2 release? I was not on the conference call, and am only reporting this second-hand. I care because there is a fix in 2.6.21-rc3 that I've been asked to pull into OFED 1.2. So I either need to back that fix into the OFED kernel or update my kernel sa_cache branch to 2.6.21-rc3. - Sean From robert.j.woodruff at intel.com Mon Mar 12 14:52:30 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 12 Mar 2007 14:52:30 -0700 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <000501c764f0$6bb10380$7acc180a@amr.corp.intel.com> Message-ID: I was on the call and people thought that it was too late in the release cycle (the day before Beta) to switch to a new kernel since all the testing has been done on 2.6.20. 
-----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty Sent: Monday, March 12, 2007 2:50 PM To: 'Or Gerlitz' Cc: openfabrics-ewg at openib.org; general at lists.openfabrics.org Subject: RE: [ofa-general] Re: RFC: pull from 2.6.21 >What does it means that "OFED voted for this or that matter" who is >excatly the voting body? do you mean to the "openfabrics board"? i >understand it is populated with marketing people from quite bunch of >companies (say no more). How about the EWG decided not to update to 2.6.21 for the OFED 1.2 release? I was not on the conference call, and am only reporting this second-hand. I care because there is a fix in 2.6.21-rc3 that I've been asked to pull into OFED 1.2. So I either need to back that fix into the OFED kernel or update my kernel sa_cache branch to 2.6.21-rc3. - Sean _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Mon Mar 12 15:49:08 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Mar 2007 15:49:08 -0700 Subject: [ofa-general] Re: [PATCH]] ucma backport to 2.6.19 In-Reply-To: <20070312203124.GD8995@mellanox.co.il> References: <20070308192530.GD17114@mellanox.co.il> <000101c761c0$cf0d7810$ff0da8c0@amr.corp.intel.com> <20070311064332.GN17114@mellanox.co.il> <45F5A38B.50005@ichips.intel.com> <20070312203124.GD8995@mellanox.co.il> Message-ID: <45F5D8E4.6090607@ichips.intel.com> > Here's a mail from Vlad: > > On ssh.openfabrics.org: > Run > env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \ > CHECK_LOCAL=yes \ > CHECK_KERNEL_ORG=yes \ > CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh > > git_url= is a clone of ofed kernel tree. 
> change other parameters as needed - look them up inside > the script /home/vlad/scripts/build_ofa_kernel.sh. I had GIT_DIR=. set as part of my login, but after removing that I'm able to get farther. Should all of the builds succeed? I see failures for just about every build. Looking at what appears to be the log files, the failures are related to cxgb3. - Sean From halr at voltaire.com Mon Mar 12 17:19:33 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Mar 2007 19:19:33 -0500 Subject: [ofa-general] [PATCH]{MINOR] OpenSM/libvendor/osm_vendor_ibumad.c: In umad_receiver, display DR path of sent MAD when it times out Message-ID: <1173745171.5995.36362.camel@hal.voltaire.com> OpenSM/libvendor/osm_vendor_ibumad.c: In umad_receiver, display DR path of sent MAD when it times out Signed-off-by: Hal Rosenstock diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c index 8661731..0bbd4c7 100644 --- a/osm/libvendor/osm_vendor_ibumad.c +++ b/osm/libvendor/osm_vendor_ibumad.c @@ -64,6 +64,7 @@ #include #include #include +#include #include /****s* OpenSM: Vendor AL/osm_umad_bind_info_t @@ -342,9 +343,13 @@ umad_receiver(void *p_ptr) mad->mgmt_class, cl_ntoh16(ib_mad_addr->lid)); } else { + ib_smp_t *smp; + /* Direct routed SMP */ + smp = (ib_smp_t *)mad; osm_log(p_vend->p_log, OSM_LOG_ERROR, - "umad_receiver: ERR 5411: DR SMP\n"); + "umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x%X\n", smp->hop_ptr); + osm_dump_smp_dr_path(p_vend->p_log, smp, OSM_LOG_ERROR); } if (!(p_req_madw = get_madw(p_vend, &mad->trans_id))) { From pradeep at us.ibm.com Mon Mar 12 16:58:45 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 12 Mar 2007 16:58:45 -0700 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) patch for review Message-ID: Missed out sending to the list. Pradeep pradeep at us.ibm.com ----- Forwarded by Pradeep Satyanarayana/Beaverton/IBM on 03/12/2007 04:53 PM ----- Pradeep Satyanarayana/Beaverton/IBM 03/12/2007 03:06 PM To "Michael S. 
Tsirkin" cc Subject Re: IPOIB CM (NOSRQ) patch for review Ok this will be set right the next time I submit a patch. Pradeep pradeep at us.ibm.com "Michael S. Tsirkin" wrote on 03/12/2007 02:36:01 PM: > > Quoting Pradeep Satyanarayana : > > Subject: Re: IPOIB CM (NOSRQ) patch for review > > > > > > Apologies for sending multiple copies. As far as I am aware, I sent a plain > > text only with 1 attachment. The reason for the > > attachment is because sometimes the mailer munges the white spaces causing > > problems when the patch is applied. > > Can't you fix the mailer? Or switch mail clients - it's not like > there's a shortage of these. > > > I am not sure how you saw 3 of them. > > That's what I'm saying - you are sending multipart/alternative > which means a copy in HTML and a copy in plain text format. > Just turning the HTML thing off in your client will usually > fix this. > > -- > MST From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 01:55:04 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 01:55:04 -0700 (PDT) Subject: [ofa-general] [Bug 449] New: DMA vs CQ race on IA64 Altix platform Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=449 Summary: DMA vs CQ race on IA64 Altix platform Product: OpenFabrics Linux Version: 1.2alpha1 Platform: IA64 OS/Version: All Status: NEW Severity: major Priority: P2 Component: Verbs AssignedTo: bugzilla at openib.org ReportedBy: monil at voltaire.com This issue was reported by SGI in the following email thread: http://openib.org/pipermail/openib-general/2006-December/030251.html The problem was discussed later on a few occasions and now it becomes an issue that needs to be fixed for OFED 1.2 in order to support SGI. -- Moni -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From vlad at lists.openfabrics.org Tue Mar 13 02:14:34 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 13 Mar 2007 02:14:34 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070313-0200 daily build status Message-ID: <20070313091434.6F18EE60804@openfabrics.org> This email was generated automatically, please do not reply Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 
with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From monisonlists at gmail.com Tue Mar 13 02:39:00 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 13 Mar 2007 11:39:00 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070312154734.GD28157@mellanox.co.il> References: <45F41FFD.102@voltaire.com> <20070312154734.GD28157@mellanox.co.il> Message-ID: <45F67134.3000708@gmail.com> Michael S. Tsirkin wrote: > Another question. > >> @@ -769,32 +768,32 @@ static void ipoib_set_mcast_list(struct >> static void ipoib_neigh_destructor(struct neighbour *n) >> { >> struct ipoib_neigh *neigh; >> - struct ipoib_dev_priv *priv = netdev_priv(n->dev); >> + struct ipoib_dev_priv *priv; >> unsigned long flags; >> struct ipoib_ah *ah = NULL; >> >> - ipoib_dbg(priv, >> - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", >> - IPOIB_QPN(n->ha), >> - IPOIB_GID_RAW_ARG(n->ha + 4)); >> - >> - spin_lock_irqsave(&priv->lock, flags); >> >> neigh = *to_ipoib_neigh(n); >> if (neigh) { >> + priv = netdev_priv(neigh->dev); >> + ipoib_dbg(priv, >> + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", >> + IPOIB_QPN(n->ha), >> + IPOIB_GID_RAW_ARG(n->ha + 4)); >> + >> + spin_lock_irqsave(&priv->lock, flags); >> if (neigh->ah) >> ah = neigh->ah; >> list_del(&neigh->list); >> ipoib_neigh_free(n->dev, neigh); >> + spin_unlock_irqrestore(&priv->lock, flags); >> } >> - >> - spin_unlock_irqrestore(&priv->lock, flags); >> - >> if (ah) >> ipoib_put_ah(ah); >> } > > Using to_ipoib_neigh outside priv->lock looks problematic. > Can you convince me this does not introduce new races? > > I can try... ipoib_neigh_destructor is called from neigh_destroy() and this is when the kernel neighbour is under destruction itself and no one holds a reference to it. 
My opinion is that if I can't assume that no one is touching ipoib_neigh when kernel neighbour is being destroyed then we have a bigger problem. From ogerlitz at voltaire.com Tue Mar 13 02:47:23 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 13 Mar 2007 11:47:23 +0200 Subject: [ofa-general] Re: RFC: pull from 2.6.21 In-Reply-To: <000501c764f0$6bb10380$7acc180a@amr.corp.intel.com> References: <000501c764f0$6bb10380$7acc180a@amr.corp.intel.com> Message-ID: <45F6732B.9060404@voltaire.com> Sean Hefty wrote: >> What does it means that "OFED voted for this or that matter" who is >> excatly the voting body? do you mean to the "openfabrics board"? i >> understand it is populated with marketing people from quite bunch of >> companies (say no more). > > How about the EWG decided not to update to 2.6.21 for the OFED 1.2 release? I > was not on the conference call, and am only reporting this second-hand. sure, before writing this i have scanned the ewg mailing list and saw no such thread and as there was no minutes-reporting email i forgot that there was a meeting this week... Or. From monisonlists at gmail.com Tue Mar 13 02:52:29 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 13 Mar 2007 11:52:29 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070312153251.GC28157@mellanox.co.il> References: <45F41FFD.102@voltaire.com> <20070311154042.GJ31985@mellanox.co.il> <45F55B6E.7010004@gmail.com> <20070312141309.GB21643@mellanox.co.il> <45F56E00.9000701@gmail.com> <20070312153251.GC28157@mellanox.co.il> Message-ID: <45F6745D.2060003@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> >>> Where does the claim that module unload is unsafe by definition >>> come from? Weren't the races solved in 2.6 with the new in-kernel >>> loader? >> Well, what happens with bonding and IPoIB speaks for itself, doesn't it? > > Not to my eyes. 
> >> If I can unload a module that is being referenced from the outside > > This just means there's a bug in the specific module(s). > In this case it is either ipoib or bonding module (or both). > >> then I am not protected by the kernel from doing something wrong. > > Yes, this is really fundamental in kernel programming. > That's what makes it interesting. > I'm not trying to avoid the challenge/fun of kernel programming but only to fix one bug without introducing the other fixes at the same time. My opinion is that IPoIB shouldn't assume that n->dev is an IPoIB device because event if we fix the dependency bug between ib_ipoib and bonding this assumption is still wrong. One more thing... We can tell the customer that unloading modules is allowed but that they have to do it in the right order and anyway, From mst at dev.mellanox.co.il Tue Mar 13 02:57:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Mar 2007 11:57:32 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <45F67134.3000708@gmail.com> References: <45F41FFD.102@voltaire.com> <20070312154734.GD28157@mellanox.co.il> <45F67134.3000708@gmail.com> Message-ID: <20070313095732.GG2608@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to?IPoIB > > Michael S. Tsirkin wrote: > > Another question. 
> > > >> @@ -769,32 +768,32 @@ static void ipoib_set_mcast_list(struct > >> static void ipoib_neigh_destructor(struct neighbour *n) > >> { > >> struct ipoib_neigh *neigh; > >> - struct ipoib_dev_priv *priv = netdev_priv(n->dev); > >> + struct ipoib_dev_priv *priv; > >> unsigned long flags; > >> struct ipoib_ah *ah = NULL; > >> > >> - ipoib_dbg(priv, > >> - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > >> - IPOIB_QPN(n->ha), > >> - IPOIB_GID_RAW_ARG(n->ha + 4)); > >> - > >> - spin_lock_irqsave(&priv->lock, flags); > >> > >> neigh = *to_ipoib_neigh(n); > >> if (neigh) { > >> + priv = netdev_priv(neigh->dev); > >> + ipoib_dbg(priv, > >> + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", > >> + IPOIB_QPN(n->ha), > >> + IPOIB_GID_RAW_ARG(n->ha + 4)); > >> + > >> + spin_lock_irqsave(&priv->lock, flags); > >> if (neigh->ah) > >> ah = neigh->ah; > >> list_del(&neigh->list); > >> ipoib_neigh_free(n->dev, neigh); > >> + spin_unlock_irqrestore(&priv->lock, flags); > >> } > >> - > >> - spin_unlock_irqrestore(&priv->lock, flags); > >> - > >> if (ah) > >> ipoib_put_ah(ah); > >> } > > > > Using to_ipoib_neigh outside priv->lock looks problematic. > > Can you convince me this does not introduce new races? > > > > > I can try... > ipoib_neigh_destructor is called from neigh_destroy() and this is when the > kernel neighbour is under destruction itself and no one holds a reference to > it. OK but we might have references to ipoib_neigh. Specifically path and mcast group all might have it - that's what neigh_list is. > My opinion is that if I can't assume that no one is touching ipoib_neigh when kernel > neighbour is being destroyed then we have a bigger problem. That's what locks you remove seem to be there for. 
-- MST From monisonlists at gmail.com Tue Mar 13 03:13:00 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 13 Mar 2007 12:13:00 +0200 Subject: [ofa-general] Re: infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: <20070312145716.GA28157@mellanox.co.il> References: <45EFCCEC.4010205@gmail.com> <45F50CC0.2000506@mellanox.co.il> <45F56848.5020008@gmail.com> <20070312145716.GA28157@mellanox.co.il> Message-ID: <45F6792C.10603@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: Re: infiniband bonding/merging/aggregation with SDP and/or?VERBS >> >> Tziporet Koren wrote: >>> Moni Shoua wrote: >>>> ib-bonding is based on standard Linux bonding with some required >>>> changes to make it work with IPoIB. >>> What is the status of accepting the specific IB changes to Linux kernel? >> I've sent a new patch yesterday with a comment about module unload being >> unsafe under specific scenarios. >> Michael's opinion is that we should fix this issue first but I think that >> there is a place to push the current patch now and wait for the other fix to >> come later (which is probably not in IB or at least not just IB). > > I don't think this summarizes my opinion correctly. > I just would like to see all the patches together - I have a feeling caching the > device in ipoib->dev is problematic and module unloading issues are just a > symptom pointing to deeper issues. > I'll try to summarize the concept of bonding and IPoiB soon but in the meantime I just want to make sure that I understand you. You're saying that caching the device in ipoib->dev is problematic. What do you mean by that? This is how bonding works (even for Ethernet). It takes a pointer of dev and remembers it for its operation. From mst at dev.mellanox.co.il Tue Mar 13 03:21:16 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 13 Mar 2007 12:21:16 +0200 Subject: [ofa-general] Re: infiniband bonding/merging/aggregation with SDP and/or VERBS In-Reply-To: <45F6792C.10603@gmail.com> References: <45EFCCEC.4010205@gmail.com> <45F50CC0.2000506@mellanox.co.il> <45F56848.5020008@gmail.com> <20070312145716.GA28157@mellanox.co.il> <45F6792C.10603@gmail.com> Message-ID: <20070313102116.GK2608@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: infiniband bonding/merging/aggregation with SDP and/or?VERBS > > Michael S. Tsirkin wrote: > >> Quoting Moni Shoua : > >> Subject: Re: infiniband bonding/merging/aggregation with SDP and/or?VERBS > >> > >> Tziporet Koren wrote: > >>> Moni Shoua wrote: > >>>> ib-bonding is based on standard Linux bonding with some required > >>>> changes to make it work with IPoIB. > >>> What is the status of accepting the specific IB changes to Linux kernel? > >> I've sent a new patch yesterday with a comment about module unload being > >> unsafe under specific scenarios. > >> Michael's opinion is that we should fix this issue first but I think that > >> there is a place to push the current patch now and wait for the other fix to > >> come later (which is probably not in IB or at least not just IB). > > > > I don't think this summarizes my opinion correctly. > > I just would like to see all the patches together - I have a feeling caching the > > device in ipoib->dev is problematic and module unloading issues are just a > > symptom pointing to deeper issues. > > > I'll try to summarize the concept of bonding and IPoiB soon but in the > meantime I just want to make sure that I understand you. You're saying that > caching the device in ipoib->dev is problematic. What do you mean by that? I am merely speaking about caching the device pointer inside struct ipoib_neigh. I have a feeling this addresses a symptom and not the root cause. > This is how bonding works (even for Ethernet). It takes a pointer of dev and > remembers it for its operation. 
That's fine, but maybe bonding can be fixed to have neighbour->dev and skb->dev match, and if they don't, destroy the neighbour and create a new one. -- MST From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 05:02:31 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 05:02:31 -0700 (PDT) Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=450 Summary: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 Product: OpenFabrics Linux Version: 1.2alpha1 Platform: X86 OS/Version: RHEL 4 Status: NEW Severity: major Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: monil at voltaire.com Reproduced using the iperf TCP BW test with default parameters (iperf -s / iperf -c -i 1) over an IPoIB interface with mtu=1500. On _32bit_ RH4-U3 we see a major performance drop (from 1.3Gb down to 50Mb). We do not see the same behavior on a 64-bit OS or with mtu=2044.
[root at src2 ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 MByte (default)
------------------------------------------------------------
[ 4] local 172.16.0.120 port 5001 connected with 172.16.0.121 port 35022
[ 4] 0.0-28.6 sec 3.52 GBytes 1.06 Gbits/sec

[root at sink2 ~]# iperf -c 172.16.0.120 -i 1 -t 1000
------------------------------------------------------------
Client connecting to 172.16.0.120, TCP port 5001
TCP window size: 8.00 MByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.121 port 35022 connected with 172.16.0.120 port 5001
[ 3]  0.0- 1.0 sec  153 MBytes  1.28 Gbits/sec
[ 3]  1.0- 2.0 sec  153 MBytes  1.29 Gbits/sec
[ 3]  2.0- 3.0 sec  153 MBytes  1.29 Gbits/sec
[ 3]  3.0- 4.0 sec  153 MBytes  1.28 Gbits/sec
[ 3]  4.0- 5.0 sec  153 MBytes  1.29 Gbits/sec
[ 3]  5.0- 6.0 sec  153 MBytes  1.29 Gbits/sec
[ 3]  6.0- 7.0 sec  153 MBytes  1.28 Gbits/sec
[ 3]  7.0- 8.0 sec  153 MBytes  1.28 Gbits/sec
[ 3]  8.0- 9.0 sec  153 MBytes  1.28 Gbits/sec
[ 3]  9.0-10.0 sec  153 MBytes  1.28 Gbits/sec
[ 3] 10.0-11.0 sec  154 MBytes  1.29 Gbits/sec
[ 3] 11.0-12.0 sec  154 MBytes  1.29 Gbits/sec
[ 3] 12.0-13.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 13.0-14.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 14.0-15.0 sec  153 MBytes  1.28 Gbits/sec
[ 3] 15.0-16.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 16.0-17.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 17.0-18.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 18.0-19.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 19.0-20.0 sec  153 MBytes  1.28 Gbits/sec
[ 3] 20.0-21.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 21.0-22.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 22.0-23.0 sec  153 MBytes  1.29 Gbits/sec
[ 3] 23.0-24.0 sec  44.3 MBytes  372 Mbits/sec
[ 3] 24.0-25.0 sec  7.92 MBytes  66.5 Mbits/sec
[ 3] 25.0-26.0 sec  7.98 MBytes  66.9 Mbits/sec
[ 3] 26.0-27.0 sec  7.97 MBytes  66.8 Mbits/sec
[root at sink2 ~]#
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are
receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 05:05:00 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 05:05:00 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070313120500.C1614E6080B@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #1 from monil at voltaire.com 2007-03-13 05:05 ------- Created an attachment (id=96) --> (https://bugs.openfabrics.org/attachment.cgi?id=96&action=view) Full description of the test environment -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From monisonlists at gmail.com Tue Mar 13 05:12:57 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Tue, 13 Mar 2007 14:12:57 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v3] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070313095732.GG2608@mellanox.co.il> References: <45F41FFD.102@voltaire.com> <20070312154734.GD28157@mellanox.co.il> <45F67134.3000708@gmail.com> <20070313095732.GG2608@mellanox.co.il> Message-ID: <45F69549.6030908@gmail.com> >>> Using to_ipoib_neigh outside priv->lock looks problematic. >>> Can you convince me this does not introduce new races? >>> >>> >> I can try... >> ipoib_neigh_destructor is called from neigh_destroy() and this is when the >> kernel neighbour is under destruction itself and no one holds a reference to >> it. > > OK but we might have references to ipoib_neigh. Specifically path and mcast > group all might have it - that's what neigh_list is. > Maybe I'm not getting something, but how is the presence of ipoib_neigh on the list a problem?
to_ipoib_neigh() takes the pointer from the neighbour itself without caring if it is on a list or not. Destruction itself is being done under lock. >> My opinion is that if I can't assume that no one is touching ipoib_neigh when kernel >> neighbour is being destroyed then we have a bigger problem. > > That's what locks you remove seem to be there for. > From mst at dev.mellanox.co.il Tue Mar 13 05:37:15 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Mar 2007 14:37:15 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: References: Message-ID: <20070313123715.GS2608@mellanox.co.il> Looks like you have some protocol errors. Why are you playing with ifc mtu? What does tcpdump show? -- MST From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 05:36:40 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 05:36:40 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070313123640.9F015E6080E@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #2 from mst at mellanox.co.il 2007-03-13 05:36 ------- Subject: New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 Looks like you have some protocol errors. Why are you playing with ifc mtu? What does tcpdump show? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From swise at aoot.com Tue Mar 13 07:08:23 2007 From: swise at aoot.com (Steve Wise) Date: Tue, 13 Mar 2007 09:08:23 -0500 Subject: [ofa-general] mapping kernel memory to userspace in <= 2.6.14 Message-ID: <1173794903.32342.17.camel@stevo-desktop> Hey Roland, Remember my little ofed 1.2 bug where the chelsio WQ and CQ memory aren't getting mapped to userspace correctly for RHEL4U4? Well, through experimentation I've shown that it fails all the way through 2.6.14 and works fine with 2.6.15 and beyond. Perusing the mm/memory.c log from v2.6.14..v2.6.15 shows lots of changes. Many of the comments talk about no longer needing to set the reserved bit on page entries. On a hunch I hacked in calling SetPageReserved() on each page entry for the memory allocated via dma_alloc_coherent(). And BOOM...things start working. Below is a patch to show the hack. Does this make sense to you? I'm not a VM expert. Also, looking at other 2.6.14 drivers, I see that many of them seem to do this same trick apparently for making sure the pages aren't swapped. However, they also clear the bit before freeing the memory, which makes sense. My original hack had a ClearPageReserved() loop before freeing my dma coherent memory, but I got crashes on process exit where the map count on pages wasn't correct (the refcnt went to -1 apparently). It hit the BUG_ON() in page_remove_rmap() at line 487 in mm/rmap.c (2.6.14.7 kernel). So I'm thinking this hack isn't quite correct. Got any ideas? Thanks, Steve.
diff -up /home/swise/git/ofed_1_2/drivers/infiniband/hw/cxgb3/iwch_provider.c drivers/infiniband/hw/cxgb3/iwch_provider.c
--- /home/swise/git/ofed_1_2/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-03-10 14:14:31.000000000 -0600
+++ drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-03-12 22:24:42.000000000 -0500
@@ -139,6 +139,15 @@ static int iwch_destroy_cq(struct ib_cq
 	return 0;
 }
 
+static void reserve_pages(void *p, int size)
+{
+	while (size > 0) {
+		SetPageReserved(virt_to_page(p));
+		p += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
+}
+
 static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
 			     struct ib_ucontext *context,
 			     struct ib_udata *udata)
@@ -205,6 +214,7 @@ static struct ib_cq *iwch_create_cq(stru
 			iwch_destroy_cq(&chp->ibcq);
 			return ERR_PTR(-EFAULT);
 		}
+		reserve_pages(chp->cq.queue, entries * sizeof (struct t3_cq));
 		mm->addr = uresp.physaddr;
 		mm->len = PAGE_ALIGN((1UL << uresp.size_log2) *
 				     sizeof (struct t3_cqe));
@@ -848,6 +843,7 @@ static struct ib_qp *iwch_create_qp(stru
 		insert_mmap(ucontext, mm1);
 		mm2->addr = uresp.doorbell & PAGE_MASK;
 		mm2->len = PAGE_SIZE;
+		reserve_pages(qhp->wq.queue, wqsize * sizeof(union t3_wr));
 		insert_mmap(ucontext, mm2);
 	}
 	qhp->ibqp.qp_num = qhp->wq.qpid;

From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 08:18:43 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 08:18:43 -0700 (PDT) Subject: [ofa-general] [Bug 454] New: modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol ia64_max_cacheline_size Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=454 Summary: modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol ia64_max_cacheline_size Product: OpenFabrics Linux Version: gen2 Platform: IA64 OS/Version: RHEL 4 Status: NEW Severity: major Priority: P1 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: yohadd at mellanox.co.il CC: mst at mellanox.co.il, dotanb at mellanox.co.il, amitk at
mellanox.co.il, tziporet at mellanox.co.il modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol ia64_max_cacheline_size. [root at sw065 gen2_devel_kernel]# /mswg/utils/bin/hostinfo Name =sw065.lab.mtl.com IP =10.4.3.65 CpuNum =2 CpuVendor = CpuModel = CpuMhz = MemSizeKb =2042464 MachType =ia64 KernelRev =2.6.9-22.EL ChipSet = Os =Red Hat Enterprise Linux AS release 4 (Nahant Update 2) IBDevsNum =0 HCA0Name =NONE HCA0Desc =NONE HCA0Type =NONE HCA0FWVer =NONE HCA0PSID =NONE HCA0GUIDS =NONE HCA0Ports =NONE HCA1Name =NONE HCA1Desc =NONE HCA1Type =NONE HCA1FWVer =NONE HCA1PSID =NONE HCA1GUIDS =NONE HCA1Ports =NONE IBStack =/usr/local/ IBStackType =gen2 IBStackVer =gen2_devel-20070312-1821 IBMPI =NONE MST_BUILD =4.3.5 IBADM_BUILD =IBADM 2.1.0, 20060720-1410 WRITE_BW =/usr/local/bin/ib_write_bw [root at sw065 gen2_devel_kernel]# cat /usr/local/BUILD_ID gen2_devel-20070312-1821 Kernel: Git: git://git.openfabrics.org/~vlad/ofed_1_2/.git commit afbf7e678d1d89c1aaf1ea4de50d1690499431bd Kernel: Git: git://git.openfabrics.org/~vlad/ofed_1_2/.git commit afbf7e678d1d89c1aaf1ea4de50d1690499431bd [root at sw065 gen2_devel_kernel]# modprobe ib_mthca FATAL: Error inserting ib_mthca (/lib/modules/2.6.9-22.EL/updates/kernel/drivers/infiniband/ib_mthca.ko): Unknown symbol in module, or unknown parameter (see dmesg) [root at sw065 gen2_devel_kernel]# demesg ib_mthca: Unknown symbol ia64_max_cacheline_size -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 08:24:17 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 08:24:17 -0700 (PDT) Subject: [ofa-general] [Bug 454] modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol ia64_max_cacheline_size In-Reply-To: Message-ID: <20070313152417.805F3E607FD@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=454 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |mst at mellanox.co.il -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 08:36:24 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 08:36:24 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070313153624.376F9E607FD@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sweitzen at cisco.com -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at dev.mellanox.co.il Tue Mar 13 09:04:46 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 13 Mar 2007 18:04:46 +0200 Subject: [ofa-general] IA64 question (was Fwd: [Bug 454] New: modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol ia64_max_cacheline_size) Message-ID: <20070313160446.GB16246@mellanox.co.il> > modprobe of ib_mthca on IA64 with RHAS4U2 fail with unknown symbol > ia64_max_cacheline_size. Ugh, it seems the symbol is not exported on RHEL4U2. But we don't really need to know the exact value - an upper bound would be enough. Any IA64 expert here can say what the upper bound on cache line size is on IA64? -- MST From monil at voltaire.com Tue Mar 13 09:08:17 2007 From: monil at voltaire.com (Moni Levy) Date: Tue, 13 Mar 2007 18:08:17 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: <20070313123715.GS2608@mellanox.co.il> References: <20070313123715.GS2608@mellanox.co.il> Message-ID: <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> On 3/13/07, Michael S. Tsirkin wrote: > Looks like you have some protocol errors. That might be. I will investigate further and update you. > Why are you playing with ifc mtu? We needed to do some IP forwarding tests between IB and Ethernet and wanted to have the same MTU for the eth0 and ib0 interfaces, after that we reproduced that in a peer to peer configuration. > What does tcpdump show? Don't know yet, I'll let you know tomorrow. Any other info that may help ? 
-- Moni > > > -- > MST From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 09:08:21 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 09:08:21 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070313160821.2500AE60813@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #3 from monil at voltaire.com 2007-03-13 09:08 ------- Subject: Re: [ofa-general] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 On 3/13/07, Michael S. Tsirkin wrote: > Looks like you have some protocol errors. That might be. I will investigate further and update you. > Why are you playing with ifc mtu? We needed to do some IP forwarding tests between IB and Ethernet and wanted to have the same MTU for the eth0 and ib0 interfaces, after that we reproduced that in a peer to peer configuration. > What does tcpdump show? Don't know yet, I'll let you know tomorrow. Any other info that may help ? -- Moni > > > -- > MST -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at dev.mellanox.co.il Tue Mar 13 09:18:40 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Mar 2007 18:18:40 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> References: <20070313123715.GS2608@mellanox.co.il> <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> Message-ID: <20070313161840.GD16246@mellanox.co.il> > >Why are you playing with ifc mtu? 
> > We needed to do some IP forwarding tests between IB and Ethernet and > wanted to have the same MTU for the eth0 and ib0 interfaces, after > that we reproduced that in a peer to peer configuration. OK but PMTU discovery would do this better, won't it? As a side note, with 1.5K MTU it's probably better to use datagram mode anyway. -- MST From halr at voltaire.com Tue Mar 13 10:18:11 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Mar 2007 12:18:11 -0500 Subject: [ofa-general] IsSMdisabled and user_mad.c Message-ID: <1173806290.5995.98573.camel@hal.voltaire.com> Roland, Sorry if you get this twice. My mailer is doing some funny things... Currently user_mad.c does not currently support the IsSMdisabled capability mask bit in PortInfo attribute. I propose adding support for a per port issmdisabled similar to issm in user_mad.c. I also think an API change may not be necessary as applications can deal with the lack of this file gracefully. If this sounds acceptable, I will work on a patch for this. Thanks. -- Hal From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 09:18:06 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 09:18:06 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070313161806.AC024E6080B@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #4 from mst at mellanox.co.il 2007-03-13 09:18 ------- Subject: Re: [ofa-general] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > >Why are you playing with ifc mtu? > > We needed to do some IP forwarding tests between IB and Ethernet and > wanted to have the same MTU for the eth0 and ib0 interfaces, after > that we reproduced that in a peer to peer configuration. OK but PMTU discovery would do this better, won't it? As a side note, with 1.5K MTU it's probably better to use datagram mode anyway. 
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From xma at us.ibm.com Tue Mar 13 09:37:58 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 13 Mar 2007 09:37:58 -0700
Subject: [ofa-general] Re: [openib-general] IPOIB NAPI
In-Reply-To:
Message-ID:

Hello Roland,

I would like to continue this discussion. We have a severe customer issue on our HS21. A simple netperf TCP_STREAM test shows that 40-50% UDP receiving errors for 5300MB/s throughput no matter how big the UDP buffer size is. We have seen heavy interrupt rates on two CPUs. After we bind netserver to less busy CPU, the receiving errors are totally gone, we are able to gain around 680MB/s UDP throughput. But that doesn't work for customer's UDP application, now their UDP applications are not performing well in this cluster.

I thought Mellanox does IRQ affinity, why it used 2 CPUs for one link here?

I do think we need the missed event patch in upper stream and modify the UPLs to use event mode to reduce interrupt rates independent of NAPI.

Thanks
Shirley Ma

From tziporet at mellanox.co.il Tue Mar 13 09:53:52 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 13 Mar 2007 18:53:52 +0200
Subject: [ofa-general] OFED 1.2 Mar-12 meeting summary on beta readiness
Message-ID: <45F6D720.809@mellanox.co.il>

This is the OFED 1.2 Mar-12 meeting summary on beta readiness:

Minutes / summary:
* Decided that code is ready for Beta now:
  o PPC issues will be solved for RC1
  o IPoIB CM issues will be solved for RC1
* RC1 due date is 29-Mar
  o In the next meeting we will review status toward RC1
* Kernel code re-base to 2.6.21: Decided not to change the kernel base code
  o Reason was the risk that is can insert and delaying the release.
Note: There was a further discussion on the mailing list regarding the change to 2.6.21 - especially Sean wanted this so maybe we should reconsider this decision and decide on the mail.

Action Items:
* Scott to test madaye
* Vlad & Tziporet to publish beta release
* All - continue to fix bugs according to priority
* Tziporet - to check status of IPoIB patches to support the bonding module

Discussion:
Talked about subjects to talk in the coming Sonoma workshop; everybody is welcome to send topics for the developers session.

From mshefty at ichips.intel.com Tue Mar 13 10:20:58 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 13 Mar 2007 10:20:58 -0700
Subject: [ofa-general] Re: OFED 1.2 Mar-12 meeting summary on beta readiness
In-Reply-To: <45F6D720.809@mellanox.co.il>
References: <45F6D720.809@mellanox.co.il>
Message-ID: <45F6DD7A.2000903@ichips.intel.com>

> Note: There was a further discussion on the mailing list regarding the
> change to 2.6.21 - especially Sean wanted this so maybe we should
> reconsider this decision and decide on the mail.

To be clear, I do not disagree with the decision. I haven't looked at the differences between 2.6.20/OFED code base and 2.6.21-rc3 to understand the risk. I only needed to know the decision in order to determine if I needed to backport a fix in rc3 into OFED, which is what I'm doing.
- Sean From bugzilla-daemon at lists.openfabrics.org Tue Mar 13 12:00:35 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Tue, 13 Mar 2007 12:00:35 -0700 (PDT) Subject: [ofa-general] [Bug 447] ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet In-Reply-To: Message-ID: <20070313190035.CDA2FE607FD@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=447 ------- Comment #2 from DarylGrunau at gmail.com 2007-03-13 12:00 ------- last night I discovered an inconsistency in the IB stack running on the back-end of our machines where the majority of the kernel modules getting loaded were the stock RHEL4u3 and a few (trying to load) were out of ofed 1.1. After reconciling the problem by loading the proper GridStack everywhere I was able to mount up Panasas last evening and both run a simple job over the IB fabric as well as a small-sized Panasas stress test. No I/O node panic'd - usu. overnight was enough to at least get one. In light of this, I would like to simply keep the case open for a week or so to monitor progress but no need to investigate further if we're panic free. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From swise at opengridcomputing.com Tue Mar 13 12:25:07 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 13 Mar 2007 14:25:07 -0500 Subject: [ofa-general] mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173794903.32342.17.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> Message-ID: <1173813907.32342.44.camel@stevo-desktop> On Tue, 2007-03-13 at 09:08 -0500, Steve Wise wrote: > Hey Roland, > > Remember my little ofed 1.2 bug where the chelsio WQ and CQ memory > aren't getting mapped to userspace correctly for RHEL4U4? 
Well through > experimentation I've shown that it fails all the way through 2.6.14 and > works fine with 2.6.15 and beyond. > > Perusing the mm/memory.c log from v2.6.14..v2.6.15 shows lots of > changes. Many of the comments talk about no longer needing to set the > reserved bit on page entries. On a hunch I hacked in calling > SetPageReserved() on each page entry for the memory allocated via > dma_alloc_coherent(). And BOOM...things start working. Below is a > patch to show the hack. > > Does this make sense to you? I'm not a VM expert. > > Also, looking at other 2.6.14 drivers, I see that many of them seem to > do this same trick apparently for making sure the pages aren't swapped. > However, they also clear the bit before freeing the memory, which makes > sense. My original hack had a ClearPageReserved() loop before freeing > my dma coherent memory, but I got crashes on process exit where the map > count on pages wasn't correct (the refcnt went to -1 apparently). It hit > the BUG_ON() in page_remove_rmap() at line 487 in mm/rmap.c (2.6.14.7 > kernel). > > So I'm thinking this hack isn't quite correct. Got and ideas? > I figured out why I was hitting the BUG_ON() in page_remove_rmap(). Its because my library was destroying the QP or CQ object _before_ unmaping the objects. I changed it to unmap first, and things work as expected. So my conclusion is that SetPageReserved() is needed to map kernel memory into userspace for kernels older than 2.6.15. My guess is ehca and ipath have similar issues on these older kernels. Steve. From mst at dev.mellanox.co.il Tue Mar 13 13:02:39 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 13 Mar 2007 22:02:39 +0200 Subject: [ofa-general] Re: mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173813907.32342.44.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> Message-ID: <20070313200239.GA23285@mellanox.co.il> > Quoting Steve Wise : > I figured out why I was hitting the BUG_ON() in page_remove_rmap(). Its > because my library was destroying the QP or CQ object _before_ unmaping > the objects. I changed it to unmap first, and things work as expected. I guess kernel side needs to be fixed then to prevent a buggy userspace from crashing the kernel. -- MST From bos at pathscale.com Tue Mar 13 13:03:36 2007 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 13 Mar 2007 13:03:36 -0700 Subject: [ofa-general] mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173813907.32342.44.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> Message-ID: <45F70398.1000605@pathscale.com> Steve Wise wrote: > My guess is ehca and ipath have similar issues on these older kernels. No. We worked through all of that a long, long time ago. References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> <45F70398.1000605@pathscale.com> Message-ID: <1173816792.6991.6.camel@stevo-desktop> On Tue, 2007-03-13 at 13:03 -0700, Bryan O'Sullivan wrote: > Steve Wise wrote: > > > My guess is ehca and ipath have similar issues on these older kernels. > > No. We worked through all of that a long, long time ago. 
> > References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> <45F70398.1000605@pathscale.com> <1173816792.6991.6.camel@stevo-desktop> Message-ID: <1173816943.6991.9.camel@stevo-desktop> On Tue, 2007-03-13 at 15:13 -0500, Steve Wise wrote: > On Tue, 2007-03-13 at 13:03 -0700, Bryan O'Sullivan wrote: > > Steve Wise wrote: > > > > > My guess is ehca and ipath have similar issues on these older kernels. > > > > No. We worked through all of that a long, long time ago. > > > > > Care to explain how? I see in ipath_mmap_mem() you call > remap_pfn_range() to map memory that was allocated via > dma_alloc_coherent(). > > What I've found is that this isn't sufficient. You need to also reserve > the pages of that memory for kernels < 2.6.15. > > Maybe I'm missing where you do this? > > Maybe you don't really do kernel bypass like cxgb3 is doing? > > Never mind. I see the patches in kernel_patches/backports. Wish I would have seen this weeks ago :-\ From robert.j.woodruff at intel.com Tue Mar 13 13:46:37 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 13 Mar 2007 13:46:37 -0700 Subject: FW: [promoters] [Fwd: [ofa-general] OFA web page needs updating] Message-ID: Sure, if people have suggestions about the website, I can collect them and or send them to Jeff for implementation, or perhaps send them to me and Jeff too. woody -----Original Message----- From: Thad Omura [mailto:Thad at Mellanox.com] Sent: Tuesday, March 13, 2007 10:51 AM To: Woodruff, Robert J Cc: Tziporet Koren; Michael S. Tsirkin; Jeff at SplitRockPR.com; Ryan, Jim; Jeff Squyres (jsquyres) Subject: FW: [promoters] [Fwd: [ofa-general] OFA web page needs updating] Bob, Can you please gather all of the changes required for the OFA web site after getting inputs from the developers and then work directly with Jeff Scott from Split Rock (jeff at splitrockpr.com) to make sure they all happen? 
Below, Michael Tsirkin has expressed his thoughts and proposed changes he'd like implemented quickly - if there are others, lets get them all together and implemented ASAP. THAD Thad Omura | VP of Product Marketing | Mellanox Technologies, Inc. thad at mellanox.com | Tel 408-916-0020 | Cell 408-750-6236 -----Original Message----- From: Michael S. Tsirkin Sent: Sunday, March 11, 2007 9:46 PM To: Jeffrey Scott Cc: Thad Omura; Tziporet Koren; Sean Hefty; Jeff Squyres (jsquyres) Subject: Re: [promoters] [Fwd: [ofa-general] OFA web page needs updating] > Michael S. Tsirkin wrote: > > >The developers resources section is still significantly out of date. > >And there's no reason I see to have everyone click-through an extra page > >to get to the actual info, which is in the wiki. > > > >Can > >http://git.openfabrics.org/resources.htm > >be removed, and the link changed to point at the wiki directly? > > > >Then we can fix it. > > Quoting Jeffrey Scott : > Subject: Re: [promoters] [Fwd: [ofa-general] OFA web page needs updating] > > Michael- > Please clarify. Are you saying that you want us to eliminate the entire > "Developer Resources" page? You believe that we should simply have one > link to the wiki? Yes. > If we eliminate the "Developer Resources" page, would developers sill > have access to the gen2 code, subversion repository, and Bugzilla? Let's just put these on the front page in wiki. > And > is there any concern about deleting the Subversion Acceptable Use > Policy, which is included on the Developer Resources page? That one is broken and needs an update. 
-- MST From bos at pathscale.com Tue Mar 13 13:53:19 2007 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 13 Mar 2007 13:53:19 -0700 Subject: [ofa-general] mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173816943.6991.9.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> <45F70398.1000605@pathscale.com> <1173816792.6991.6.camel@stevo-desktop> <1173816943.6991.9.camel@stevo-desktop> Message-ID: <45F70F3F.7040609@pathscale.com> Steve Wise wrote: > Never mind. I see the patches in kernel_patches/backports. > > Wish I would have seen this weeks ago :-\ Yep. This stuff took me weeks to figure out, too, with plenty of help from Linus and Nick Piggin. Ugh. References: <45F6D720.809@mellanox.co.il> Message-ID: <20070313211619.GC23285@mellanox.co.il> > Kernel code re-base to 2.6.21: Decided not to change the kernel base code > Reason was the risk that is can insert and delaying the release. I agree it makes sense for beta. I suggest we estimate the risk and discuss this again before RC1. -- MST From sweitzen at cisco.com Tue Mar 13 14:51:44 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 13 Mar 2007 14:51:44 -0700 Subject: [ofa-general] RE: [ewg] Re: OFED 1.2 Mar-12 meeting summary on beta readiness In-Reply-To: <20070313211619.GC23285@mellanox.co.il> References: <45F6D720.809@mellanox.co.il> <20070313211619.GC23285@mellanox.co.il> Message-ID: I personally don't want to change the kernel code base unless we also slip the release date. Scott > -----Original Message----- > From: ewg-bounces at lists.openfabrics.org > [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of > Michael S. 
Tsirkin > Sent: Tuesday, March 13, 2007 2:18 PM > To: Tziporet Koren > Cc: Sean Hefty; EWG; OPENIB > Subject: [ewg] Re: OFED 1.2 Mar-12 meeting summary on beta readiness > > > Kernel code re-base to 2.6.21: Decided not to change the > kernel base code > > Reason was the risk that is can insert and delaying the release. > > I agree it makes sense for beta. > I suggest we estimate the risk and discuss this again before RC1. > > -- > MST > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From bos at pathscale.com Tue Mar 13 15:02:46 2007 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 13 Mar 2007 15:02:46 -0700 Subject: [ofa-general] Re: OFED 1.2 Mar-12 meeting summary on beta readiness In-Reply-To: <20070313211619.GC23285@mellanox.co.il> References: <45F6D720.809@mellanox.co.il> <20070313211619.GC23285@mellanox.co.il> Message-ID: <45F71F86.7090801@pathscale.com> Michael S. Tsirkin wrote: >> Kernel code re-base to 2.6.21: Decided not to change the kernel base code >> Reason was the risk that is can insert and delaying the release. > > I agree it makes sense for beta. Are you saying that we *should* rebase to 2.6.21-rc3? Not without some notion of the concrete, valuable benefits it will bring, and how thosse weigh against the pain, annoyance, and churn, we shouldn't. Unless there's a solidly compelling reason to rebase, I am quite against the idea. I'm trying to resolve an oversubscription problem of 1 server receiving streams from 4 hosts. If you connect the server with 4 cables to the switch, it should be resolved when I define the routes that should be used. But is it possible to define the routing tables in the subnet manager? Maybe just running the fattree routing module is sufficient? I also saw that it is possible to set the routes by using a file. Can someone give an example of this? 
Currently I'm trying a combination of partitioning and linux routing. Is this a good idea? BTW: we are only interested in SDP and IPoIB. In a previous thread I discovered that bonding/aggregation/merging is not possible in an active/active situation. So this doesn't improve our setup. greetz, Koen *** Disclaimer *** Vlaamse Radio- en Televisieomroep Auguste Reyerslaan 52, 1043 Brussel nv van publiek recht BTW BE 0244.142.664 RPR Brussel http://www.vrt.be/disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Tue Mar 13 15:39:40 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Mar 2007 15:39:40 -0700 Subject: [ofa-general] [GIT PULL] OFED 1.2: CM scaling fixes Message-ID: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> Vlad, please pull from: git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 This should add some necessary fixes to the OFED code: RDMA/ucma: avoid sending reject if backlog is full RDMA/cma: Request reversible paths only IB/cm: remove broken MRA timeout patch I compile tested on different kernel versions, but couldn't get the cross platform compile to work on different architectures (without these patches). - Sean From halr at voltaire.com Tue Mar 13 17:01:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Mar 2007 19:01:43 -0500 Subject: [ofa-general] oversubscription In-Reply-To: References: Message-ID: <1173830502.5995.124208.camel@hal.voltaire.com> On Tue, 2007-03-13 at 17:30, SEGERS Koen wrote: > I'm trying to resolve an oversubscription problem of 1 server > receiving streams from 4 hosts. > > If you connect the server with 4 cables to the switch, it should be > resolved when I define the routes that should be used. But is it > possible to define the routing tables in the subnet manager? Maybe > just running the fattree routing module is sufficient? I also saw that > it is possible to set the routes by using a file. 
Can someone give an > example of this? > > Currently I'm trying a combination of partitioning and linux routing. > Is this a good idea? > > BTW: we are only interested in SDP and IPoIB. In a previous thread I > discovered that bonding/aggregation/merging is not possible in an > active/active situation. So this doesn't improve our setup. > > greetz, > > Koen > *** Disclaimer *** > > Vlaamse Radio- en Televisieomroep > Auguste Reyerslaan 52, 1043 Brussel > > nv van publiek recht > BTW BE 0244.142.664 > RPR Brussel > http://www.vrt.be/disclaimer > > > > ______________________________________________________________________ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Tue Mar 13 17:15:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Mar 2007 19:15:18 -0500 Subject: [ofa-general] oversubscription In-Reply-To: References: Message-ID: <1173831317.5995.125066.camel@hal.voltaire.com> On Tue, 2007-03-13 at 17:30, SEGERS Koen wrote: > I'm trying to resolve an oversubscription problem of 1 server > receiving streams from 4 hosts. > > If you connect the server with 4 cables to the switch, it should be > resolved when I define the routes that should be used. But is it > possible to define the routing tables in the subnet manager? Depends on the SM as to whether this is supported or not. OpenSM supports the ability to do this as do some vendor SMs. > Maybe just running the fattree routing module is sufficient? I'm not sure; Yevgeny would be the best to answer this. > I also saw that it is possible to set the routes by using a file. Can > someone give an example of this? This capability is documented in the opensm man page. You can obtain this file via dump_lfts.sh script. 
> Currently I'm trying a combination of partitioning and linux routing. > Is this a good idea? What do you mean by Linux routing ? Is this IP routing ? -- Hal > BTW: we are only interested in SDP and IPoIB. In a previous thread I > discovered that bonding/aggregation/merging is not possible in an > active/active situation. So this doesn't improve our setup. > > greetz, > > Koen > *** Disclaimer *** > > Vlaamse Radio- en Televisieomroep > Auguste Reyerslaan 52, 1043 Brussel > > nv van publiek recht > BTW BE 0244.142.664 > RPR Brussel > http://www.vrt.be/disclaimer > > > > ______________________________________________________________________ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From greg.lindahl at qlogic.com Tue Mar 13 16:49:31 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Tue, 13 Mar 2007 16:49:31 -0700 Subject: [ofa-general] oversubscription In-Reply-To: References: Message-ID: <20070313234931.GB4968@dhcp-2-200.internal.keyresearch.com> On Tue, Mar 13, 2007 at 11:30:14PM +0100, SEGERS Koen wrote: > I'm trying to resolve an oversubscription problem of 1 server receiving streams from 4 hosts. A host can only receive so fast, no matter how many IB cards you stuff into it. This is doubly so if you plan on doing something with all that data. Getting the routing tables right is the least of your worries. 
-- greg

From arlin.r.davis at intel.com Tue Mar 13 16:58:26 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Tue, 13 Mar 2007 16:58:26 -0700
Subject: [ofa-general] [PATCH] uDAPL dtest - add provider option, set default to OpenIB-cma
Message-ID: <000001c765cb$7dd97c00$9f97070a@amr.corp.intel.com>

add provider option to dtest, set default to OpenIB-cma
(applied - master and ofed_1_2)

Signed-off by: Arlin Davis ardavis at ichips.intel.com

diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c
index 86b70cc..690915d 100644
--- a/test/dtest/dtest.c
+++ b/test/dtest/dtest.c
@@ -44,7 +44,7 @@
 #include
 #ifndef DAPL_PROVIDER
-#define DAPL_PROVIDER "OpenIB-ib0"
+#define DAPL_PROVIDER "OpenIB-cma"
 #endif
 #define MAX_POLLING_CNT 50000
@@ -107,6 +107,7 @@ static DAT_VLEN registered_size_send_msg;
 static DAT_VADDR registered_addr_send_msg;
 static DAT_EP_ATTR ep_attr;
 char hostname[256] = {0};
+char provider[256] = DAPL_PROVIDER;
 /* rdma pointers */
 char *rbuf = NULL;
@@ -189,7 +190,7 @@ main(int argc, char **argv)
     DAT_RETURN ret;
     /* parse arguments */
-    while ((c = getopt(argc, argv, "scvpb:d:B:h:")) != -1)
+    while ((c = getopt(argc, argv, "scvpb:d:B:h:P:")) != -1)
     {
         switch(c)
         {
@@ -225,6 +226,9 @@ main(int argc, char **argv)
             server = 0;
             strcpy (hostname, optarg);
             break;
+        case 'P':
+            strcpy (provider, optarg);
+            break;
         default:
             print_usage();
             exit(-12);
@@ -232,9 +236,9 @@ main(int argc, char **argv)
     }
     if (!server) {
-        printf("%d Running as client\n",getpid()); fflush(stdout);
+        printf("%d Running as client - %s\n",getpid(),provider); fflush(stdout);
     } else {
-        printf("%d Running as server\n",getpid()); fflush(stdout);
+        printf("%d Running as server - %s\n",getpid(),provider); fflush(stdout);
     }
     /* allocate send and receive buffers */
@@ -250,7 +254,7 @@ main(int argc, char **argv)
     /* dat_ia_open, dat_pz_create */
     h_async_evd = DAT_HANDLE_NULL;
     start = get_time();
-    ret = dat_ia_open( DAPL_PROVIDER, 8, &h_async_evd, &h_ia );
+    ret = dat_ia_open( provider, 8, &h_async_evd,
&h_ia );
     stop = get_time();
     time.open += ((stop - start)*1.0e6);
     if(ret != DAT_SUCCESS) {
@@ -1802,6 +1806,7 @@ void print_usage()
     printf("b: buf length to allocate\n");
     printf("B: burst count, rdma and msgs \n");
     printf("h: hostname\n");
+    printf("P: provider (default=OpenIB-cma)\n");
     printf("\n");
 }

From sean.hefty at intel.com Tue Mar 13 17:06:27 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 13 Mar 2007 17:06:27 -0700
Subject: [ofa-general] bug 400: ipoib error messages
Message-ID: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com>

{snippet from bug 400 report because I don't want to try to have a discussion on this inside a bug report...}

IPoIB CM HA is working much better in OFED-1.2-20070311-0600. I have been running for a few hours flipping an IB port every 10 seconds. I do still see some junk in dmesg, let me know if I should open a new bug or reopen this bug.

ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000404 not found in MGM
ib0: ib_detach_mcast failed (result = -22)
ib0: ipoib_mcast_detach failed (result = -22)
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet

Scott, is this the start of the message log, or just a snapshot? Specifically, do you see ib_detach_mcast failures for ib1? Is the dev_queue_xmit the first error?
- Sean From xma at us.ibm.com Tue Mar 13 19:56:23 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Mar 2007 19:56:23 -0700 Subject: [ofa-general] IPoIB CM mode throughput In-Reply-To: <20070313211619.GC23285@mellanox.co.il> Message-ID: Hello Michael, In one of your email, you mentioned that IPoIB-CM 800MB/s or more throughput you got is on a Mellanox 4x back-to-back DDR system. I assume that you get the high performance for PCIe bus not PCI_X? Is that right? Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at dev.mellanox.co.il Tue Mar 13 21:38:49 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 06:38:49 +0200 Subject: [ofa-general] Re: IPoIB CM mode throughput In-Reply-To: References: <20070313211619.GC23285@mellanox.co.il> Message-ID: <20070314043849.GD23285@mellanox.co.il> > Quoting Shirley Ma : > Subject: IPoIB CM mode throughput > > Hello Michael, > > In one of your email, you mentioned that IPoIB-CM 800MB/s or more throughput > you got is on a Mellanox 4x back-to-back DDR system. I assume that you get the > high performance for PCIe bus not PCI_X? Is that right? I haven't tested PCI-X for performance yet. But I do expect it to give performance similiar to PCI-Ex: the system is CPU-bound, not bus-BW bound. One thing to note: since PCI-X has higher latency than PCI-E, you might need to tweak TCP window sizes to get good performance with TCP. Same applies to DDR/SDR. -- MST From mst at dev.mellanox.co.il Tue Mar 13 21:46:36 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin)
Date: Wed, 14 Mar 2007 06:46:36 +0200
Subject: [ofa-general] Re: OFED 1.2 Mar-12 meeting summary on beta readiness
In-Reply-To: <45F71F86.7090801@pathscale.com>
References: <45F6D720.809@mellanox.co.il> <20070313211619.GC23285@mellanox.co.il> <45F71F86.7090801@pathscale.com>
Message-ID: <20070314044636.GE23285@mellanox.co.il>

> Quoting Bryan O'Sullivan :
> Subject: Re: [ofa-general] Re: OFED 1.2 Mar-12 meeting summary on beta readiness
>
> Michael S. Tsirkin wrote:
> >> Kernel code re-base to 2.6.21: Decided not to change the kernel base
> >> code
> >> Reason was the risk that is can insert and delaying the release.
> >
> >I agree it makes sense for beta.
>
> Are you saying that we *should* rebase to 2.6.21-rc3?

Yes.

> Not without some
> notion of the concrete, valuable benefits it will bring, and how thosse
> weigh against the pain, annoyance, and churn, we shouldn't.

I expect very minor pain. I'll do most of the work. It's probably 1 day or so.

> Unless
> there's a solidly compelling reason to rebase, I am quite against the idea.

The point of this is to be good citizens of upstream kernel. We currently have 3 code-bases: 2.6.20, OFED and 2.6.21. This increases support load (e.g. preparing patches for stable releases requires doing the same work 3 times, and OFED and upstream testing needs to be done 3 times).

I know this does not help ipath since apparently you didn't submit your patches for upstream inclusion, but for core, chelsio, ehca and mthca this makes support much simpler as we have a single code-stream between upstream and OFED.

I will post more detail later.

-- MST

From mst at dev.mellanox.co.il Tue Mar 13 22:11:58 2007
From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 14 Mar 2007 07:11:58 +0200 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> Message-ID: <20070314051158.GA7997@mellanox.co.il> > Quoting Sean Hefty : > Subject: [GIT PULL] OFED 1.2: CM scaling fixes > > Vlad, please pull from: > > git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 > > This should add some necessary fixes to the OFED code: > > RDMA/ucma: avoid sending reject if backlog is full > RDMA/cma: Request reversible paths only > IB/cm: remove broken MRA timeout patch Sean, before applying this, please discuss the MRA timeout patch on the general list with Ishai. Can you fix the patch instead of removing it? It helps him work-around bugs in his SRP target. > I compile tested on different kernel versions, but couldn't get the cross > platform compile to work on different architectures (without these patches). Please post what you did and the errors you get, and we'll try to help. -- MST From mst at dev.mellanox.co.il Tue Mar 13 22:19:43 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 07:19:43 +0200 Subject: [ofa-general] Re: mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173813907.32342.44.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> Message-ID: <20070314051943.GA8249@mellanox.co.il> > I figured out why I was hitting the BUG_ON() in page_remove_rmap(). Its > because my library was destroying the QP or CQ object _before_ unmaping > the objects. I changed it to unmap first, and things work as expected. I guess kernel side also needs to be fixed, otherwise buggy userspace can trigger BUG_ON's? 
-- MST From xma at us.ibm.com Tue Mar 13 22:30:19 2007 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Mar 2007 22:30:19 -0700 Subject: [ofa-general] Re: IPoIB CM mode throughput In-Reply-To: <20070314043849.GD23285@mellanox.co.il> Message-ID: Thanks Michael. Have you tried duplex performance test? If you have tried, have you seen 1.7GB/s with DDR? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dotanb at dev.mellanox.co.il Tue Mar 13 23:59:51 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 14 Mar 2007 08:59:51 +0200 Subject: FW: [promoters] [Fwd: [ofa-general] OFA web page needs updating] In-Reply-To: References: Message-ID: <45F79D67.5090305@dev.mellanox.co.il> Hi Woody. Woodruff, Robert J wrote: > > Sure, if people have suggestions about the website, > I can collect them and or send them to Jeff for implementation, > or perhaps send them to me and Jeff too. > > woody > Can the mailing lists archive be added to the website? thanks Dotan From kliteyn at dev.mellanox.co.il Wed Mar 14 00:24:47 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 14 Mar 2007 09:24:47 +0200 Subject: [ofa-general] oversubscription In-Reply-To: <1173831317.5995.125066.camel@hal.voltaire.com> References: <1173831317.5995.125066.camel@hal.voltaire.com> Message-ID: <45F7A33F.3090906@dev.mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2007-03-13 at 17:30, SEGERS Koen wrote: >> I'm trying to resolve an oversubscription problem of 1 server >> receiving streams from 4 hosts. >> >> If you connect the server with 4 cables to the switch, it should be >> resolved when I define the routes that should be used. But is it >> possible to define the routing tables in the subnet manager? > > Depends on the SM as to whether this is supported or not. OpenSM > supports the ability to do this as do some vendor SMs. 
> >> Maybe just running the fattree routing module is sufficient? > > I'm not sure; Yevgeny would be the best to answer this. What is the fabric topology? Basically, you can try using fat-tree routing, and if the topology is not a fat-tree, OpenSM will issue an error message and will use default routing. You can get more details from the osm/doc/current-routing.txt -- Yevgeny >> I also saw that it is possible to set the routes by using a file. Can >> someone give an example of this? > > This capability is documented in the opensm man page. You can obtain > this file via dump_lfts.sh script. > >> Currently I'm trying a combination of partitioning and linux routing. >> Is this a good idea? > > What do you mean by Linux routing ? Is this IP routing ? > > -- Hal > >> BTW: we are only interested in SDP and IPoIB. In a previous thread I >> discovered that bonding/aggregation/merging is not possible in an >> active/active situation. So this doesn't improve our setup. >> >> greetz, >> >> Koen >> *** Disclaimer *** >> >> Vlaamse Radio- en Televisieomroep >> Auguste Reyerslaan 52, 1043 Brussel >> >> nv van publiek recht >> BTW BE 0244.142.664 >> RPR Brussel >> http://www.vrt.be/disclaimer >> >> >> >> ______________________________________________________________________ >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at dev.mellanox.co.il Wed Mar 14 00:33:14 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 14 Mar 2007 09:33:14 +0200 Subject: [ofa-general] [Bug 400] ipoib error messages In-Reply-To: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> Message-ID: <20070314073313.GC4644@mellanox.co.il> Quoting Sean Hefty > Scott, is this the start of the message log, or just a snapshot? Specifically, > do you see ib_detach_mcast failures for ib1? Is the dev_queue_xmit the first > error? Forwarding to bugzilla. BTW, this example shows how you can use mail to add data to bugzilla. -- MST From sweitzen at cisco.com Wed Mar 14 00:26:04 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 14 Mar 2007 00:26:04 -0700 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> Message-ID: Sure, let me give you more detail. I'm looping a script that does this: shutdown ib0 port of host #1 (via switch CLI) sleep 10 bring up ib0 port of host #1 sleep 10 shutdown ib1 port of host #1 sleep 10 bring up ib1 port of host #1 sleep 10 shutdown ib0 port of host #2 sleep 10 bring up ib0 port of host #2 sleep 10 shutdown ib1 port of host #2 sleep 10 bring up ib1 port of host #2 sleep 10 While this port failover script is running, I'm running netperf over IPoIB between the 2 hosts. Because of bug 455 (https://bugs.openfabrics.org/show_bug.cgi?id=455), there is output in dmesg every time there is IPoIB HA failover, so that gives a rough sense for the rate of failover. Right now I am having a hard time getting failures to happen, I'll keep trying.
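The port-flap cycle described above can be written down as a small table of steps. The following is only a rough sketch under assumptions: `flap_step`, `flap_schedule` and `FLAP_PAUSE_SECONDS` are hypothetical names chosen for illustration, and the switch CLI commands are not given in this message, so the code only produces the order of actions and does not talk to any switch.

```c
#include <assert.h>
#include <stddef.h>

/* One action in the port-flap cycle (hypothetical type, for illustration). */
struct flap_step {
    int host;          /* 1 or 2, the two hosts in the test */
    const char *port;  /* "ib0" or "ib1" */
    int up;            /* 0 = shut the port down, 1 = bring it back up */
};

/* Pause after every action, as in the description above. */
#define FLAP_PAUSE_SECONDS 10

/*
 * Fill `out` with one full pass of the cycle: for each host and each port,
 * shut the port down, then bring it back up. Returns the number of steps
 * written, which is at most `max`.
 */
static size_t flap_schedule(struct flap_step *out, size_t max)
{
    static const char *const ports[] = { "ib0", "ib1" };
    size_t n = 0;

    assert(out != NULL);

    for (int host = 1; host <= 2; host++) {
        for (size_t p = 0; p < 2; p++) {
            for (int up = 0; up <= 1; up++) {
                if (n == max)
                    return n;   /* caller's buffer is full */
                out[n].host = host;
                out[n].port = ports[p];
                out[n].up = up;
                n++;
            }
        }
    }
    return n;
}
```

A caller would walk the returned steps, issue the matching switch command for each one, and sleep `FLAP_PAUSE_SECONDS` after it. One pass has eight steps, so a full cycle takes roughly 80 seconds plus whatever time the switch needs per command.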
Here's an example of several minutes of dmesg output: ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode 
will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib0: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops ib1: enabling connected mode will cause multicast packet drops The IPoIB failover is very slow sometimes, shown below is netperf -D output. IPoIB failover should ideally only take a second or two. I'll be filing a bug for that. Interim result: 4355.09 10^6bits/s over 1.00 seconds Interim result: 4371.07 10^6bits/s over 1.00 seconds Interim result: 4370.95 10^6bits/s over 1.00 seconds Interim result: 162.41 10^6bits/s over 26.91 seconds Interim result: 4360.14 10^6bits/s over 1.00 seconds Interim result: 4354.94 10^6bits/s over 1.00 seconds Interim result: 4353.08 10^6bits/s over 1.00 seconds Interim result: 4343.94 10^6bits/s over 1.00 seconds Interim result: 4356.98 10^6bits/s over 1.00 seconds Interim result: 4357.00 10^6bits/s over 1.00 seconds Interim result: 1735.68 10^6bits/s over 2.51 seconds Interim result: 4357.86 10^6bits/s over 1.00 seconds Interim result: 4358.63 10^6bits/s over 1.00 seconds Interim result: 4352.05 10^6bits/s over 1.00 seconds Interim result: 4355.14 10^6bits/s over 1.00 seconds Interim result: 4350.74 10^6bits/s over 1.00 seconds Interim result: 4363.25 10^6bits/s over 1.00 seconds Interim result: 41.46 10^6bits/s over 105.24 seconds Interim result: 297.83 10^6bits/s over 14.50 seconds Interim result: 4332.43 10^6bits/s over 1.00 seconds Interim result: 4345.48 10^6bits/s over 1.00 seconds Interim result: 4365.19 10^6bits/s over 1.00 seconds 
Interim result: 4354.96 10^6bits/s over 1.00 seconds Interim result: 4346.54 10^6bits/s over 1.00 seconds Interim result: 4339.78 10^6bits/s over 1.00 seconds Interim result: 1730.77 10^6bits/s over 2.51 seconds Interim result: 4346.55 10^6bits/s over 1.00 seconds Interim result: 4358.37 10^6bits/s over 1.00 seconds Interim result: 4357.15 10^6bits/s over 1.00 seconds Interim result: 4362.43 10^6bits/s over 1.00 seconds Interim result: 4342.37 10^6bits/s over 1.00 seconds Interim result: 4339.25 10^6bits/s over 1.00 seconds Interim result: 4337.89 10^6bits/s over 1.00 seconds Interim result: 4328.02 10^6bits/s over 1.00 seconds Interim result: 4352.09 10^6bits/s over 1.00 seconds Interim result: 4344.81 10^6bits/s over 1.00 seconds Interim result: 4354.92 10^6bits/s over 1.00 seconds Interim result: 4354.71 10^6bits/s over 1.00 seconds Interim result: 1732.11 10^6bits/s over 2.51 seconds Interim result: 4334.02 10^6bits/s over 1.00 seconds Interim result: 4340.94 10^6bits/s over 1.00 seconds Scott > -----Original Message----- > From: Sean Hefty [mailto:sean.hefty at intel.com] > Sent: Tuesday, March 13, 2007 5:06 PM > To: Scott Weitzenkamp (sweitzen); general at lists.openfabrics.org > Subject: bug 400: ipoib error messages > > {snippet from bug 400 report because I don't want to try to > have a discussion on > this inside a bug report...} > > IPoIB CM HA is working much better in OFED-1.2-20070311-0600. > I have been > running for a few hours flipping an IB port every 10 seconds. > > I do still see some junk in dmesg, let me know if I should > open a new bug or > reopen this bug. 
> > ib1: dev_queue_xmit failed to requeue packet > ib_mthca 0000:04:00.0: QP 000404 not found in MGM > ib0: ib_detach_mcast failed (result = -22) > ib0: ipoib_mcast_detach failed (result = -22) > ib1: dev_queue_xmit failed to requeue packet > ib1: dev_queue_xmit failed to requeue packet > ib1: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > ib0: dev_queue_xmit failed to requeue packet > > Scott, is this the start of the message log, or just a > snapshot? Specifically, > do you see ib_detach_mcast failures for ib1? Is the > dev_queue_xmit the first > error? > > - Sean > From mst at dev.mellanox.co.il Wed Mar 14 00:49:14 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 09:49:14 +0200 Subject: [ofa-general] Re: bug 400: ipoib error messages In-Reply-To: References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> Message-ID: <20070314074914.GE4644@mellanox.co.il> Quoting Scott Weitzenkamp (sweitzen) : > The IPoIB failover is very slow sometimes, shown below is netperf -D > output. IPoIB failover should ideally only take a second or two. I'll > be filing a bug for that. > > Interim result: 4355.09 10^6bits/s over 1.00 seconds > Interim result: 4371.07 10^6bits/s over 1.00 seconds > Interim result: 4370.95 10^6bits/s over 1.00 seconds > Interim result: 162.41 10^6bits/s over 26.91 seconds Scott, what is "Interim result"? You are using SM on switch, are you not? Are you sure the delays are not due to SM? For failover to take place, the following needs to happen: 1. link down and notification - triggered by SM MAD 2. interface down 3. 
interface up (includes registration with SA) 2 out of 3 steps involve SM/SA Since ipoib ha is just a perl script, it should be easy for you to add logging there so you can figure out where's the delay -- MST From monil at voltaire.com Wed Mar 14 02:24:28 2007 From: monil at voltaire.com (Moni Levy) Date: Wed, 14 Mar 2007 11:24:28 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: <20070313161840.GD16246@mellanox.co.il> References: <20070313123715.GS2608@mellanox.co.il> <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> <20070313161840.GD16246@mellanox.co.il> Message-ID: <6a122cc00703140224w201a31d4v70d9ec360b4bde7@mail.gmail.com> On 3/13/07, Michael S. Tsirkin wrote: > > >Why are you playing with ifc mtu? > > > > We needed to do some IP forwarding tests between IB and Ethernet and > > wanted to have the same MTU for the eth0 and ib0 interfaces, after > > that we reproduced that in a peer to peer configuration. > > OK but PMTU discovery would do this better, won't it? I'm not sure (it will probably drop some packets at the beginning). Anyway lower MTU should work. > As a side note, with 1.5K MTU it's probably better to use datagram mode > anyway. Now I see that I missed that in the report. We used datagram mode. -- Moni > > -- > MST > From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 02:30:41 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Mar 2007 02:30:41 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070314093044.5D361E6080B@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #5 from monil at voltaire.com 2007-03-14 02:30 ------- Subject: Re: [ofa-general] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 On 3/13/07, Michael S. Tsirkin wrote: > > >Why are you playing with ifc mtu? 
> > > > We needed to do some IP forwarding tests between IB and Ethernet and > > wanted to have the same MTU for the eth0 and ib0 interfaces, after > > that we reproduced that in a peer to peer configuration. > > OK but PMTU discovery would do this better, won't it? I'm not sure (it will probably drop some packets at the beginning). Anyway lower MTU should work. > As a side note, with 1.5K MTU it's probably better to use datagram mode > anyway. Now I see that I missed that in the report. We used datagram mode. -- Moni > > -- > MST > -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From vlad at lists.openfabrics.org Wed Mar 14 02:38:41 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 14 Mar 2007 02:38:41 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070314-0200 daily build status Message-ID: <20070314093841.6CBE0E6080B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with 
linux-2.6.20 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From mst at dev.mellanox.co.il Wed Mar 14 02:55:05 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 11:55:05 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: <6a122cc00703140224w201a31d4v70d9ec360b4bde7@mail.gmail.com> References: <20070313123715.GS2608@mellanox.co.il> <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> <20070313161840.GD16246@mellanox.co.il> <6a122cc00703140224w201a31d4v70d9ec360b4bde7@mail.gmail.com> Message-ID: <20070314095505.GA9721@mellanox.co.il> > Quoting Moni Levy : > Subject: Re: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > > On 3/13/07, Michael S. Tsirkin wrote: > >> >Why are you playing with ifc mtu? 
> >> > >> We needed to do some IP forwarding tests between IB and Ethernet and > >> wanted to have the same MTU for the eth0 and ib0 interfaces, after > >> that we reproduced that in a peer to peer configuration. > > > >OK but PMTU discovery would do this better, won't it? > > I'm not sure (it will probably drop some packets at the beginning). Can you check pls? > Anyway lower MTU should work. Manually tweaking MTU seems to be broken in lots of systems - I sometimes see the same behaviour with gigabit ethernet. > >As a side note, with 1.5K MTU it's probably better to use datagram mode > >anyway. > > Now I see that I missed that in the report. We used datagram mode. Did you try checking ethernet BW on the same machine? -- MST From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 02:54:34 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Mar 2007 02:54:34 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070314095434.9F8EDE60811@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #6 from mst at dev.mellanox.co.il 2007-03-14 02:54 ------- Subject: Re: [ofa-general] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > Quoting Moni Levy : > Subject: Re: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > > On 3/13/07, Michael S. Tsirkin wrote: > >> >Why are you playing with ifc mtu? > >> > >> We needed to do some IP forwarding tests between IB and Ethernet and > >> wanted to have the same MTU for the eth0 and ib0 interfaces, after > >> that we reproduced that in a peer to peer configuration. > > > >OK but PMTU discovery would do this better, won't it? > > I'm not sure (it will probably drop some packets at the beginning). Can you check pls? > Anyway lower MTU should work. 
Manually tweaking MTU seems to be broken in lots of systems - I sometimes see the same behaviour with gigabit ethernet. > >As a side note, with 1.5K MTU it's probably better to use datagram mode > >anyway. > > Now I see that I missed that in the report. We used datagram mode. Did you try checking ethernet BW on the same machine? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Wed Mar 14 04:59:08 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Mar 2007 06:59:08 -0500 Subject: [ofa-general] oversubscription In-Reply-To: <45F7A33F.3090906@dev.mellanox.co.il> References: <1173831317.5995.125066.camel@hal.voltaire.com> <45F7A33F.3090906@dev.mellanox.co.il> Message-ID: <1173873543.5995.169289.camel@hal.voltaire.com> On Wed, 2007-03-14 at 02:24, Yevgeny Kliteynik wrote: > Hal Rosenstock wrote: > > On Tue, 2007-03-13 at 17:30, SEGERS Koen wrote: > >> I'm trying to resolve an oversubscription problem of 1 server > >> receiving streams from 4 hosts. > >> > >> If you connect the server with 4 cables to the switch, it should be > >> resolved when I define the routes that should be used. But is it > >> possible to define the routing tables in the subnet manager? > > > > Depends on the SM as to whether this is supported or not. OpenSM > > supports the ability to do this as do some vendor SMs. > > > >> Maybe just running the fattree routing module is sufficient? > > > > I'm not sure; Yevgeny would be the best to answer this. > > What is the fabric topology? > > Basically, you can try using fat-tree routing, and if the topology > is not a fat-tree, OpenSM will issue an error message and will use > default routing. > You can get more details from the osm/doc/current-routing.txt Same info on fat tree is in the opensm man page. 
-- Hal > -- Yevgeny > > >> I also saw that it is possible to set the routes by using a file. Can > >> someone give an example of this? > > > > This capability is documented in the opensm man page. You can obtain > > this file via dump_lfts.sh script. > > > >> Currently I'm trying a combination of partitioning and linux routing. > >> Is this a good idea? > > > > What do you mean by Linux routing ? Is this IP routing ? > > > > -- Hal > > > >> BTW: we are only interested in SDP and IPoIB. In a previous thread I > >> discovered that bonding/aggregation/merging is not possible in an > >> active/active situation. So this doesn't improve our setup. > >> > >> greetz, > >> > >> Koen > >> *** Disclaimer *** > >> > >> Vlaamse Radio- en Televisieomroep > >> Auguste Reyerslaan 52, 1043 Brussel > >> > >> nv van publiek recht > >> BTW BE 0244.142.664 > >> RPR Brussel > >> http://www.vrt.be/disclaimer > >> > >> > >> > >> ______________________________________________________________________ > >> > >> _______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From dotanb at dev.mellanox.co.il Wed Mar 14 04:25:32 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 14 Mar 2007 13:25:32 +0200 Subject: [ofa-general] [PATCH - diags] Added support to query the GUID table entries Message-ID: <1173871533.7651.0.camel@mtldesk014.lab.mtl.com> Added support to query the GUID table entries. 
Signed-off-by: Dotan Barak --- Index: gen2_devel_user/src/userspace/management/diags/src/smpquery.c =================================================================== --- gen2_devel_user.orig/src/userspace/management/diags/src/smpquery.c 2007-03-12 18:05:09.000000000 +0200 +++ gen2_devel_user/src/userspace/management/diags/src/smpquery.c 2007-03-12 18:09:52.000000000 +0200 @@ -44,6 +44,9 @@ #include #include +#define __STDC_FORMAT_MACROS +#include + #define __BUILD_VERSION_TAG__ 1.2 #include #include @@ -65,7 +68,7 @@ typedef struct match_rec { } match_rec_t; static op_fn_t node_desc, node_info, port_info, switch_info, pkey_table, - sl2vl_table, vlarb_table; + sl2vl_table, vlarb_table, guid_info; static const match_rec_t match_tbl[] = { { "nodeinfo", node_info }, @@ -75,6 +78,7 @@ static const match_rec_t match_tbl[] = { { "pkeys", pkey_table, 1 }, { "sl2vl", sl2vl_table, 1 }, { "vlarb", vlarb_table, 1 }, + { "guid", guid_info, 1 }, {0} }; @@ -341,6 +345,51 @@ vlarb_table(ib_portid_t *dest, char **ar return ret; } +static char * +guid_info(ib_portid_t *dest, char **argv, int argc) +{ + uint8_t data[IB_SMP_DATA_SIZE]; + uint32_t i, j, k; + uint64_t *p; + uint mod; + int n, phy_ports; + int portnum = 0; + + if (argc > 0) + portnum = strtol(argv[0], 0, 0); + + /* Get the guid capacity */ + if (!smp_query(data, dest, IB_ATTR_NODE_INFO, 0, 0)) + return "node info query failed"; + + mad_decode_field(data, IB_NODE_NPORTS_F, &phy_ports); + if (portnum > phy_ports) + return "invalid port number"; + + if (!smp_query(data, dest, IB_ATTR_PORT_INFO, 0, 0)) + return "port info failed"; + mad_decode_field(data, IB_PORT_GUID_CAP_F, &n); + + for (i = 0; i < (n + 7) / 8; i++) { + mod = i; + if (!smp_query(data, dest, IB_ATTR_GUID_INFO, mod, 0)) + return "guid info query failed"; + if (i + 1 == (n + 7) / 8) + k = ((n + 1 - i * 8) / 2) * 2; + else + k = 8; + p = (uint64_t *) data; + for (j = 0; j < k; j += 2, p += 2) { + printf("%4u: 0x%016"PRIx64" 0x%016"PRIx64"\n", + (i * 8) + j, + 
ntohll(p[0]), ntohll(p[1])); + } + } + printf("%d guids capacity for this port\n", n); + + return 0; +} + static op_fn_t * match_op(char *name) { Index: gen2_devel_user/src/userspace/management/libibmad/include/infiniband/mad.h =================================================================== --- gen2_devel_user.orig/src/userspace/management/libibmad/include/infiniband/mad.h 2007-02-26 16:01:55.000000000 +0200 +++ gen2_devel_user/src/userspace/management/libibmad/include/infiniband/mad.h 2007-03-12 18:12:11.000000000 +0200 @@ -118,6 +118,7 @@ enum SMI_ATTR_ID { IB_ATTR_NODE_DESC = 0x10, IB_ATTR_NODE_INFO = 0x11, IB_ATTR_SWITCH_INFO = 0x12, + IB_ATTR_GUID_INFO = 0x14, IB_ATTR_PORT_INFO = 0x15, IB_ATTR_PKEY_TBL = 0x16, IB_ATTR_SLVL_TABLE = 0x17, From halr at voltaire.com Wed Mar 14 06:45:38 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Mar 2007 08:45:38 -0500 Subject: [ofa-general] Re: [PATCH - diags] Added support to query the GUID table entries In-Reply-To: <1173871533.7651.0.camel@mtldesk014.lab.mtl.com> References: <1173871533.7651.0.camel@mtldesk014.lab.mtl.com> Message-ID: <1173879936.5995.175892.camel@hal.voltaire.com> Hi Dotan, On Wed, 2007-03-14 at 06:25, Dotan Barak wrote: > Added support to query the GUID table entries. Thanks. 
One comment below: > Signed-off-by: Dotan Barak > > --- > > Index: gen2_devel_user/src/userspace/management/diags/src/smpquery.c > =================================================================== > --- gen2_devel_user.orig/src/userspace/management/diags/src/smpquery.c 2007-03-12 18:05:09.000000000 +0200 > +++ gen2_devel_user/src/userspace/management/diags/src/smpquery.c 2007-03-12 18:09:52.000000000 +0200 > @@ -44,6 +44,9 @@ > #include > #include > > +#define __STDC_FORMAT_MACROS > +#include > + > #define __BUILD_VERSION_TAG__ 1.2 > #include > #include > @@ -65,7 +68,7 @@ typedef struct match_rec { > } match_rec_t; > > static op_fn_t node_desc, node_info, port_info, switch_info, pkey_table, > - sl2vl_table, vlarb_table; > + sl2vl_table, vlarb_table, guid_info; > > static const match_rec_t match_tbl[] = { > { "nodeinfo", node_info }, > @@ -75,6 +78,7 @@ static const match_rec_t match_tbl[] = { > { "pkeys", pkey_table, 1 }, > { "sl2vl", sl2vl_table, 1 }, > { "vlarb", vlarb_table, 1 }, > + { "guid", guid_info, 1 }, > {0} > }; > > @@ -341,6 +345,51 @@ vlarb_table(ib_portid_t *dest, char **ar > return ret; > } > > +static char * > +guid_info(ib_portid_t *dest, char **argv, int argc) > +{ > + uint8_t data[IB_SMP_DATA_SIZE]; > + uint32_t i, j, k; > + uint64_t *p; > + uint mod; > + int n, phy_ports; > + int portnum = 0; > + > + if (argc > 0) > + portnum = strtol(argv[0], 0, 0); > + > + /* Get the guid capacity */ > + if (!smp_query(data, dest, IB_ATTR_NODE_INFO, 0, 0)) > + return "node info query failed"; > + > + mad_decode_field(data, IB_NODE_NPORTS_F, &phy_ports); > + if (portnum > phy_ports) > + return "invalid port number"; > + if (!smp_query(data, dest, IB_ATTR_PORT_INFO, 0, 0)) > + return "port info failed"; Port number does not affect GUIDInfo and in fact the query for PortInfo here is using port 0 so portnum is not needed, right ? 
-- Hal > + mad_decode_field(data, IB_PORT_GUID_CAP_F, &n); > + > + for (i = 0; i < (n + 7) / 8; i++) { > + mod = i; > + if (!smp_query(data, dest, IB_ATTR_GUID_INFO, mod, 0)) > + return "guid info query failed"; > + if (i + 1 == (n + 7) / 8) > + k = ((n + 1 - i * 8) / 2) * 2; > + else > + k = 8; > + p = (uint64_t *) data; > + for (j = 0; j < k; j += 2, p += 2) { > + printf("%4u: 0x%016"PRIx64" 0x%016"PRIx64"\n", > + (i * 8) + j, > + ntohll(p[0]), ntohll(p[1])); > + } > + } > + printf("%d guids capacity for this port\n", n); > + > + return 0; > +} > + > static op_fn_t * > match_op(char *name) > { > Index: gen2_devel_user/src/userspace/management/libibmad/include/infiniband/mad.h > =================================================================== > --- gen2_devel_user.orig/src/userspace/management/libibmad/include/infiniband/mad.h 2007-02-26 16:01:55.000000000 +0200 > +++ gen2_devel_user/src/userspace/management/libibmad/include/infiniband/mad.h 2007-03-12 18:12:11.000000000 +0200 > @@ -118,6 +118,7 @@ enum SMI_ATTR_ID { > IB_ATTR_NODE_DESC = 0x10, > IB_ATTR_NODE_INFO = 0x11, > IB_ATTR_SWITCH_INFO = 0x12, > + IB_ATTR_GUID_INFO = 0x14, > IB_ATTR_PORT_INFO = 0x15, > IB_ATTR_PKEY_TBL = 0x16, > IB_ATTR_SLVL_TABLE = 0x17, > > From dotanb at dev.mellanox.co.il Wed Mar 14 05:58:04 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 14 Mar 2007 14:58:04 +0200 Subject: [ofa-general] Re: [PATCH - diags] Added support to query the GUID table entries In-Reply-To: <1173879936.5995.175892.camel@hal.voltaire.com> References: <1173871533.7651.0.camel@mtldesk014.lab.mtl.com> <1173879936.5995.175892.camel@hal.voltaire.com> Message-ID: <45F7F15C.1060503@dev.mellanox.co.il> Hal Rosenstock wrote: > Port number does not affect GUIDInfo and in fact the query for PortInfo > here is using port 0 so portnum is not needed, right ? 
> > -- Hal > You are absolutely right: the port value is meaningless, here is a snip from the IB spec: "The attribute selected corresponds to the port that received the SMP." thanks Dotan From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 06:04:07 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Mar 2007 06:04:07 -0700 (PDT) Subject: [ofa-general] [Bug 163] ibv_ack_async_event seg-fault when requested event is SRQ limit In-Reply-To: Message-ID: <20070314130407.AC3FEE607FD@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=163 mst at dev.mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mst at dev.mellanox.co.il 2007-03-14 06:04 ------- probably fixed by now. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From swise at opengridcomputing.com Wed Mar 14 07:05:11 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 14 Mar 2007 09:05:11 -0500 Subject: [ofa-general] Re: mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <20070314051943.GA8249@mellanox.co.il> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> <20070314051943.GA8249@mellanox.co.il> Message-ID: <1173881111.29201.1.camel@stevo-desktop> On Wed, 2007-03-14 at 07:19 +0200, Michael S. Tsirkin wrote: > > I figured out why I was hitting the BUG_ON() in page_remove_rmap(). It's > > because my library was destroying the QP or CQ object _before_ unmapping > > the objects. I changed it to unmap first, and things work as expected. > > I guess kernel side also needs to be fixed, otherwise buggy userspace > can trigger BUG_ON's? > a buggy provider library.
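The ordering fix mentioned in the quoted text (unmap first, then destroy) can be sketched in user space roughly as follows. This is a minimal illustration under assumptions, not the provider library's code: `fake_queue`, `fake_queue_create`, `fake_kernel_destroy` and `fake_queue_teardown` are hypothetical names, an anonymous `mmap` stands in for the mapped queue memory, and the refusal in `fake_kernel_destroy` merely simulates the rule. In the report above the real kernel hit a BUG_ON() rather than returning an error.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict compiler modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a provider library's QP or CQ object. */
struct fake_queue {
    void *map;        /* user-space mapping of the queue memory, or NULL */
    size_t map_len;   /* length of that mapping in bytes */
    int destroyed;    /* set once the simulated kernel object is gone */
};

/* Create the object and map some memory for it. Returns 0 on success. */
static int fake_queue_create(struct fake_queue *q, size_t len)
{
    assert(q != NULL);

    q->map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (q->map == MAP_FAILED) {
        q->map = NULL;
        return -1;
    }
    q->map_len = len;
    q->destroyed = 0;
    return 0;
}

/*
 * Simulated destroy call. It refuses to run while the mapping is still
 * live, which models "destroy before unmap is a bug" without crashing.
 */
static int fake_kernel_destroy(struct fake_queue *q)
{
    if (q->map != NULL)
        return -1;
    q->destroyed = 1;
    return 0;
}

/* Teardown in the order that worked in the report: unmap, then destroy. */
static int fake_queue_teardown(struct fake_queue *q)
{
    if (q->map != NULL) {
        if (munmap(q->map, q->map_len) != 0)
            return -1;
        q->map = NULL;
    }
    return fake_kernel_destroy(q);
}
```

Keeping both steps inside one teardown function means callers cannot get the order wrong, which is one way a library could guard against the bug described above.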

From tziporet at mellanox.co.il Wed Mar 14 07:12:52 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 14 Mar 2007 16:12:52 +0200
Subject: [ofa-general] OFED 1.2 beta release
Message-ID: <45F802E4.4030503@mellanox.co.il>

Hi,

OFED 1.2-beta1 is available on
http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-beta1.tgz
BUILD_ID contains info on the source location of all packages.

Please report any issues in bugzilla: https://bugs.openfabrics.org/

Tziporet & Vlad

OS support:
Novell:
 - SLES 9.0 SP3
 - SLES10
Redhat:
 - Redhat EL4 up4 and up3
 - Redhat EL5 beta2 (only partially tested)
kernel.org:
 - 2.6.20
 - 2.6.19

Note: Fedora C6 and SuSE Pro 10 are not part of the official list. We keep
the backport patches for these OSes and make sure OFED compiles and loads
properly, but will not do a full QA cycle.

Systems:
 * x86_64
 * x86
 * ia64
 * ppc64

Main changes from OFED-1.1-alpha:
 1. Added packages:
    1. madeye utility
    2. DAPL utils (contains dapl test)
    3. Support for MPI selector
    4. RDS tools package
 2. RDS to support SLES10 and RHEL up3 and up4, and tested with crload
 3. ipath driver is now available and libipathverbs is working
 4. Fixed 82 bugs (see attachment for all bugs fixed)

Limitations and known issues:

Major issues:

bug_id  bug_severity  op_sys   assigned_to                 short_short_desc
400     blocker       RHEL 4   sean.hefty at intel.com     OFED 1.2 IPoIB HA/CM/multicast problems
456     critical      SLES 10  arlin.r.davis at intel.com  dapltest won't compile on SLES10 IA64
419     critical      SLES 10  mee at pathscale.com        OFED 1.2 does not build ipath components on SLES10/PPC64
420     critical      All      monil at voltaire.com       PKey table reordering caused by SM failover stops ipoib traffic
402     critical      Other    mst at mellanox.co.il       On stress kernel: BUG: soft lockup detected on CPU#0!
431     critical      SLES 10  mst at mellanox.co.il       IPoIB CM locks up server on SLES10/RHEL4 ppc64
436     major         RHEL 4   arlin.r.davis at intel.com  Intel MPI and HP MPI DDR bandwidth dropped after OFED 1.2 alpha
351     major         SLES 10  bugzilla at openib.org      Routing table problem in SLES10 when using port #2
449     major         All      bugzilla at openib.org      DMA vs CQ race on IA64 Altix platform
450     major         RHEL 4   bugzilla at openib.org      IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3
406     major         RHEL 4   eitan at mellanox.co.il     "double free" abort in ibdaigui
418     major         RHEL 4   mst at mellanox.co.il       IPoIB CM causes large message IPv4 multicast to fail
445     major         SLES 10  pasha at mellanox.co.il     MVAPICH won't compile on ppc64
438     major         All      rolandd at cisco.com        OFED SRP does not work with DDN IB storage

See bugzilla for all open issues.

Tasks that should be completed for RC1:
 1. Support RHEL5
 2. Better PPC support (several PPC-specific bugs are still open)
 3. Bonding support by IPoIB
 4. Fix all blocker, critical and major bugs

RC1 due date is 29-March

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fixed-bugs-2007-03-14.csv

From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 07:35:58 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Mar 2007 07:35:58 -0700 (PDT)
Subject: [ofa-general] [Bug 122] mad layer problem
In-Reply-To:
Message-ID: <20070314143558.83C48E603B1@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=122

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

------- Comment #3 from tziporet at mellanox.co.il 2007-03-14 07:35 -------
Issue was resolved - problem was in the HCA FW.
-- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 07:37:39 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Mar 2007 07:37:39 -0700 (PDT) Subject: [ofa-general] [Bug 122] mad layer problem In-Reply-To: Message-ID: <20070314143739.538DBE603B1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=122 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 08:24:22 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Wed, 14 Mar 2007 08:24:22 -0700 (PDT) Subject: [ofa-general] [Bug 308] IPOIB HA Failed - ping does not reach to destination In-Reply-To: Message-ID: <20070314152422.9DA20E607F8@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=308 yohadd at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from yohadd at mellanox.co.il 2007-03-14 08:24 ------- This bug was fixed. In the below flow - ping get to destination. -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From sweitzen at cisco.com Wed Mar 14 09:03:01 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 14 Mar 2007 09:03:01 -0700 Subject: [ofa-general] RE: [ewg] OFED 1.2 beta release In-Reply-To: <45F802E4.4030503@mellanox.co.il> References: <45F802E4.4030503@mellanox.co.il> Message-ID: I have added a 1.2beta1 version in bugzilla. ________________________________ From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Wednesday, March 14, 2007 7:13 AM To: EWG Cc: OPENIB Subject: [ewg] OFED 1.2 beta release -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mshefty at ichips.intel.com Wed Mar 14 09:53:49 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 09:53:49 -0700 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070314051158.GA7997@mellanox.co.il> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> Message-ID: <45F8289D.9050806@ichips.intel.com> > Sean, before applying this, please discuss the MRA timeout patch > on the general list with Ishai. > > Can you fix the patch instead of removing it? > It helps him work-around bugs in his SRP target. Currently the patch is broken and incorrectly sets the CM timeouts to 21 ms, which causes failures on any (even small) scale-up testing. The OFED 1.1 patch was more limited in scope and only affected MRA timeouts, whereas, this patch affects all CM timeouts. I thought the SRP target issue was fixed with a firmware update, and I saw that Ishai was re-working the timeout patch. Until a fixed patch is ready, I believe that we should remove this. > Please post what you did and the errors you get, and we'll try to help. I ran a command based on what you mentioned: env git_url=/home/shefty/src/ofed_1_2/ git_branch=ofed_1_2 CHECK_LOCAL=yes CHECK_KERNEL_ORG=yes /home/vlad/scripts/build_ofa_kernel.sh The build for x686 works, but all other builds fail, so I removed the CHECK_CROSS=yes option. I get build errors related to the cxgb3 driver. 
- Sean From halr at voltaire.com Wed Mar 14 11:14:16 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Mar 2007 13:14:16 -0500 Subject: [ofa-general] Re: [PATCH - diags] Added support to query the GUID table entries In-Reply-To: <45F7F15C.1060503@dev.mellanox.co.il> References: <1173871533.7651.0.camel@mtldesk014.lab.mtl.com> <1173879936.5995.175892.camel@hal.voltaire.com> <45F7F15C.1060503@dev.mellanox.co.il> Message-ID: <1173896051.5995.192427.camel@hal.voltaire.com> On Wed, 2007-03-14 at 07:58, Dotan Barak wrote: > Hal Rosenstock wrote: > > Port number does not affect GUIDInfo and in fact the query for PortInfo > > here is using port 0 so portnum is not needed, right ? > > > > -- Hal > > > > You are absolutely right: the port value is meaningless, here is a snip > from the IB spec: > "The attribute selected corresponds to the port that received the SMP." I fixed this up. Thanks. Applied (to both master and ofed_1_2). -- Hal > thanks > Dotan From sweitzen at cisco.com Wed Mar 14 10:15:43 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 14 Mar 2007 10:15:43 -0700 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: <20070314074914.GE4644@mellanox.co.il> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <20070314074914.GE4644@mellanox.co.il> Message-ID: > Scott, what is "Interim result"? We configure netperf 2.4.1 with --enable-demo=yes, which adds a -D optional parameter to print interim results as it runs, which is very handy for long running tests. > You are using SM on switch, are you not? > Are you sure the delays are not due to SM? I will try IPoIB UD HA today and Topspin IPoIB HA as well, to gather more data. 
Scott From vlad at dev.mellanox.co.il Wed Mar 14 10:19:08 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 14 Mar 2007 19:19:08 +0200 Subject: [ewg] Re: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <45F8289D.9050806@ichips.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> Message-ID: <45F82E8C.1000208@dev.mellanox.co.il> Sean Hefty wrote: >> Sean, before applying this, please discuss the MRA timeout patch >> on the general list with Ishai. >> >> Can you fix the patch instead of removing it? >> It helps him work-around bugs in his SRP target. > > Currently the patch is broken and incorrectly sets the CM timeouts to > 21 ms, which causes failures on any (even small) scale-up testing. > The OFED 1.1 patch was more limited in scope and only affected MRA > timeouts, whereas, this patch affects all CM timeouts. I thought the > SRP target issue was fixed with a firmware update, and I saw that > Ishai was re-working the timeout patch. Until a fixed patch is ready, > I believe that we should remove this. > >> Please post what you did and the errors you get, and we'll try to help. > > I ran a command based on what you mentioned: > > env git_url=/home/shefty/src/ofed_1_2/ git_branch=ofed_1_2 > CHECK_LOCAL=yes CHECK_KERNEL_ORG=yes > /home/vlad/scripts/build_ofa_kernel.sh > > The build for x686 works, but all other builds fail, so I removed the > CHECK_CROSS=yes option. I get build errors related to the cxgb3 driver. > > - Sean > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Try now, Cross compilation should work. 
Regards, Vladimir From mshefty at ichips.intel.com Wed Mar 14 10:38:35 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 10:38:35 -0700 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> Message-ID: <45F8331B.8030301@ichips.intel.com> > Right now I am having a hard time getting failures to happen, I'll keep > trying. Your last report mentioned that you were running OFED-1.2-20070311-0600. Is this still the case? A fix for the multicast detach race went into OFED 1.2 on March 11th. I don't know if it made it into the OFED-1.2-20070311-0600 release or not, but should be in 20070312 and later releases. Can you look for the file kernel_patches/fixes/sean_ipoib_multicast.patch? > Here's an example of several minutes of dmesg output: > > ib0: enabling connected mode will cause multicast packet drops > ib1: enabling connected mode will cause multicast packet drops > ib1: dev_queue_xmit failed to requeue packet > ib1: dev_queue_xmit failed to requeue packet Michael, are either of these messages any cause for concern? 
- Sean

From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 10:56:06 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Mar 2007 10:56:06 -0700 (PDT)
Subject: [ofa-general] [Bug 458] New: Heavy interrupt rates kill UDP performance
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=458

           Summary: Heavy interrupt rates kill UDP performance
           Product: OpenFabrics Linux
           Version: 1.1
          Platform: X86-64
        OS/Version: RHEL 4
            Status: NEW
          Severity: major
          Priority: P2
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: xma at us.ibm.com

cat /proc/cpuinfo (total of 4 processors, only displaying one here)

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
stepping        : 6
cpu MHz         : 3000.147
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips        : 6006.40
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.128 (10.0.0.128) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

8388608   65507   60.00      588062      0    5136.18
8388608           60.00      441589           3856.87

We have tried an 8MB UDP buffer and an 80MB UDP buffer; that doesn't help much.

net.core.rmem_default = 8388608
net.core.wmem_default = 8388608
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

Then we set the CPU affinity of netserver to a different CPU from the one handling the ib_mthca IRQ, and the UDP receive errors were gone.
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.128 (10.0.0.128) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 8388608 65507 60.00 591595 0 5167.07 8388608 60.00 591595 5167.07 -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sweitzen at cisco.com Wed Mar 14 10:57:20 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 14 Mar 2007 10:57:20 -0700 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: <45F8331B.8030301@ichips.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> Message-ID: Yes, I was using 20070311, and I see the patch in 20070312. I'll try it. Scott > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, March 14, 2007 10:39 AM > To: Scott Weitzenkamp (sweitzen); Michael S. Tsirkin > Cc: Sean Hefty; general at lists.openfabrics.org > Subject: Re: [ofa-general] RE: bug 400: ipoib error messages > > > Right now I am having a hard time getting failures to > happen, I'll keep > > trying. > > Your last report mentioned that you were running > OFED-1.2-20070311-0600. Is > this still the case? > > A fix for the multicast detach race went into OFED 1.2 on > March 11th. I don't > know if it made it into the OFED-1.2-20070311-0600 release or > not, but should be > in 20070312 and later releases. Can you look for the file > kernel_patches/fixes/sean_ipoib_multicast.patch? 
> > > Here's an example of several minutes of dmesg output: > > > > ib0: enabling connected mode will cause multicast packet drops > > ib1: enabling connected mode will cause multicast packet drops > > ib1: dev_queue_xmit failed to requeue packet > > ib1: dev_queue_xmit failed to requeue packet > > Michael, are either of these messages any cause for concern? > > - Sean > From mst at dev.mellanox.co.il Wed Mar 14 10:57:28 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 19:57:28 +0200 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: <45F8331B.8030301@ichips.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> Message-ID: <20070314175728.GA22037@mellanox.co.il> > > Here's an example of several minutes of dmesg output: > > > > ib0: enabling connected mode will cause multicast packet drops > > ib1: enabling connected mode will cause multicast packet drops > > ib1: dev_queue_xmit failed to requeue packet > > ib1: dev_queue_xmit failed to requeue packet > > Michael, are either of these messages any cause for concern? I don't think so. -- MST From mshefty at ichips.intel.com Wed Mar 14 10:57:53 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 10:57:53 -0700 Subject: [ewg] Re: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <45F82E8C.1000208@dev.mellanox.co.il> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> <45F82E8C.1000208@dev.mellanox.co.il> Message-ID: <45F837A1.2080405@ichips.intel.com> > Try now, > Cross compilation should work. Thanks - it worked. - Sean From mst at dev.mellanox.co.il Wed Mar 14 10:58:27 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 14 Mar 2007 19:58:27 +0200 Subject: [ofa-general] RE: bug 400: ipoib error messages In-Reply-To: <45F8331B.8030301@ichips.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> Message-ID: <20070314175827.GB22037@mellanox.co.il> > > Here's an example of several minutes of dmesg output: > > > > ib0: enabling connected mode will cause multicast packet drops > > ib1: enabling connected mode will cause multicast packet drops > > ib1: dev_queue_xmit failed to requeue packet > > ib1: dev_queue_xmit failed to requeue packet > > Michael, are either of these messages any cause for concern? I don't think so. -- MST From mst at dev.mellanox.co.il Wed Mar 14 11:05:13 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 20:05:13 +0200 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <45F8289D.9050806@ichips.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> Message-ID: <20070314180513.GC22037@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes > > >Sean, before applying this, please discuss the MRA timeout patch > >on the general list with Ishai. > > > >Can you fix the patch instead of removing it? > >It helps him work-around bugs in his SRP target. > > Currently the patch is broken and incorrectly sets the CM timeouts to 21 > ms, which causes failures on any (even small) scale-up testing. The OFED > 1.1 patch was more limited in scope and only affected MRA timeouts, > whereas, this patch affects all CM timeouts. I thought the SRP target > issue was fixed with a firmware update, This is not always an option. > and I saw that Ishai was re-working > the timeout patch. Until a fixed patch is ready, I believe that we should > remove this. AFAIK OFED 1.2 has the reworked patch last posted by Ishai. 
Last time I checked there were only some cosmetic issues with it. If there are still issues pls discuss. -- MST From mshefty at ichips.intel.com Wed Mar 14 11:13:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 11:13:39 -0700 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070314180513.GC22037@mellanox.co.il> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> <20070314180513.GC22037@mellanox.co.il> Message-ID: <45F83B53.20105@ichips.intel.com> > AFAIK OFED 1.2 has the reworked patch last posted by Ishai. > Last time I checked there were only some cosmetic issues with it. > If there are still issues pls discuss. I still see sean_cm_limit_mra_timeout.patch as being the same in the OFED tree. Ishai's reworked patch had one more serious issue keeping the wrong timeout value for SIDR. In any case, I'm updating the patch... - Sean From mst at dev.mellanox.co.il Wed Mar 14 11:33:41 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 20:33:41 +0200 Subject: [ofa-general] Re: mapping kernel memory to userspace in <= 2.6.14 In-Reply-To: <1173881111.29201.1.camel@stevo-desktop> References: <1173794903.32342.17.camel@stevo-desktop> <1173813907.32342.44.camel@stevo-desktop> <20070314051943.GA8249@mellanox.co.il> <1173881111.29201.1.camel@stevo-desktop> Message-ID: <20070314183341.GD22037@mellanox.co.il> > Quoting Steve Wise : > Subject: Re: mapping kernel memory to userspace in <= 2.6.14 > > On Wed, 2007-03-14 at 07:19 +0200, Michael S. Tsirkin wrote: > > > I figured out why I was hitting the BUG_ON() in page_remove_rmap(). Its > > > because my library was destroying the QP or CQ object _before_ unmaping > > > the objects. I changed it to unmap first, and things work as expected. > > > > I guess kernel side also needs to be fixed, otherwise buggy userspace > > can trigger BUG_ON's? 
> > > > a buggy provider library. This is a userspace library, isn't it? -- MST From mst at dev.mellanox.co.il Wed Mar 14 12:00:46 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Mar 2007 21:00:46 +0200 Subject: [ofa-general] Re: RE: bug 400: ipoib error messages In-Reply-To: <45F8331B.8030301@ichips.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> Message-ID: <20070314190037.GF22037@mellanox.co.il> > A fix for the multicast detach race went into OFED 1.2 on March 11th. I > don't know if it made it into the OFED-1.2-20070311-0600 release or not, Look at build id, this includes git checksum for kernel code. Look that up with git in ofed_1_2.git and you'll know what's in there. -- MST From mshefty at ichips.intel.com Wed Mar 14 13:52:10 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 13:52:10 -0700 Subject: [ofa-general] Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts In-Reply-To: <45EC84E1.1080709@dev.mellanox.co.il> References: <002301c73355$c220e180$8698070a@amr.corp.intel.com> <45EC84E1.1080709@dev.mellanox.co.il> Message-ID: <45F8607A.4060407@ichips.intel.com> > How about the attached fix to Sean patch? I've created an updated patch that I will queue for 2.6.22. I'm working on importing it into OFED 1.2, and should have that shortly. - Sean From kanojsarcar at yahoo.com Wed Mar 14 14:42:01 2007 From: kanojsarcar at yahoo.com (Kanoj Sarcar) Date: Wed, 14 Mar 2007 14:42:01 -0700 (PDT) Subject: [ofa-general] OFA newbie question: module load/unload Message-ID: <495539.40593.qm@web32514.mail.mud.yahoo.com> Hi, I have a question about module load/unload of the various drivers in drivers/infiniband/hw. I can see that ib_uverbs_open() bumps the driver module reference count when a user application starts using a device. What about bumping the module reference count when a kernel client (eg ipoib) is using a device? 
How is the driver made unloadable in such cases?

Thanks.

Kanoj

From sean.hefty at intel.com Wed Mar 14 14:46:23 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 14 Mar 2007 14:46:23 -0700
Subject: [ofa-general] OFA newbie question: module load/unload
In-Reply-To: <495539.40593.qm@web32514.mail.mud.yahoo.com>
Message-ID: <000601c76682$35999df0$8698070a@amr.corp.intel.com>

>What about bumping the module reference count when a
>kernel client (eg ipoib) is using a device? How is the
>driver made unloadable in such cases?

The driver is still unloadable in such cases. The kernel client is notified
when a device is removed and is expected to release any resources associated
with that device.

- Sean

From mshefty at ichips.intel.com Wed Mar 14 14:47:22 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Mar 2007 14:47:22 -0700
Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes
In-Reply-To: <45F83B53.20105@ichips.intel.com>
References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> <20070314180513.GC22037@mellanox.co.il> <45F83B53.20105@ichips.intel.com>
Message-ID: <45F86D6A.4030802@ichips.intel.com>

> In any case, I'm updating the patch...

Something's wrong with the OFED git tree. Looking online, I don't see any
changes to the git log since early February. If I clone the git tree,
however, I do see recent log messages/changes. ???

I went to update the file sean_cm_limit_mra_timeout.patch, and noticed that
it had changed, but I don't see a log message for that change, either online
or in my cloned tree. ???

As an aside, OFED really needs to change their entire process around how they make use of a source control tool.
Using source control simply to store a copy of a directory loses most of the
benefits that source control actually provides...

- Sean

From mshefty at ichips.intel.com Wed Mar 14 15:07:53 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Mar 2007 15:07:53 -0700
Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes
In-Reply-To: <45F86D6A.4030802@ichips.intel.com>
References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F8289D.9050806@ichips.intel.com> <20070314180513.GC22037@mellanox.co.il> <45F83B53.20105@ichips.intel.com> <45F86D6A.4030802@ichips.intel.com>
Message-ID: <45F87239.5050007@ichips.intel.com>

> I went to update the file sean_cm_limit_mra_timeout.patch, and noticed
> that it had changed, but I don't see a log message for that change,
> either online or in my cloned tree. ???

A fresh clone shows the log as Mar 1, so this looks good. I think the issue
is with my use of git on the OFED server, versus the git version that I
typically use. So, it seems that the only issue is with the http git view.

- Sean

From nimrodg at mellanox.com Wed Mar 14 15:23:22 2007
From: nimrodg at mellanox.com (Nimrod Gindi)
Date: Wed, 14 Mar 2007 15:23:22 -0700
Subject: [ofa-general] OFED release testing Task force meeting minutes
In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1D4C8D8@mtiexch01.mti.com>
Message-ID: <1E3DCD1C63492545881FACB6063A57C1E04FCD@mtiexch01.mti.com>

Meeting took place on Wednesday - March. 14th, 2007 8:30AM (PST)

Agenda:
 1. Reviewing experience with existing spread sheet.
 2. Review test to deploy into OFED and current method of deployment as
    exercised by Amit from Mellanox.

Attending companies: Qlogic, Mellanox, Voltaire, SystemFabricWorks

Discussion Items and Action Items:
 1. Plan to implement the testing report over the Beta release in the next
    2 weeks
 2. AI1: Spread sheet Nimrod G. - Send spread sheet with Owners indicated
 3. Reminder on Next agreed steps:
    a.
Start using the spread sheet post Beta build of OFED 1.2 to assist with visibility into testing done by members b. Adding tests from member companies to shared OFED repository. i. AI 2: Amit K. - send out a pointer to tests which are already posted by Mellanox in OFED. c. Fill in missing ULP owners under the following understanding of responsibilities: i. ULP owner will be in charge of approving entering tests of the ULP to enter the list/repository ii. ULP owner to flag the task force in case in which the ULP under his responsibility is falling behind on testing in the community. Follow-up meeting will be scheduled for 28th-March 2007 8:30am PDT=11:30am EDT=6:30pm Israel. Nimrod Gindi Mellanox Technologies Ltd. mail : nimrodg at mellanox.com Cell : +1-408-750-4801 Office: +1-347-342-0011 Fax : +1-212-987-0275 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OFED testing report format rev3.xls Type: application/vnd.ms-excel Size: 50688 bytes Desc: OFED testing report format rev3.xls URL: From mshefty at ichips.intel.com Wed Mar 14 15:39:50 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Mar 2007 15:39:50 -0700 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070314051158.GA7997@mellanox.co.il> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> Message-ID: <45F879B6.2040904@ichips.intel.com> >>Vlad, please pull from: >> >> git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 >> >>This should add some necessary fixes to the OFED code: >> >> RDMA/ucma: avoid sending reject if backlog is full >> RDMA/cma: Request reversible paths only >> IB/cm: remove broken MRA timeout patch > > > Sean, before applying this, please discuss the MRA timeout patch > on the general list with Ishai. > > Can you fix the patch instead of removing it? 
> It helps him work-around bugs in his SRP target.

I've updated the patches in my ofed tree. The version of the tree that I was originally working on did not have Ishai's changes applied to it, and I didn't realize that they were merged into OFED. (The broken MRA timeout patch I was referring to was the one before Ishai's.) So, I ended up creating a different replacement patch that: increases the default timeout, exports the timeout as a module parameter, and fixes an issue setting the SIDR REQ timeout. (This is the version of the patch that I will request for 2.6.22.)

- Sean

From Koen.SEGERS at VRT.BE Wed Mar 14 16:04:05 2007
From: Koen.SEGERS at VRT.BE (SEGERS Koen)
Date: Thu, 15 Mar 2007 00:04:05 +0100
Subject: [ofa-general] oversubscription
References: <1173831317.5995.125066.camel@hal.voltaire.com> <45F7A33F.3090906@dev.mellanox.co.il>
Message-ID:

Your question made it clear :)
We are still using only one SFS-7000P switch. Routing is not an option here.

This was our setup:

         s1
      | | | |
     SFS-7000P
      | | | |
    h1 h2 h3 h4

So changing the routing will give no benefit whatsoever...

Greetz

Koen

________________________________

From: Yevgeny Kliteynik [mailto:kliteyn at dev.mellanox.co.il]
Sent: Wed 3/14/2007 8:24 AM
To: Hal Rosenstock
Cc: SEGERS Koen; general at lists.openfabrics.org
Subject: Re: [ofa-general] oversubscription

Hal Rosenstock wrote:
> On Tue, 2007-03-13 at 17:30, SEGERS Koen wrote:
>> I'm trying to resolve an oversubscription problem of 1 server
>> receiving streams from 4 hosts.
>>
>> If you connect the server with 4 cables to the switch, it should be
>> resolved when I define the routes that should be used. But is it
>> possible to define the routing tables in the subnet manager?
>
> Depends on the SM as to whether this is supported or not. OpenSM
> supports the ability to do this as do some vendor SMs.
>
>> Maybe just running the fattree routing module is sufficient?
>
> I'm not sure; Yevgeny would be the best to answer this.

What is the fabric topology?
Basically, you can try using fat-tree routing, and if the topology
is not a fat-tree, OpenSM will issue an error message and will use
default routing.
You can get more details from the osm/doc/current-routing.txt

-- Yevgeny

>> I also saw that it is possible to set the routes by using a file. Can
>> someone give an example of this?
>
> This capability is documented in the opensm man page. You can obtain
> this file via dump_lfts.sh script.
>
>> Currently I'm trying a combination of partitioning and linux routing.
>> Is this a good idea?
>
> What do you mean by Linux routing ? Is this IP routing ?
>
> -- Hal
>
>> BTW: we are only interested in SDP and IPoIB. In a previous thread I
>> discovered that bonding/aggregation/merging is not possible in an
>> active/active situation. So this doesn't improve our setup.
>>
>> greetz,
>>
>> Koen

*** Disclaimer ***

Vlaamse Radio- en Televisieomroep
Auguste Reyerslaan 52, 1043 Brussel

nv van publiek recht
BTW BE 0244.142.664
RPR Brussel
http://www.vrt.be/disclaimer
From kanojsarcar at yahoo.com Wed Mar 14 16:15:50 2007
From: kanojsarcar at yahoo.com (Kanoj Sarcar)
Date: Wed, 14 Mar 2007 16:15:50 -0700 (PDT)
Subject: [ofa-general] OFA newbie question: module load/unload
In-Reply-To: <000601c76682$35999df0$8698070a@amr.corp.intel.com>
Message-ID: <443253.66369.qm@web32515.mail.mud.yahoo.com>

--- Sean Hefty wrote:

> >What about bumping the module reference count when a
> >kernel client (eg ipoib) is using a device? How is the
> >driver made unloadable in such cases?
>
> The driver is still unloadable in such cases. The kernel client is notified
> when a device is removed and is expected to release any resources associated
> with that device.
>
> - Sean
>

Okay. So, the driver is responsible for waking up any verb handler threads that issued commands to the card but have not yet received responses. As an example, using mthca, I assume then something like mthca_CLOSE_IB() is causing the card to respond back to all pending verb requests so that mthca_cmd_event() can flag those as cancelled, and all threads can exit the to-be-unloaded code. Or maybe some other piece of code is handling this?

Thanks.

Kanoj
From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 22:58:45 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Mar 2007 22:58:45 -0700 (PDT)
Subject: [ofa-general] [Bug 460] New: IP address assignment problem with IPoIB interfaces
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=460

           Summary: IP address assignment problem with IPoIB interfaces
           Product: OpenFabrics Linux
           Version: 1.2alpha1
          Platform: X86-64
        OS/Version: SLES 10
            Status: NEW
          Severity: major
          Priority: P1
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: karun.sharma at qlogic.com

OFED Release: 1.2 alpha 1
OS Release: SLES10 64 bit
HCA Details: Mellanox MT 23108
Kernel: 2.6.16.21-0.8-smp
----------------------------------------------
1. Started installing the OFED 1.2 package with the "install All packages" option.
2. During installation, configured the IPoIB interfaces and assigned IP addresses as follows:
     ib0 - 172.20.51.222
     ib1 - 172.20.52.222
3. After the installation completed, rebooted the system; after the reboot the IP addresses were shown as assigned.
4. Then executed the command "service network restart".
5. After the network restart the IP addresses shown were:
     ib0 - 172.20.52.222
     ib1 - 172.20.52.222
   This is not proper behavior.
6. Checked /etc/infiniband/openib.conf: these interfaces are not configured for HA, so they should not behave like HA. There is no reason for ib0 to take the same IP as ib1.

Following is the output of ifconfig, before and after service network restart.
--------------------------------------------------------------------------------
ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.20.51.222  Bcast:172.20.51.255  Mask:255.255.255.0
          inet6 addr: fe80::206:6a00:a000:399/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:220 (220.0 b)

ib1       Link encap:UNSPEC  HWaddr 80-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.20.52.222  Bcast:172.20.52.255  Mask:255.255.255.0
          inet6 addr: fe80::206:6a01:a000:399/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:3 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:64 (64.0 b)  TX bytes:220 (220.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:996 (996.0 b)  TX bytes:996 (996.0 b)

ss22:~ # service network restart
NetworkManager is not installed, thus using NetControl. Please set
/etc/sysconfig/network/config:NetworkManager=no or install NetworkManager.
NetworkManager is not installed, thus using NetControl. Please set
/etc/sysconfig/network/config:NetworkManager=no or install NetworkManager.
Shutting down network interfaces:
    ib0       device: Mellanox Technologies MT23108 InfiniHost (rev a1)
    ib0       configuration: ib1                                        done
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev adone
    cipsec0
              No configuration found for cipsec0
              Nevertheless the interface will be shut down.             done
    eth0      device: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
    eth0      configuration: eth-id-00:04:23:b1:43:38                   done
    eth1      device: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
    eth1      configuration: eth-id-00:04:23:b1:43:39                   done
Shutting down service network . . . . . . . . . . . . . done.
NetworkManager is not installed, thus using NetControl. Please set
/etc/sysconfig/network/config:NetworkManager=no or install NetworkManager.
Hint: you may set mandatory devices in /etc/sysconfig/network/config
Setting up network interfaces:
    lo
    lo        IP address: 127.0.0.1/8                                   done
    cipsec0
              No configuration found for cipsec0                        unused
    eth0      device: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
    eth0      configuration: eth-id-00:04:23:b1:43:38
    eth0      IP address: 172.20.50.222/24                              done
    eth1      device: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
    eth1      configuration: eth-id-00:04:23:b1:43:39
    eth1      IP address: 10.20.50.222/24                               done
    ib0       device: Mellanox Technologies MT23108 InfiniHost (rev a1)
    ib0       configuration: ib1
    ib0       IP address: 172.20.52.222/24                              done
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev a1)
    ib1       IP address: 172.20.52.222/24                              done
              Interface veth5 is not available                          failed
              Interface vex1 is not available                           failed
Setting up service network . . . . . . . . . . . . . . failed
ss22:~ # ifconfig
eth0      Link encap:Ethernet  HWaddr 00:04:23:B1:43:38
          inet addr:172.20.50.222  Bcast:172.20.50.255  Mask:255.255.255.0
          inet6 addr: fe80::204:23ff:feb1:4338/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:154 errors:0 dropped:0 overruns:0 frame:0
          TX packets:150 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:14611 (14.2 Kb)  TX bytes:25211 (24.6 Kb)
          Base address:0xdc00 Memory:fcfa0000-fcfc0000

eth1      Link encap:Ethernet  HWaddr 00:04:23:B1:43:39
          inet addr:10.20.50.222  Bcast:10.20.50.255  Mask:255.255.255.0
          inet6 addr: fe80::204:23ff:feb1:4339/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:1116 (1.0 Kb)
          Base address:0xdc80 Memory:fcfe0000-fd000000

ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.20.52.222  Bcast:172.20.52.255  Mask:255.255.255.0
          inet6 addr: fe80::206:6a00:a000:399/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:4 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:64 (64.0 b)  TX bytes:516 (516.0 b)

ib1       Link encap:UNSPEC  HWaddr 80-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.20.52.222  Bcast:172.20.52.255  Mask:255.255.255.0
          inet6 addr: fe80::206:6a01:a000:399/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:128 (128.0 b)  TX bytes:516 (516.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:996 (996.0 b)  TX bytes:996 (996.0 b)

ss22:~ #
---------------------------------------------------------------------------
7. If I keep the ib1 configuration as default, i.e. "bootproto=dhcp", then on network restart ib0 starts sending DHCP discover, which is also not correct.

Attaching herewith the openib.conf for my machine.

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Wed Mar 14 22:59:45 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Mar 2007 22:59:45 -0700 (PDT)
Subject: [ofa-general] [Bug 460] IP address assignment problem with IPoIB interfaces
In-Reply-To:
Message-ID: <20070315055945.346FDE6081A@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=460

------- Comment #1 from karun.sharma at qlogic.com 2007-03-14 22:59 -------
Created an attachment (id=97)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=97&action=view)
openib.conf on my machine

--
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From idp2006 at zahav.net.il Thu Mar 15 01:11:15 2007
From: idp2006 at zahav.net.il (mail@idp.co.il)
Date: Thu, 15 Mar 2007 10:11:15 +0200
Subject: [ofa-general] =?iso-8859-1?q?=E1=F7=F8=FA_=EB=F0=E9=F1=E4?=
Message-ID: <6462e6599444282d635332eeccc95fad@localip>

This is an HTML email, please use HTML format to open it!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: intrence-control-one-pic.gif Type: image/gif Size: 170921 bytes Desc: not available URL: From bugzilla-daemon at lists.openfabrics.org Thu Mar 15 01:55:38 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 15 Mar 2007 01:55:38 -0700 (PDT) Subject: [ofa-general] [Bug 460] IP address assignment problem with IPoIB interfaces In-Reply-To: Message-ID: <20070315085538.AAFA7E60811@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=460 karun.sharma at qlogic.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|1.2alpha1 |1.2beta1 ------- Comment #2 from karun.sharma at qlogic.com 2007-03-15 01:55 ------- The issue exist with beta release also -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From vlad at lists.openfabrics.org Thu Mar 15 02:34:36 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 15 Mar 2007 02:34:36 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070315-0200 daily build status Message-ID: <20070315093436.EC1A8E60811@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on 
x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From trishal.choudhari at qlogic.com Thu Mar 15 03:16:37 2007 From: trishal.choudhari at qlogic.com (Trishal Choudhari) Date: Thu, 15 Mar 2007 05:16:37 -0500 Subject: [ofa-general] RE: [ewg] OFED 1.2 beta release References: <45F802E4.4030503@mellanox.co.il> Message-ID: not able to install OFED1.2 beta1 on RHEL4 UP4 and SLES 10 64 bit machines. seems like some problem with MPI. I did custom installation without selecting any of the MPI components and installation went fine. please find the logs attached. 
trishal

________________________________

From: ewg-bounces at lists.openfabrics.org on behalf of Tziporet Koren
Sent: Wed 3/14/2007 9:12 AM
To: EWG
Cc: OPENIB
Subject: [ewg] OFED 1.2 beta release

Hi,

OFED 1.2-beta1 is available on http://www.openfabrics.org/builds/ofed-1.2/
File:OFED-1.2-beta1.tgz
BUILD_ID contains info on all packages sources location.

Please report any issues in bugzilla https://bugs.openfabrics.org/

Tziporet & Vlad

OS support:
Novell:
  - SLES 9.0 SP3
  - SLES10
Redhat:
  - Redhat EL4 up4 and up3
  - Redhat EL5 beta2 (only partially tested)
kernel.org:
  - 2.6.20
  - 2.6.19

Note: Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compile and loaded properly but will not do full QA cycle.

Systems:
  * x86_64
  * x86
  * ia64
  * ppc64

Main changes from OFED-1.1-alpha:
1. Added packages:
   1. madeye utility
   2. DAPL utils (contains dapl test)
   3. Support for MPI selector
   4. RDS tools package
2. RDS to support SLES10 and RHEL up3 and up4, and tested with crload
3. ipath driver is now available and libipathverbs is working
4. Fixed 82 bugs (see attachment for all bugs fixed)

Limitations and known issues:
Major issues:

bug_id  bug_severity  op_sys   assigned_to                 short_short_desc
400     blocker       RHEL 4   sean.hefty at intel.com     OFED 1.2 IPoIB HA/CM/multicast problems
456     critical      SLES 10  arlin.r.davis at intel.com  dapltest won't compile on SLES10 IA64
419     critical      SLES 10  mee at pathscale.com        OFED 1.2 does not build ipath components on SLES10/PPC64
420     critical      All      monil at voltaire.com       PKey table reordering caused by SM failover stops ipoib traffic
402     critical      Other    mst at mellanox.co.il       On stress kernel: BUG: soft lockup detected on CPU#0!
431     critical      SLES 10  mst at mellanox.co.il       IPoIB CM locks up server on SLES10/RHEL4 ppc64
436     major         RHEL 4   arlin.r.davis at intel.com  Intel MPI and HP MPI DDR bandwidth dropped after OFED 1.2 alpha
351     major         SLES 10  bugzilla at openib.org      Routing table problem in SLES10 when using port #2
449     major         All      bugzilla at openib.org      DMA vs CQ race on IA64 Altix platform
450     major         RHEL 4   bugzilla at openib.org      IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3
406     major         RHEL 4   eitan at mellanox.co.il     "double free" abort in ibdaigui
418     major         RHEL 4   mst at mellanox.co.il       IPoIB CM causes large message IPv4 multicast to fail
445     major         SLES 10  pasha at mellanox.co.il     MVAPICH won't compile on ppc64
438     major         All      rolandd at cisco.com        OFED SRP does not work with DDN IB storage

See bugzilla for all issues open.

Tasks that should be completed for RC1:
1. Support RHEL5
2. PPC better support (several PPC specific bugs are still open)
3. Bonding support by IPOIB
4. Fix all blocker, critical and major bugs

RC1 due date is 29-March

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFED.build.6021_rhel4up4.zip
Type: application/x-zip-compressed
Size: 800922 bytes
Desc: OFED.build.6021_rhel4up4.zip

From tziporet at mellanox.co.il Thu Mar 15 03:54:47 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 15 Mar 2007 12:54:47 +0200
Subject: [ofa-general] RE: [ewg] OFED 1.2 beta release
In-Reply-To:
References: <45F802E4.4030503@mellanox.co.il>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0E013@mtlexch01.mtl.com>

Can you open a bug in bugzilla to openmpi component?
Thanks,
Tziporet
From trishal.choudhari at qlogic.com Thu Mar 15 04:41:26 2007
From: trishal.choudhari at qlogic.com (Trishal Choudhari)
Date: Thu, 15 Mar 2007 06:41:26 -0500
Subject: [ofa-general] RE: [ewg] OFED 1.2 beta release
References: <45F802E4.4030503@mellanox.co.il> <6C2C79E72C305246B504CBA17B5500C9A0E013@mtlexch01.mtl.com>
Message-ID:

bug number 461 is opened
URL: From bugzilla-daemon at lists.openfabrics.org Thu Mar 15 04:46:45 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 15 Mar 2007 04:46:45 -0700 (PDT) Subject: [ofa-general] [Bug 460] IP address assignment problem with IPoIB interfaces In-Reply-To: Message-ID: <20070315114645.48A3CE6080E@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=460 vlad at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #3 from vlad at mellanox.co.il 2007-03-15 04:46 ------- This issue is a known SLES10 issue (same as BUG 351) and is not OFED specific. It should be fixed by Novell -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From weikuan.yu at gmail.com Thu Mar 15 05:43:53 2007 From: weikuan.yu at gmail.com (Weikuan Yu) Date: Thu, 15 Mar 2007 08:43:53 -0400 Subject: [ofa-general] HotI 2007 Call for Papers -- 3rd Call Message-ID: <45F93F89.70309@gmail.com> -------------------------------------------------------------------- Apologies if you received multiple copies of this posting. Please feel free to distribute it to those who might be interested. -------------------------------------------------------------------- Hot Interconnects 15 IEEE Symposium on High-Performance Interconnects August 22-24, 2007 Stanford University Palo Alto, California, USA Hot Interconnects is the premier international forum for researchers and developers of state-of-the-art hardware and software architectures and implementations for interconnection networks of all scales, ranging from on-chip processor-memory interconnects to wide-area networks. This yearly conference is very well attended by leaders in industry and academia. 
The atmosphere provides for a wealth of opportunities to interact with individuals at the forefront of this field. Themes include cross-cutting issues spanning computer systems, networking technologies, and communication protocols. This conference is directed particularly at new and exciting technology and product innovations in these areas. Contributions should focus on real experimental systems, prototypes, or leading-edge products and their performance evaluation. In addition to those subscribing to the main theme of the conference, contributions are also solicited in the topics listed below. * Novel and innovative interconnect architectures * Multi-core processor interconnects * System-on-Chip Interconnects * Advanced chip-to-chip communication technologies * Optical interconnects * Protocol and interfaces for interprocessor communication * Survivability and fault-tolerance of interconnects * High-speed packet processing engines and network processors * System and storage area network architectures and protocols * High-performance host-network interface architectures * High-bandwidth and low-latency I/O * Tb/s switching and routing technologies * Innovative architectures for supporting collective communication * Novel communication architectures to support grid computing Submission Guideline o Submission deadline: March 31, 2007 o Notification of acceptance: May 15, 2007 o Papers need sufficient technical detail to judge quality and suitability for presentation. o Submit title, author, abstract, and full paper (six pages, double-column, IEEE format). o Papers should be submitted electronically at the specified link location found on http://www.hoti.org o For further information please see http://www.hoti.org/hoti15/cfp.html About the Conference - Conference held at the William Hewlett Teaching Center at Stanford University. - Papers selected will be published in proceedings by the IEEE Computer Society. 
- Presentations are 30-minute talks in a single-track format. - Online information at http://www.hoti.org GENERAL CO-CHAIRS * John W. Lockwood, Washington University in St. Louis * Fabrizio Petrini, Pacific Northwest National Laboratory TECHNICAL CO-CHAIRS * Ron Brightwell, Sandia National Laboratories * Dhabaleswar (DK) Panda, The Ohio State University LOCAL ARRANGEMENTS CHAIR * Songkrant Muneenaem, Washington University in St. Louis PANEL CHAIR * Daniel Pitt, Santa Clara University PUBLICITY CO-CHAIRS * Weikuan Yu, Oak Ridge National Laboratory PUBLICATION CHAIR * Luca Valcarenghi, Scuola Superiore Sant'Anna FINANCE CHAIR * Herzel Ashkenazi, Xilinx TUTORIAL CO-CHAIRS - TBA REGISTRATION CHAIR * Songkrant Muneenaem, Washington University in St. Louis Webmaster * Liz Rogers, LRD Group Steering Committee o Allen Baum, Intel o Lily Jow, Hewlett Packard o Mark Laubach, Broadband Physics o John Lockwood, Stanford University o Daniel Pitt, Santa Clara University Technical Program Committee * Dennis Abts Cray, Inc. * Adnan Aziz University of Texas, Austin * Alan Benner IBM * Keren Bergman Columbia University * Andrea Bianco Politecnico di Torino * Piero Castoldi Scuola Superiore Sant'Anna * Sarang Dharmapurikar Nuova Systems * Hans Eberle Sun * Wu-chun Feng Virginia Tech * Juan Fernandez University of Murcia * Ada Gavrilovska Georgia Institute of Technology * Paolo Giaccone Politecnico di Torino * Mitchell Gusat IBM Zurich Research Laboratory * Ron Ho Sun Microsystems Laboratories * Doan Hoang University of Technology, Sydney * Jayasimha Jay Intel * Isaac Keslassy Technion * Venkata Krishnan Dolphin Interconnect Solutions * Tal Lavian Nortel Networks Labs, UC Berkeley * Bill Lin University of California, San Diego * Olav Lysne Simula Research Laboratory * Pankaj Mehra HP Labs * Rami Melhem University of Pittsburgh * Cyriel Minkenberg IBM Zurich Research Laboratory * Gregory Pfister IBM * Craig Stunkel IBM T.J. 
Watson Research Center * Anujan Varma University of California at Santa Cruz From mshefty at ichips.intel.com Thu Mar 15 09:06:54 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Mar 2007 09:06:54 -0700 Subject: [ofa-general] [Bug 400] ipoib error messages In-Reply-To: References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> Message-ID: <45F96F1E.5050505@ichips.intel.com> Scott Weitzenkamp (sweitzen) wrote: > Yes, I was using 20070311, and I see the patch in 20070312. I'll try > it. Scott, have you had a chance to test with 20070312, and, if so, did you see the mcast detach issue? - Sean From sweitzen at cisco.com Thu Mar 15 09:28:17 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 15 Mar 2007 09:28:17 -0700 Subject: [ofa-general] RE: [Bug 400] ipoib error messages In-Reply-To: <45F96F1E.5050505@ichips.intel.com> References: <000201c765cc$9c90fd70$8698070a@amr.corp.intel.com> <45F8331B.8030301@ichips.intel.com> <45F96F1E.5050505@ichips.intel.com> Message-ID: I'm trying 20070314-0600 right now, changing an IB port every 10 seconds. IPoIB failover is usually pretty smooth, although occasionally it takes longer. 
Interim result: 4428.13 10^6bits/s over 1.00 seconds
Interim result: 4439.57 10^6bits/s over 1.00 seconds
Interim result: 4435.82 10^6bits/s over 1.00 seconds
Interim result: 4430.15 10^6bits/s over 1.00 seconds
Interim result: 1824.52 10^6bits/s over 2.43 seconds
Interim result: 4440.04 10^6bits/s over 1.00 seconds
Interim result: 4453.72 10^6bits/s over 1.00 seconds
Interim result: 4458.34 10^6bits/s over 1.00 seconds
Interim result: 4472.58 10^6bits/s over 1.00 seconds
Interim result: 4464.64 10^6bits/s over 1.00 seconds
Interim result: 4466.08 10^6bits/s over 1.00 seconds
Interim result: 4464.80 10^6bits/s over 1.00 seconds
Interim result: 4465.03 10^6bits/s over 1.00 seconds
Interim result: 4456.73 10^6bits/s over 1.00 seconds
Interim result: 4461.28 10^6bits/s over 1.00 seconds
Interim result: 1834.37 10^6bits/s over 2.43 seconds
Interim result: 4452.28 10^6bits/s over 1.00 seconds
Interim result: 4462.01 10^6bits/s over 1.00 seconds
Interim result: 4464.04 10^6bits/s over 1.00 seconds
Interim result: 4461.37 10^6bits/s over 1.00 seconds
Interim result: 4466.25 10^6bits/s over 1.00 seconds
Interim result: 4461.43 10^6bits/s over 1.00 seconds
Interim result: 1827.84 10^6bits/s over 2.44 seconds
Interim result: 4433.90 10^6bits/s over 1.00 seconds
Interim result: 4430.12 10^6bits/s over 1.00 seconds
Interim result: 4430.62 10^6bits/s over 1.00 seconds
Interim result: 4423.39 10^6bits/s over 1.00 seconds
Interim result: 4430.59 10^6bits/s over 1.00 seconds
Interim result: 4435.61 10^6bits/s over 1.00 seconds
Interim result: 4441.29 10^6bits/s over 1.00 seconds
Interim result: 4438.41 10^6bits/s over 1.00 seconds
Interim result: 4449.77 10^6bits/s over 1.00 seconds
Interim result: 4435.04 10^6bits/s over 1.00 seconds
Interim result: 4432.16 10^6bits/s over 1.00 seconds
Interim result: 4435.56 10^6bits/s over 1.00 seconds
Interim result: 4433.32 10^6bits/s over 1.00 seconds
Interim result: 1769.61 10^6bits/s over 2.51 seconds
Interim result: 4423.30 10^6bits/s over 1.00 seconds
Interim result: 4425.79 10^6bits/s over 1.00 seconds
Interim result: 251.50 10^6bits/s over 17.60 seconds
Interim result: 4425.51 10^6bits/s over 1.00 seconds
Interim result: 4436.75 10^6bits/s over 1.00 seconds

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Thursday, March 15, 2007 9:07 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Michael S. Tsirkin; Sean Hefty; general at lists.openfabrics.org
> Subject: [Bug 400] ipoib error messages
>
> Scott Weitzenkamp (sweitzen) wrote:
> > Yes, I was using 20070311, and I see the patch in 20070312.
> I'll try
> > it.
>
> Scott, have you had a chance to test with 20070312, and, if
> so, did you see the
> mcast detach issue?
>
> - Sean
>

From robert.j.woodruff at intel.com Thu Mar 15 11:14:31 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 15 Mar 2007 11:14:31 -0700
Subject: [ofa-general] RE: [ewg] OFED 1.2 beta release - IPoIB bug
In-Reply-To:
Message-ID:

I just loaded OFED 1.2 beta on my cluster, Redhat EL5-U4
2.6.9-42EL kernel and I got these messages in dmesg.
woody ipoib_neigh_destructor device lo type 772 Badness in ipoib_neigh_destructor at /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ ipoib_main.c:858 Call Trace: {neigh_destroy+197} {dst_destroy+92} {dst_run_gc+0} {dst_run_gc+100} {run_timer_softirq+356} {__do_softirq+88} {do_softirq+49} {apic_timer_interrupt+133} {mwait_idle+86} {cpu_idle+26} {start_kernel+470} {_sinittext+469} ipoib_neigh_destructor device lo type 772 Badness in ipoib_neigh_destructor at /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ ipoib_main.c:858 Call Trace: {neigh_destroy+197} {dst_destroy+92} {dst_run_gc+0} {dst_run_gc+100} {run_timer_softirq+356} {__do_softirq+88} {do_softirq+49} {apic_timer_interrupt+133} {mwait_idle+86} {cpu_idle+26} {start_kernel+470} {_sinittext+469} From arlin.r.davis at intel.com Thu Mar 15 11:22:47 2007 From: arlin.r.davis at intel.com (Arlin Davis) Date: Thu, 15 Mar 2007 11:22:47 -0700 Subject: [ofa-general] [PATCH] uDAPL - dapltest - fix build issues on SUSE10 ia64 Message-ID: <000001c7672e$ef35de70$9f97070a@amr.corp.intel.com> Fixes dapltest build problems on SUSE ia64 (applied - master and ofed_1_2) Signed-off by: Arlin Davis ardavis at ichips.intel.com diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h old mode 100644 new mode 100755 index 2903e78..37a2bd4 --- a/test/dapltest/mdep/linux/dapl_mdep_user.h +++ b/test/dapltest/mdep/linux/dapl_mdep_user.h @@ -43,10 +43,6 @@ #include #include -#ifdef __ia64__ -#include -#endif - /* Default Device Name */ #define DT_MdepDeviceName "ia0a" @@ -124,10 +120,9 @@ DT_Mdep_GetTimeStamp ( void ) #else #ifdef __ia64__ - unsigned long x; - - x = get_cycles (); - return x; + unsigned long ret; + __asm__ __volatile__ ("mov %0=ar.itc" : "=r"(ret)); + return ret; #else #if defined(__PPC64__) unsigned int tbl, tbu0, tbu1; From mst at dev.mellanox.co.il Thu Mar 15 11:55:29 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin)
Date: Thu, 15 Mar 2007 20:55:29 +0200
Subject: [ofa-general] Re: [ewg] OFED 1.2 beta release - IPoIB bug
In-Reply-To:
References:
Message-ID: <20070315185514.GA7294@mellanox.co.il>

> Quoting Woodruff, Robert J :
> Subject: RE: [ewg] OFED 1.2 beta release - IPoIB bug
>
> I just loaded OFED 1.2 beta on my cluster, Redhat EL5-U4
> 2.6.9-42EL kernel and I got these messages in dmesg.
>
> woody
>
> ipoib_neigh_destructor device lo type 772
> Badness in ipoib_neigh_destructor at
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/
> ipoib_main.c:858

The message itself is benign. It indicates that you are hitting a
problem in ipoib that is still being debugged.

-- 
MST

From bugzilla-daemon at lists.openfabrics.org Thu Mar 15 16:56:40 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 15 Mar 2007 16:56:40 -0700 (PDT)
Subject: [ofa-general] [Bug 447] ib_ipoib kernel 2.6.9-34 panic when routing to 10G ethernet
In-Reply-To:
Message-ID: <20070315235640.8E713E603BE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=447

------- Comment #3 from DarylGrunau at gmail.com 2007-03-15 16:56 -------
In spite of version-skew cleanup we are still experiencing kernel panics on our
I/O nodes.
I'll inline one of the latest stack traces for reference: Kernel BUG at dev:1121 invalid operand: 0000 [1] SMP CPU 4 Modules linked in: myri10ge(U) rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib(U) ib_ipath(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) bluesmoke_k8 bluesmoke_mc perfctr ipmi_devintf ipmi_si ipmi_msghandler bnx2 ext3 jbd nfs lockd nfs_acl sunrpc Pid: 0, comm: swapper Not tainted 2.6.9-34.ELsmp.lanl RIP: 0010:[] {__skb_linearize+62} RSP: 0018:000001041ffbbcf8 EFLAGS: 00010203 RAX: 0000000000000001 RBX: 000000000000001c RCX: 000001061fec6480 RDX: 00000000ffffffdc RSI: 0000000000000220 RDI: 000001061fec6400 RBP: 000001021f5b5d40 R08: 0000000000000000 R09: 000000000000003c R10: 0000000000000000 R11: 0000000000000000 R12: 000001041ff78280 R13: 0000000000000000 R14: 000001021ebd0000 R15: 0000000000000000 FS: 0000002a958a0b00(0000) GS:ffffffff804d8480(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000002a9556c000 CR3: 00000000dfcac000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo 000001061ff96000, task 0000010220032800) Stack: 000001021ebd0000 000001021ebd0000 00000000fffffff4 000001041ff78280 0000000000000000 ffffffff802ab133 000001021f5b5d40 000001041ff782c0 000001021f5b5d40 ffffffff802b01c8 Call Trace: {dev_queue_xmit+93} {neigh_resolve_output+578} {neigh_update+626} {arp_process+1257} {__mod_timer+293} {netif_receive_skb+590} {process_backlog+136} {net_rx_action+129} {__do_softirq+88} {do_softirq+49} {do_IRQ+328} {ret_from_intr+0} {default_idle+0} {default_idle+32} {cpu_idle+26} Code: 0f 0b 41 ee 31 80 ff ff ff ff 61 04 85 d2 b8 00 00 00 00 0f RIP {__skb_linearize+62} RSP <000001041ffbbcf8> <0>Kernel panic - not syncing: Oops An IBM engineer from the Linux Technology Center has also been looking into our problems and writes the following conjecture: 1) one adapter (probably with large MTU) allocates a receive skb that has multiple memory buffers [ok] 2) an ARP request 
is received in that skb [ok] 3) driver delivers this buffer to upper layer, but skb is marked as shared [NOT OK] 4) ARP re-uses that same skb to respond to the ARP request [ok] 5) outgoing device does not support scatter/gather on output, so output packet goes to skb_linearize() 6) panic(), because shared buffers are not allowed in skb_linearize() A couple things to note: A) No buffer delivered to the upper layers from a driver should be shared. It only matters (and is only checked) in a few cases -- ARP, ICMP, and some IPSEC cases -- but it is incorrect because of those cases. B) On Linux, the interface that received the ARP request is not necessarily the interface that sent the request. Since most interfaces that support scatter/gather on input also support it on output, this is probably a case where it received an ARP request for an Infiniband interface but received it on a jumbo-frame Ethernet NIC, and is sending the response back via the Infiniband interface. This can happen if both machines are connected to both networks. If that is the case, a simple workaround is to force ARP responses to only be sent on the interface on which it was received by: sysctl -w net.ipv4.conf.all.arp_filter=2 C) The code in the lower layers that is causing the skb to be shared would be a call to skb_clone(). One way of preventing this is to make all calls to skb_clone() that have nr_frags nonzero make a new copy of the buffer instead. Later kernel versions also have a version of skb_linearize that allows shared skb's. If this code was backported from a later kernel version, that may be where the bug came from in the first place. We will likely install an instrumented kernel to show what is happening in skb_linearize. Any further comments about above appreciated -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
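The conjecture in the Bug 447 comment above can be modelled in user space. What follows is only a toy sketch, not kernel code: `struct toy_buf` and every `toy_*` function are hypothetical stand-ins invented for illustration, loosely mirroring the roles that `skb_shared()`, `skb_share_check()` and the old `skb_linearize()` play in the description. It shows why a buffer with more than one user trips the linearize step (steps 3 to 6), and how taking a private copy first, as in note C, avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a socket buffer. This is NOT the kernel's struct sk_buff. */
struct toy_buf {
    int users;            /* reference count; more than 1 means "shared" */
    int nr_frags;         /* extra fragments; 0 means already linear */
    size_t len;           /* payload length in bytes */
    unsigned char *data;  /* payload */
};

/* Allocate a zero-filled buffer with a single user. */
static struct toy_buf *toy_alloc(size_t len, int nr_frags)
{
    struct toy_buf *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->data = calloc(len ? len : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->users = 1;
    b->nr_frags = nr_frags;
    b->len = len;
    return b;
}

/* Drop one reference; free the buffer when the last user is gone. */
static void toy_put(struct toy_buf *b)
{
    assert(b->users > 0);
    if (--b->users == 0) {
        free(b->data);
        free(b);
    }
}

static int toy_shared(const struct toy_buf *b)
{
    return b->users > 1;
}

/* Step 6 of the conjecture: linearizing a shared buffer is refused.
 * The toy returns -1 where the report describes a kernel panic. */
static int toy_linearize(struct toy_buf *b)
{
    if (toy_shared(b))
        return -1;
    b->nr_frags = 0;
    return 0;
}

/* Note C: hand the caller a private copy when the buffer is shared,
 * giving up the caller's reference to the shared original. */
static struct toy_buf *toy_share_check(struct toy_buf *b)
{
    struct toy_buf *copy;

    if (!toy_shared(b))
        return b;
    copy = toy_alloc(b->len, b->nr_frags);
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, b->data, b->len);
    toy_put(b);
    return copy;
}
```

In this model, a receive buffer whose `users` count is 2 makes `toy_linearize()` fail, while the buffer returned by `toy_share_check()` linearizes without complaint. Whether the real drivers in question share the skb in this way is exactly what the instrumented kernel mentioned above is meant to establish.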
From vlad at lists.openfabrics.org Fri Mar 16 02:34:37 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 16 Mar 2007 02:34:37 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070316-0200 daily build status Message-ID: <20070316093437.953E4E60804@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on 
ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From dledford at redhat.com Fri Mar 16 07:05:27 2007 From: dledford at redhat.com (Doug Ledford) Date: Fri, 16 Mar 2007 10:05:27 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> Message-ID: <1174053927.4673.60.camel@athlon-x2.xsintricity.com> On Fri, 2007-03-02 at 20:42 -0500, Jeff Squyres wrote: > To be totally clear, there are three issues: > > 1. *NOT AN MPI ISSUE*: base location of the stack. Doug has > repeatedly mentioned that /usr/local/ofed is not good. This is a > group issue to decide. As long as the base OFED stack is /usr/local/ofed, if someone calls Red Hat support to get IB help with RHEL5 or RHEL4U5, they will be told that they must first delete all locally built OFED RPMs from the system. It simply isn't realistic for us to try and support a system where conflicting libraries can exist in different locations and attempts to resolve the problem could end up being fruitless simply because the wrong library is getting linked in behind our backs. > 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not > deleting the buildroot is Bad; munging %build into %install is > Bad; ...etc. This needs to change. 4 choices jump to mind: > > a. Keep the same scheme. Ick. > b. Install while we build (i.e., the normal way to build a pile > of interdependent RPMs) > c. Use chroot (Red Hat does this in their internal setup, for > example) > d. 
Only distribute binary RPMs for supported platforms; source is > available for those who want it. d. is the normal route for anyone wanting to provide a known working environment. Building locally is fraught with perils related to custom compilers, custom core libraries, and other things that the EWG can't control and can't realistically support. > 3. Doug's final point about allowing multiple MPI's to play > harmoniously on a single system is obviously an MPI issue. The /etc/ > alternatives mechanism is not really good enough (IMHO) for this -- / > etc/alternatives is about choosing one implementation and making > everyone use it. The problem is that when multiple MPI's are > installed on a single system, people need all of them (some users > prefer one over the other, but much more important, some apps are > only certified with one MPI or another). Correct. You need the various MPI stacks to all be usable at the same time. The alternatives system doesn't really provide for this, especially since OpenMPI expects to know how to behave based upon argv[0]. > The mpi-selector tool we > introduced in OFED 1.2 will likely be "good enough" for this purpose, > but we can also work on integrating the /etc/alternatives stuff if > desired, particularly for those who only need/want one MPI > implementation. We implemented unique executable names with symlinks to the unique name that provide a working argv[0] to OpenMPI regardless of what MPI is the default. So, for instance, if you want to use i386 OpenMPI on x86_64, you can do this: export PATH=/usr/share/openmpi/bin32:$PATH mpicc -o blah blah.c and things just work. The /usr/share/openmpi/bin32 directory has the right symlinks to the file in /usr/bin to make it happen. Now, that being said, I'm not really happy with that solution and would prefer to have a solution that works for all the MPIs. The only FHS compliant location that I know of where we can put a bin directory under our parent directory is /opt. 
So, I would suggest that for OpenMPI at least, we standardize on it going into /opt/openmpi/$VERSION_MAJOR-$CC and under that we have a bin (only need one bin if we can get the single binaries to support both 32/64 bit operation via a command line switch), lib, lib64, share, man, etc. Then users can do the same basic thing as above, but using a path in /opt. The alternatives system could link the system wide default to the binaries in /opt easy enough. That allows both a system wide and user specified version to work seamlessly. > > On Feb 28, 2007, at 12:56 PM, Doug Ledford wrote: > > > On Wed, 2007-02-28 at 16:10 +0200, Tziporet Koren wrote: > >> * Improved RPM usage by the install will not be part of OFED > >> 1.2 > > > > Since I first brought this up, you have added new libraries, iWARP > > support, etc. These constitute new RPMs. And, because you guys have > > been doing things contrary to standards like the file hierarchy > > standard > > in the original RPMs, it's been carried forward to these new RPMs. > > This > > is a snowball, and the longer you put off fixing it, the harder it > > gets > > to change. And not just in your RPMs either. The longer you put off > > coming up with a reasonable standard for MPI library and executable > > file > > locations, the longer customers will hand roll their own site specific > > setups, and the harder it will be to get them to switch over to the > > standard once you *do* implement it. You may end up dooming Jeff to > > maintaining those custom file location hacks in the OpenMPI spec > > forever. > > > > Not to mention that interoperability is about more than one machine > > talking to another machine. It's also about a customer's application > > building properly on different versions of the stack, without the > > customer needing to change all the include file locations and link > > parameters. 
It's also about a customer being able to rest assured > > that > > if they tried to install two conflicting copies of libibverbs, it > > would > > in fact cause RPM to throw conflict errors (which it doesn't now > > because > > your libibverbs is in /usr/local, where I'm not allowed to put > > ours, so > > since the files are in different locations, rpm will happily let the > > user install both your libibverbs and my libibverbs without a > > conflict, > > and a customer could waste large amounts of time trying to track > > down a > > bug in one library only to find out their application is linking > > against > > the other). > > > >> * The RPM usage will be enhanced for the next (1.3) > >> release and we will decide on the correct way in > >> Sonoma. > > > > > > > > There's not really much to decide. Either the stack is Linux File > > Hierarchy Standard compliant or it isn't. The only leeway for > > decisions > > allowed by the standard is on things like where in /etc to put the > > config files (since you guys are striving to be a generic RDMA stack, > > not just an IB stack, I would suggest that all RDMA related config > > files > > go into /etc/rdma, and for those applications that can reasonably > > be run > > absent RDMA technology, like OpenMPI, I would separate their config > > files off into either /etc or /etc/openmpi, ditto for the include > > directories, /usr/include/rdma for the generic non-IB specific stuff, > > and possibly /usr/include/rdma/infiniband for IB specific stuff, or > > you > > could put the IB stuff under /usr/include/infiniband, either way). > > > > The biggest variation from the spec that needs to be dealt with is the > > need for multiple MPI installations, which is problematic if you just > > use generic locations as it stands today, but with a few modifications > > to the MPI stack it could be worked around. 
> > > > > > -- > > Doug Ledford > > GPG KeyID: CFBFF194 > > http://people.redhat.com/dledford > > > > Infiniband specific RPMs available at > > http://people.redhat.com/dledford/Infiniband > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From jlentini at netapp.com Fri Mar 16 07:47:58 2007 From: jlentini at netapp.com (James Lentini) Date: Fri, 16 Mar 2007 10:47:58 -0400 (EDT) Subject: [ofa-general] RE: [Bug 408] dapltest compilation fails on x86 [PATCH] In-Reply-To: <000001c76108$07829720$4297070a@amr.corp.intel.com> References: <000001c76108$07829720$4297070a@amr.corp.intel.com> Message-ID: On Wed, 7 Mar 2007, Arlin Davis wrote: > > James, please review this fix for dapltest build issue. Hi Arlin, Just returning from vacation and catching up on email. > Signed-off by: Arlin Davis ardavis at ichips.intel.com > > diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h > index c05dd30..2903e78 100644 > --- a/test/dapltest/mdep/linux/dapl_mdep_user.h > +++ b/test/dapltest/mdep/linux/dapl_mdep_user.h > @@ -117,7 +117,7 @@ typedef unsigned long long int DT_Mdep_TimeStamp; > static _INLINE_ DT_Mdep_TimeStamp > DT_Mdep_GetTimeStamp ( void ) > { > -#if defined(__GNUC__) && defined(__PENTIUM__) > +#if defined(__GNUC__) && defined(__i386__) > DT_Mdep_TimeStamp x; > __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); This is the opcode for a RDTSC instruction. 
My copy of the "Intel 64 and IA-32 Architectures Software Developer's Manual" says that the RDTSC instruction was introduced by the Pentium. Although I don't think anyone is going to use RDMA on a 486, making the change above would create code that attempted to use an RDTSC instruction on pre-Pentium processors. What is the processor that is being excluded by the Pentium test? Would it be possible to recode the compile guard to check for either Pentium or X? As a side note, the code could use the rdtsc mnemonic, as we do below, instead of the raw opcode. > return x; > @@ -143,7 +143,7 @@ DT_Mdep_GetTimeStamp ( void ) > asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); > return ((unsigned long)__a) | (((unsigned long)__d)<<32); > #else > -#error "Non-Pentium and Non-PPC Linux - unimplemented" > +#error "Linux CPU architecture - unimplemented" > #endif > #endif > #endif From mst at dev.mellanox.co.il Fri Mar 16 08:13:36 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 16 Mar 2007 17:13:36 +0200 Subject: [ofa-general] Re: [Bug 408] dapltest compilation fails on x86 [PATCH] In-Reply-To: References: <000001c76108$07829720$4297070a@amr.corp.intel.com> Message-ID: <20070316151336.GB10725@mellanox.co.il> > Quoting James Lentini : > Subject: RE: [Bug 408] dapltest compilation fails on x86 [PATCH] > > > > On Wed, 7 Mar 2007, Arlin Davis wrote: > > > > > James, please review this fix for dapltest build issue. > > Hi Arlin, > > Just returning from vacation and catching up on email. 
> > > Signed-off by: Arlin Davis ardavis at ichips.intel.com > > > > diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h > > index c05dd30..2903e78 100644 > > --- a/test/dapltest/mdep/linux/dapl_mdep_user.h > > +++ b/test/dapltest/mdep/linux/dapl_mdep_user.h > > @@ -117,7 +117,7 @@ typedef unsigned long long int DT_Mdep_TimeStamp; > > static _INLINE_ DT_Mdep_TimeStamp > > DT_Mdep_GetTimeStamp ( void ) > > { > > -#if defined(__GNUC__) && defined(__PENTIUM__) > > +#if defined(__GNUC__) && defined(__i386__) > > DT_Mdep_TimeStamp x; > > __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); > > This is the opcode for a RDTSC instruction. My copy of the "Intel 64 > and IA-32 Architectures Software Developer's Manual" says that the > RDTSC instruction was introduced by the Pentium. Although I don't > think anyone is going to use RDMA on a 486, making the change > above would create code that attempted to use an RDTSC instruction on > pre-Pentium processors. > > What is the processor that is being excluded by the Pentium test? > Would it be possible to recode the compile guard to check for either > Pentium or X? > > As a side note, the code could use the rdtsc mnemonic, as we do below, > instead of the raw opcode. > > > return x; > > @@ -143,7 +143,7 @@ DT_Mdep_GetTimeStamp ( void ) > > asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); > > return ((unsigned long)__a) | (((unsigned long)__d)<<32); > > #else > > -#error "Non-Pentium and Non-PPC Linux - unimplemented" > > +#error "Linux CPU architecture - unimplemented" > > #endif > > #endif > > #endif You can't detect a pentium processor at compile-time, and testing for __PENTIUM__ is the wrong thing to do - you are actually testing for pentium-only optimizations being enabled in compiler. You really must test for processor type at run time. 
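A run-time test of this kind could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the dapltest implementation: it assumes a GCC- or Clang-compatible compiler that ships `<cpuid.h>` with `__get_cpuid()`, and the names `cpu_has_tsc()`, `read_timestamp()` and `HAVE_X86_CPUID` are hypothetical. The TSC feature flag is bit 4 of EDX in CPUID leaf 1; when the flag is missing, or on a non-x86 build, the sketch falls back to `clock_gettime()`, so the returned value is in nanoseconds there rather than in cycles.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std= modes */
#endif

#include <assert.h>   /* only needed for the sanity checks in the usage note */
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    /* compiler-provided helper declaring __get_cpuid() */
#define HAVE_X86_CPUID 1
#endif

/* Return 1 if CPUID leaf 1 reports a time-stamp counter (EDX bit 4), else 0. */
static int cpu_has_tsc(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 4) & 1u);
#else
    return 0;
#endif
}

/* Hypothetical replacement for a compile-time-only timestamp routine:
 * use RDTSC when the CPU advertises it, otherwise a monotonic clock. */
static uint64_t read_timestamp(void)
{
    struct timespec ts;

#ifdef HAVE_X86_CPUID
    static int has_tsc = -1;   /* -1 means "not probed yet" */

    if (has_tsc < 0)
        has_tsc = cpu_has_tsc();
    if (has_tsc) {
        uint32_t lo, hi;

        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
    }
#endif
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

A caller would simply use `read_timestamp()` wherever a timestamp is needed; a quick sanity check is `assert(cpu_has_tsc() == 0 || cpu_has_tsc() == 1)`. Caching the probe result in a static keeps the CPUID cost out of the timing path, at the price of not being thread-safe on the very first call.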
-- MST From jlentini at netapp.com Fri Mar 16 09:20:44 2007 From: jlentini at netapp.com (James Lentini) Date: Fri, 16 Mar 2007 12:20:44 -0400 (EDT) Subject: [ofa-general] Re: [Bug 408] dapltest compilation fails on x86 [PATCH] In-Reply-To: <20070316151336.GB10725@mellanox.co.il> References: <000001c76108$07829720$4297070a@amr.corp.intel.com> <20070316151336.GB10725@mellanox.co.il> Message-ID: On Fri, 16 Mar 2007, Michael S. Tsirkin wrote: > > Quoting James Lentini : > > Subject: RE: [Bug 408] dapltest compilation fails on x86 [PATCH] > > > > > > > > On Wed, 7 Mar 2007, Arlin Davis wrote: > > > > > > > > James, please review this fix for dapltest build issue. > > > > Hi Arlin, > > > > Just returning from vacation and catching up on email. > > > > > Signed-off by: Arlin Davis ardavis at ichips.intel.com > > > > > > diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h > > > index c05dd30..2903e78 100644 > > > --- a/test/dapltest/mdep/linux/dapl_mdep_user.h > > > +++ b/test/dapltest/mdep/linux/dapl_mdep_user.h > > > @@ -117,7 +117,7 @@ typedef unsigned long long int DT_Mdep_TimeStamp; > > > static _INLINE_ DT_Mdep_TimeStamp > > > DT_Mdep_GetTimeStamp ( void ) > > > { > > > -#if defined(__GNUC__) && defined(__PENTIUM__) > > > +#if defined(__GNUC__) && defined(__i386__) > > > DT_Mdep_TimeStamp x; > > > __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x)); > > > > This is the opcode for a RDTSC instruction. My copy of the "Intel 64 > > and IA-32 Architectures Software Developer's Manual" says that the > > RDTSC instruction was introduced by the Pentium. Although I don't > > think anyone is going to use RDMA on a 486, making the change > > above would create code that attempted to use an RDTSC instruction on > > pre-Pentium processors. > > > > What is the processor that is being excluded by the Pentium test? > > Would it be possible to recode the compile guard to check for either > > Pentium or X? 
> > > > As a side note, the code could use the rdtsc mnemonic, as we do below, > > instead of the raw opcode. > > > > > return x; > > > @@ -143,7 +143,7 @@ DT_Mdep_GetTimeStamp ( void ) > > > asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); > > > return ((unsigned long)__a) | (((unsigned long)__d)<<32); > > > #else > > > -#error "Non-Pentium and Non-PPC Linux - unimplemented" > > > +#error "Linux CPU architecture - unimplemented" > > > #endif > > > #endif > > > #endif > > You can't detect a pentium processor at compile-time, and testing > for __PENTIUM__ is the wrong thing to do - you are actually testing > for pentium-only optimizations being enabled in compiler. > > You really must test for processor type at run time. How do you suggest performing the run time check? Parse /proc/cpuinfo for the "tsc" string in the flags field, use assembly programming to check the eflags & issue a CPUID instruction,...? We'd like something easy to maintain. From vlad at lists.openfabrics.org Sat Mar 17 02:34:36 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 17 Mar 2007 02:34:36 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070317-0200 daily build status Message-ID: <20070317093437.5C618E6080B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with 
linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From mst at dev.mellanox.co.il Sat Mar 17 10:35:31 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin)
Date: Sat, 17 Mar 2007 19:35:31 +0200
Subject: [ofa-general] Re: [Bug 408] dapltest compilation fails on x86 [PATCH]
In-Reply-To:
References: <000001c76108$07829720$4297070a@amr.corp.intel.com> <20070316151336.GB10725@mellanox.co.il>
Message-ID: <20070317173531.GC10725@mellanox.co.il>

> > > > @@ -143,7 +143,7 @@ DT_Mdep_GetTimeStamp ( void )
> > > > asm volatile("rdtsc" : "=a" (__a), "=d" (__d));
> > > > return ((unsigned long)__a) | (((unsigned long)__d)<<32);
> > > > #else
> > > > -#error "Non-Pentium and Non-PPC Linux - unimplemented"
> > > > +#error "Linux CPU architecture - unimplemented"
> > > > #endif
> > > > #endif
> > > > #endif
> >
> > You can't detect a pentium processor at compile-time, and testing
> > for __PENTIUM__ is the wrong thing to do - you are actually testing
> > for pentium-only optimizations being enabled in compiler.
> >
> > You really must test for processor type at run time.
>
> How do you suggest performing the run time check?
>
> Parse /proc/cpuinfo for the "tsc" string in the flags field, use
> assembly programming to check the eflags & issue a CPUID
> instruction,...?
>
> We'd like something easy to maintain.

I think either of these methods will do.
Assembly will probably be shorter and have fewer chances to break
with kernel changes.

-- 
MST

From mst at dev.mellanox.co.il Sat Mar 17 14:35:48 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 17 Mar 2007 23:35:48 +0200
Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary
In-Reply-To: <1174053927.4673.60.camel@athlon-x2.xsintricity.com>
References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com>
Message-ID: <20070317213516.GC4466@mellanox.co.il>

> > 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not
> > deleting the buildroot is Bad; munging %build into %install is
> > Bad; ...etc. This needs to change.
4 choices jump to mind: > > > > a. Keep the same scheme. Ick. > > b. Install while we build (i.e., the normal way to build a pile > > of interdependent RPMs) > > c. Use chroot (Red Hat does this in their internal setup, for > > example) > > d. Only distribute binary RPMs for supported platforms; source is > > available for those who want it. > > d. is the normal route for anyone wanting to provide a known working > environment. Building locally is fraught with perils related to custom > compilers, custom core libraries, and other things that the EWG can't > control and can't realistically support. I don't think d is realistic simply because OFED is not redhat, it needs to be distribution agnostic. In our experience people *want* to use custom compilers, custom core libraries etc. Mostly things work smoothly. We can and do support this. -- MST From mst at dev.mellanox.co.il Sat Mar 17 15:13:16 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 00:13:16 +0200 Subject: Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070317213516.GC4466@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> Message-ID: <20070317221316.GF4466@mellanox.co.il> forwarding to new list addresses. Quoting Michael S. Tsirkin : Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not > > deleting the buildroot is Bad; munging %build into %install is > > Bad; ...etc. This needs to change. 4 choices jump to mind: > > > > a. Keep the same scheme. Ick. > > b. Install while we build (i.e., the normal way to build a pile > > of interdependent RPMs) > > c. Use chroot (Red Hat does this in their internal setup, for > > example) > > d. 
Only distribute binary RPMs for supported platforms; source is > > available for those who want it. > > d. is the normal route for anyone wanting to provide a known working > environment. Building locally is fraught with perils related to custom > compilers, custom core libraries, and other things that the EWG can't > control and can't realistically support. I don't think d is realistic simply because OFED is not redhat, it needs to be distribution agnostic. In our experience people *want* to use custom compilers, custom core libraries etc. Mostly things work smoothly. We can and do support this. -- MST -- MST From mst at dev.mellanox.co.il Sat Mar 17 15:25:52 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 00:25:52 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <1174053927.4673.60.camel@athlon-x2.xsintricity.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> Message-ID: <20070317222552.GG4466@mellanox.co.il> > Quoting Doug Ledford : > Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > On Fri, 2007-03-02 at 20:42 -0500, Jeff Squyres wrote: > > To be totally clear, there are three issues: > > > > 1. *NOT AN MPI ISSUE*: base location of the stack. Doug has > > repeatedly mentioned that /usr/local/ofed is not good. This is a > > group issue to decide. > > As long as the base OFED stack is /usr/local/ofed, if someone calls Red > Hat support to get IB help with RHEL5 or RHEL4U5, they will be told that > they must first delete all locally built OFED RPMs from the system. It > simply isn't realistic for us to try and support a system where > conflicting libraries can exist in different locations and attempts to > resolve the problem could end up being fruitless simply because the > wrong library is getting linked in behind our backs. 
I think the prefix is easily configurable. So I think we should just say in the readme note that prefix should be /usr if one wants to get redhat support for infiniband. -- MST From vlad at lists.openfabrics.org Sun Mar 18 02:34:50 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 18 Mar 2007 02:34:50 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070318-0200 daily build status Message-ID: <20070318093450.87820E6080E@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on 
powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From jsquyres at cisco.com Sun Mar 18 04:01:38 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 18 Mar 2007 07:01:38 -0400 Subject: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070317221316.GF4466@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> <20070317221316.GF4466@mellanox.co.il> Message-ID: <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> It seems odd to me that you [repeatedly] brush off several members of the community that are saying that it's *not* working smoothly enough. 1. We're doing things in the installer that are very much *not* what any Linux distro wants us to do (e.g., munge %build into %install). 2. RHEL and SLES -- two of our Big community targets -- are replacing all of our installer work with their own. 3. The MPI packages all have to do weird (read: non-standard and potentially hazardous) things to get installed properly. This is not the first time that Doug and I have tried to say "what we're doing is wrong!" More below. On Mar 17, 2007, at 6:13 PM, Michael S. Tsirkin wrote: >>> 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not >>> deleting the buildroot is Bad; munging %build into %install is >>> Bad; ...etc. This needs to change. 4 choices jump to mind: >>> >>> a. Keep the same scheme. Ick. 
>>> b. Install while we build (i.e., the normal way to build a pile >>> of interdependent RPMs) >>> c. Use chroot (Red Hat does this in their internal setup, for >>> example) >>> d. Only distribute binary RPMs for supported platforms; source is >>> available for those who want it. >> >> d. is the normal route for anyone wanting to provide a known working >> environment. Building locally is fraught with perils related to custom >> compilers, custom core libraries, and other things that the EWG can't >> control and can't realistically support. > > I don't think d is realistic simply because OFED is not redhat, it > needs to be distribution agnostic. But OFED is *not* distribution agnostic. We have a specific, documented set of distributions that we support. Having the source code available is great, of course. But Cisco, for example, supports only a specific set of distros/versions and we distribute binaries for them. I believe that others may be doing the same...? > In our experience people *want* to use custom compilers, > custom core libraries etc. Do you have customers who build the OFA code base with non-GNU compilers? Right now, the OFED installer only lets you choose non-GNU compilers for the MPI installations -- not the OFA code base itself.
If this is your strongest point, then refer to what I said above: a) it's the MPI implementations that are complaining that what we are doing is Bad b) it's the MPI implementations that have to do weird/non-standard/ potentially hazardous things to get installed properly -- Jeff Squyres Cisco Systems From jsquyres at cisco.com Sun Mar 18 04:04:51 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 18 Mar 2007 07:04:51 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070317222552.GG4466@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> Message-ID: On Mar 17, 2007, at 6:25 PM, Michael S. Tsirkin wrote: >>> 1. *NOT AN MPI ISSUE*: base location of the stack. Doug has >>> repeatedly mentioned that /usr/local/ofed is not good. This is a >>> group issue to decide. >> >> As long as the base OFED stack is /usr/local/ofed, if someone >> calls Red >> Hat support to get IB help with RHEL5 or RHEL4U5, they will be >> told that >> they must first delete all locally built OFED RPMs from the >> system. It >> simply isn't realistic for us to try and support a system where >> conflicting libraries can exist in different locations and >> attempts to >> resolve the problem could end up being fruitless simply because the >> wrong library is getting linked in behind our backs. > > I think the prefix is easily configurable. > So I think we should just say in the readme note that prefix > should be /usr if one wants to get redhat support for infiniband. I think you're missing Doug's point. There is currently no mechanism for the user to know that they're installing 2 potentially conflicting versions of the same software (OFED). 
For example, I have a suspicion that a current P1 bug (https://bugs.openfabrics.org/show_bug.cgi?id=461) is due to the fact that RHEL/SLES's OFED is installed when QLogic is trying to install the community OFED 1.2 (won't know more until Tuesday -- it's a long/holiday weekend in India). If this is correct, it's *another* example of why our installer is leading to Bad/potentially hazardous practices. -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Sun Mar 18 04:15:33 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 13:15:33 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> Message-ID: <20070318111533.GF2862@mellanox.co.il> > I think you're missing Doug's point. There is currently no mechanism > for the user to know that they're installing 2 potentially > conflicting versions of the same software (OFED). That's a good point, although not entirely correct. AFAIK the OFED installer currently attempts to detect and warn about conflicting libraries; this logic can probably be improved.
-- MST From jsquyres at cisco.com Sun Mar 18 04:21:16 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 18 Mar 2007 07:21:16 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318111533.GF2862@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> Message-ID: <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> That seems like chasing our tail: adding more logic/work to replicate a mechanism that is already available (*and* making sure that we keep this logic up-to-date with all the OFED distributions out there -- which seems like a losing proposition). RPM can detect this kind of conflict and prevent it. Why aren't we using it? Oh, right, because we're doing several kinds of non-standard things that preclude us from doing so. :-) On Mar 18, 2007, at 7:15 AM, Michael S. Tsirkin wrote: >> I think you're missing Doug's point. There is currently no mechanism >> for the user to know that they're installing 2 potentially >> conflicting versions of the same software (OFED). > > That's a good point, although not entirely correct. AFAIK OFED > installer > currently attempts to detect and warn about conflicting libraries, > this logic > probably can be improved. > > -- > MST -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Sun Mar 18 04:48:24 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 18 Mar 2007 13:48:24 +0200 Subject: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> <20070317221316.GF4466@mellanox.co.il> <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> Message-ID: <20070318114824.GG2862@mellanox.co.il> > Quoting Jeff Squyres : > Subject: Re: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary > > It seems odd to me that you [repeatedly] brush off several members of > the community that are saying that it's *not* working smoothly enough. I actually agree there are problems, I just don't necessarily agree with the solution of focusing on RHEL/SLES exclusively. And I do not like writing long missives where a short sentence will do - sorry if this sounds dismissive. > 1. We're doing things in the installer that are very much *not* what > any Linux distro wants us to do (e.g., munge %build into %install). I'm not sure why do we do this, actually. > 2. RHEL and SLES -- two of our Big community targets -- are replacing > all of our installer work with their own. This is probably for the best. I expect they can also throw away a ton of backports and whatnot. > 3. The MPI packages all have to do weird (read: non-standard and > potentially hazardous) things to get installed properly. I think the tricks they do are quite broken, too. No idea how to do it better though. > This is not the first time that Doug and I have tried to say "what > we're doing is wrong!" > > More below. > > > > On Mar 17, 2007, at 6:13 PM, Michael S. Tsirkin wrote: > > >>>2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not > >>>deleting the buildroot is Bad; munging %build into %install is > >>>Bad; ...etc. This needs to change. 
4 choices jump to mind: > >>> > >>> a. Keep the same scheme. Ick. > >>> b. Install while we build (i.e., the normal way to build a pile > >>> of interdependent RPMs) > >>> c. Use chroot (Red Hat does this in their internal setup, for example) > >>> d. Only distribute binary RPMs for supported platforms; > >>> source is > >>> available for those who want it. > >> > >>d. is the normal route for anyone wanting to provide a known working > >>environment. Building locally is fraught with perils related to > >>custom compilers, custom core libraries, and other things that the EWG can't > >>control and can't realistically support. > > > >I don't think d is realistic simply because OFED is not redhat, it > >needs to be distribution agnostic. > > But OFED is *not* distribution agnostic. We have a specific, > documented set of distributions that we support. Having the source > code available is great, of course. But Cisco, for example, supports > only a specific set of distros/versions and we distribute binaries > for them. I believe that others may be doing the same...? Is there something that prevents you from doing this? Can't you build with prefix /usr? > >In our experience people *want* to use custom compilers, > >custom core libraries etc. > > Do you have customers who build the OFA code base with non-GNU > compilers? Right now, the OFED installer only lets you choose none- > GNU compilers for the MPI installations -- not the OFA code base > itself. If this is your strongest point, then refer to what I said > above: > > a) it's the MPI implementations that are complaining that what we are > doing is Bad > b) it's the MPI implementations that have to do weird/non-standard/ > potentially hazardous things to get installed properly I know I am very interested in e.g. cross-compiling the OFED core. I'm not too involved with MPI per se. -- MST From mst at dev.mellanox.co.il Sun Mar 18 04:51:43 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 18 Mar 2007 13:51:43 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> Message-ID: <20070318115143.GH2862@mellanox.co.il> > On Mar 18, 2007, at 7:15 AM, Michael S. Tsirkin wrote: > > >>I think you're missing Doug's point. There is currently no mechanism > >>for the user to know that they're installing 2 potentially > >>conflicting versions of the same software (OFED). > > > >That's a good point, although not entirely correct. AFAIK OFED installer > >currently attempts to detect and warn about conflicting libraries, this > >logic probably can be improved. > > > > Quoting Jeff Squyres : > Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > That seems like chasing our tail: adding more logic/work to replicate > a mechanism that is already available (*and* making sure that we keep > this logic up-to-date with all the OFED distributions out there -- > which seems like a losing proposition). RPM can detect this kind of > conflict and prevent it. Why aren't we using it? > > Oh, right, because we're doing several kinds of non-standard things > that preclude us from doing so. :-) Right. But the user *can* set the prefix to /usr, and RPM will detect conflicts then, isn't that right? -- MST From halr at voltaire.com Sun Mar 18 06:22:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Mar 2007 08:22:58 -0500 Subject: [ofa-general] Past Conference Presentations on New Web Site Message-ID: <1174224178.4684.278060.camel@hal.voltaire.com> Hi, On the old web site, there used to be a web page with all the past conference presentations.
Has this been preserved/ported to the new web site ? If so, there should be an easy way of getting to this from some other high level web page (perhaps Calendar of Events with Past Events/Presentations). If not, I think this should be added back. IMO it is very useful. Thanks. -- Hal From dledford at redhat.com Sun Mar 18 07:43:33 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 18 Mar 2007 10:43:33 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318111533.GF2862@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> Message-ID: <1174229013.4673.86.camel@athlon-x2.xsintricity.com> On Sun, 2007-03-18 at 13:15 +0200, Michael S. Tsirkin wrote: > > I think you're missing Doug's point. There is currently no mechanism > > for the user to know that they're installing 2 potentially > > conflicting versions of the same software (OFED). > > That's a good point, although not entirely correct. AFAIK OFED installer > currently attempts to detect and warn about conflicting libraries, this logic > probably can be improved. No, it can't because you don't know that the OFED libraries will come second. They could be there first and then an up2date run might install the Red Hat official libraries. It simply is not a tenable, reasonable situation, quit making excuses for it. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From dledford at redhat.com Sun Mar 18 08:08:07 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 18 Mar 2007 11:08:07 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070317213516.GC4466@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> Message-ID: <1174230487.4673.94.camel@athlon-x2.xsintricity.com> On Sat, 2007-03-17 at 23:35 +0200, Michael S. Tsirkin wrote: > > > 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not > > > deleting the buildroot is Bad; munging %build into %install is > > > Bad; ...etc. This needs to change. 4 choices jump to mind: > > > > > > a. Keep the same scheme. Ick. > > > b. Install while we build (i.e., the normal way to build a pile > > > of interdependent RPMs) > > > c. Use chroot (Red Hat does this in their internal setup, for > > > example) > > > d. Only distribute binary RPMs for supported platforms; source is > > > available for those who want it. > > > > d. is the normal route for anyone wanting to provide a known working > > environment. Building locally is fraught with perils related to custom > > compilers, custom core libraries, and other things that the EWG can't > > control and can't realistically support. > > I don't think d is realistic simply because OFED is not redhat, it needs to be > distribution agnostic. So? You test on Red Hat and SuSE, you can easily enough build RPMs for each. Being agnostic does not mean you have to ship source, it's perfectly acceptable/possible to make RPMs for different targets. > In our experience people *want* to use custom compilers, > custom core libraries etc. Really? 
Then why have people been on me so hard to create official Red Hat RPMs that are fully supported? If they *wanted* to build their own infrastructure for using InfiniBand and other RDMA protocols, they wouldn't care what Red Hat does in regards to that. They want compilers/libs *for their apps*. That might necessitate a few libraries get rebuilt with that custom compiler too, but anything that doesn't *have* to be done for their app/compiler choice to work, they don't *want* to do, they want someone like Red Hat, or the EWG, to handle the rest for them. > Mostly things work smoothly. We can and do support this. Hehehe, OK, so your support policy is "If it works great, if not, oh well?" They don't let me implement that kind of support policy here. It *has* to work. Mostly isn't allowed. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From dledford at redhat.com Sun Mar 18 08:42:45 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 18 Mar 2007 11:42:45 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318115143.GH2862@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> Message-ID: <1174232565.4673.111.camel@athlon-x2.xsintricity.com> On Sun, 2007-03-18 at 13:51 +0200, Michael S. Tsirkin wrote: > > On Mar 18, 2007, at 7:15 AM, Michael S. Tsirkin wrote: > > > > >>I think you're missing Doug's point.
There is currently no mechanism > > >>for the user to know that they're installing 2 potentially > > >>conflicting versions of the same software (OFED). > > > > > >That's a good point, although not entirely correct. AFAIK OFED installer > > >currently attempts to detect and warn about conflicting libraries, this > > >logic probably can be improved. > > > > > > > > Quoting Jeff Squyres : > > Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > > > That seems like chasing our tail: adding more logic/work to replicate > > a mechanism that is already available (*and* making sure that we keep > > this logic up-to-date with all the OFED distributions out there -- > > which seems like a losing proposition). RPM can detect this kind of > > conflict and prevent it. Why aren't we using it? > > > > Oh, right, because we're doing several kinds of non-standard things > > that preclude us from doing so. :-) > > Right. But the user *can* the prefix to /usr, and RPM will detect conflicts > then, isn't that right? This is a joke, right? You can't *really* be serious. If you are, then I suggest the EWG change the acronym for OFED to Open Fabrics Experimental Distribution because no enterprise customer I know of would accept the above suggestion that they change the spec file and recompile just to get RPM to do its job as reasonable for an enterprise software package. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From mst at dev.mellanox.co.il Sun Mar 18 08:55:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 17:55:32 +0200 Subject: [ofa-general] dst_ifdown breaks infiniband?
Message-ID: <20070318155532.GG7958@mellanox.co.il> Alexey, Roland, While debugging a kernel lockup that occurs with IP over InfiniBand in 2.6.21-rc4 (https://bugs.openfabrics.org/show_bug.cgi?id=402), I noticed the following code in dst_ifdown:

/* Dirty hack. We did it in 2.2 (in __dst_free),
 * we have _very_ good reasons not to repeat
 * this mistake in 2.3, but we have no choice
 * now. _It_ _is_ _explicit_ _deliberate_
 * _race_ _condition_.
 *
 * Commented and originally written by Alexey.
 */
static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
			      int unregister)
{
	if (dst->ops->ifdown)
		dst->ops->ifdown(dst, dev, unregister);

	if (dev != dst->dev)
		return;

	if (!unregister) {
		dst->input = dst_discard_in;
		dst->output = dst_discard_out;
	} else {
		dst->dev = &loopback_dev;
		dev_hold(&loopback_dev);
		dev_put(dev);
		if (dst->neighbour && dst->neighbour->dev == dev) {
			dst->neighbour->dev = &loopback_dev;
			dev_put(dev);
			dev_hold(&loopback_dev);
		}
	}
}

The line dst->neighbour->dev = &loopback_dev breaks IP over InfiniBand, simply because neighbour->parms still points to an entry that has been set up with the dev->neigh_setup call from the IPoIB neighbour device. So when neighbour->parms->neigh_destructor is called, we get to ipoib_neigh_destructor in drivers/infiniband/ulp/ipoib/ipoib_main.c, and that in turn crashes since it needs an infiniband device in the neighbour dev pointer. This is not new code and should have triggered a long time ago, so I am not sure why we are hitting it only now; somehow it did not lead to crashes in 2.6.20, but it does in 2.6.21-rc4. Ideas on how to fix this? Why is neighbour->dev changed here? Can dst->neighbour be changed to point to NULL instead, and the neighbour released? Thanks very much, -- MST From mst at dev.mellanox.co.il Sun Mar 18 09:24:35 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Sun, 18 Mar 2007 18:24:35 +0200 Subject: [ofa-general] Re: OFED 1.2 beta release - IPoIB bug In-Reply-To: References: Message-ID: <20070318162435.GH7958@mellanox.co.il> > Quoting Woodruff, Robert J : > Subject: RE: OFED 1.2 beta release - IPoIB bug > > I just loaded OFED 1.2 beta on my cluster, Redhat EL5-U4 > 2.6.9-42EL kernel and I got this messages in dmesg. > > woody > > ipoib_neigh_destructor device lo type 772 It's a warning triggered by an additional sanity check I added in beta1 in an attempt to debug bug 402. It seems to trigger because of an old bug which is now being discussed on the general list. -- MST From jsquyres at cisco.com Sun Mar 18 11:02:22 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 18 Mar 2007 14:02:22 -0400 Subject: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318114824.GG2862@mellanox.co.il> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> <20070317221316.GF4466@mellanox.co.il> <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> <20070318114824.GG2862@mellanox.co.il> Message-ID: <6C0D2490-FD81-4C5B-A3AD-F436285BC3E1@cisco.com> We can do it better by one of the two methods that have been proposed many times: 1. Use chroot to build everything. 2. Install while we build. Either of these would eliminate a bunch of ugliness from the OMPI portion of the OFED installer, and also eliminate the need for at least some portion of the ugliness that is in the OMPI specfile. On Mar 18, 2007, at 7:48 AM, Michael S. Tsirkin wrote: >> Quoting Jeff Squyres : >> Subject: Re: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary >> >> It seems odd to me that you [repeatedly] brush off several members of >> the community that are saying that it's *not* working smoothly >> enough.
> > I actually agree there are problems, I just don't necessarily agree > with the > solution of focusing on RHEL/SLES exclusively. And I do not like > writing long > missives where a short sentence will do - sorry if this sounds > dismissive. > >> 1. We're doing things in the installer that are very much *not* what >> any Linux distro wants us to do (e.g., munge %build into %install). > > I'm not sure why do we do this, actually. > >> 2. RHEL and SLES -- two of our Big community targets -- are replacing >> all of our installer work with their own. > > This is probably for the best. I expect they can also throw away a ton > of backports and whatnot. > >> 3. The MPI packages all have to do weird (read: non-standard and >> potentially hazardous) things to get installed properly. > > I think the tricks they do are quite broken, too. > No idea how to do it better though. > >> This is not the first time that Doug and I have tried to say "what >> we're doing is wrong!" >> >> More below. >> >> >> >> On Mar 17, 2007, at 6:13 PM, Michael S. Tsirkin wrote: >> >>>>> 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not >>>>> deleting the buildroot is Bad; munging %build into %install is >>>>> Bad; ...etc. This needs to change. 4 choices jump to mind: >>>>> >>>>> a. Keep the same scheme. Ick. >>>>> b. Install while we build (i.e., the normal way to build a pile >>>>> of interdependent RPMs) >>>>> c. Use chroot (Red Hat does this in their internal setup, >>>>> for example) >>>>> d. Only distribute binary RPMs for supported platforms; >>>>> source is >>>>> available for those who want it. >>>> >>>> d. is the normal route for anyone wanting to provide a known >>>> working >>>> environment. Building locally is fraught with perils related to >>>> custom compilers, custom core libraries, and other things that >>>> the EWG can't >>>> control and can't realistically support. 
>>> >>> I don't think d is realistic simply because OFED is not redhat, it >>> needs to be distribution agnostic. >> >> But OFED is *not* distribution agnostic. We have a specific, >> documented set of distributions that we support. Having the source >> code available is great, of course. But Cisco, for example, supports >> only a specific set of distros/versions and we distribute binaries >> for them. I believe that others may be doing the same...? > > Is there something that prevents you from doing this? > Can't you build with prefix /usr? > >>> In our experience people *want* to use custom compilers, >>> custom core libraries etc. >> >> Do you have customers who build the OFA code base with non-GNU >> compilers? Right now, the OFED installer only lets you choose none- >> GNU compilers for the MPI installations -- not the OFA code base >> itself. If this is your strongest point, then refer to what I said >> above: >> >> a) it's the MPI implementations that are complaining that what we are >> doing is Bad >> b) it's the MPI implementations that have to do weird/non-standard/ >> potentially hazardous things to get installed properly > > I know I am very interested in e.g. cross-compiling the OFED core. > I'm not too involved with MPI per se. 
> > -- > MST -- Jeff Squyres Cisco Systems From jsquyres at cisco.com Sun Mar 18 11:06:54 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 18 Mar 2007 14:06:54 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <1174232565.4673.111.camel@athlon-x2.xsintricity.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> Message-ID: I think that Doug is trying to say that our default location should be /usr (not /usr/local/ofed). That would seem to solve several issues: - will automatically generate conflicts with the RHEL OFED RPMs - less muckery with finding libraries and header files - can claim to be FHS complaint - user *can* change the default location to elsewhere if they want to If it's a simple issue to change our default, is the only reason *not* to do it the historical precedent of prior community OFED versions? If so, that argument is somewhat diluted because a) we (as a community) are encouraging users to upgrade, and b) RH started is already shipping OFED RPMs that live in /usr. On Mar 18, 2007, at 11:42 AM, Doug Ledford wrote: > On Sun, 2007-03-18 at 13:51 +0200, Michael S. Tsirkin wrote: >>> On Mar 18, 2007, at 7:15 AM, Michael S. Tsirkin wrote: >>> >>>>> I think you're missing Doug's point. There is currently no >>>>> mechanism >>>>> for the user to know that they're installing 2 potentially >>>>> conflicting versions of the same software (OFED). >>>> >>>> That's a good point, although not entirely correct. AFAIK OFED >>>> installer >>>> currently attempts to detect and warn about conflicting >>>> libraries, this >>>> logic probably can be improved. 
>>> >>> >>> >>> Quoting Jeff Squyres : >>> Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary >>> >>> That seems like chasing our tail: adding more logic/work to >>> replicate >>> a mechanism that is already available (*and* making sure that we >>> keep >>> this logic up-to-date with all the OFED distributions out there -- >>> which seems like a losing proposition). RPM can detect this kind of >>> conflict and prevent it. Why aren't we using it? >>> >>> Oh, right, because we're doing several kinds of non-standard things >>> that preclude us from doing so. :-) >> >> Right. But the user *can* the prefix to /usr, and RPM will detect >> conflicts >> then, isn't that right? > > This is a joke, right? You can't *really* be serious. If you are, > then > I suggest the EWG change the acronym for OFED to Open Fabrics > Experimental Distribution because no enterprise customer I know of > would > accept the above suggestion that they change the spec file and > recompile > just to get RPM to do its job as reasonable for an enterprise software > package. > > -- > Doug Ledford > GPG KeyID: CFBFF194 > http://people.redhat.com/dledford > > Infiniband specific RPMs available at > http://people.redhat.com/dledford/Infiniband -- Jeff Squyres Cisco Systems From kuznet at ms2.inr.ac.ru Sun Mar 18 12:12:38 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Sun, 18 Mar 2007 22:12:38 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318155532.GG7958@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> Message-ID: <20070318191238.GA20518@ms2.inr.ac.ru> Hello! > This is not new code, and should have triggered long time ago, > so I am not sure how come we are triggering this only now, > but somehow this did not lead to crashes in 2.6.20 I see. I guess this was plain luck. > Why is neighbour->dev changed here? It holds reference to device and prevents its destruction. 
If dst is held somewhere, we cannot destroy the device and deadlock while unregister. We could not invalidate dst->neighbour but it looked safe to invalidate neigh->dev after quiescent state. Obviosuly, it is not and it never was safe. Was supposed to be repaired asap, but this did not happen. :-( > Can dst->neighbour be changed to point to NULL instead, and the neighbour > released? It should be cleared and we should be sure it will not be destroyed before quiescent state. Seems, this is the only correct solution, but to do this we have to audit all the places where dst->neighbour is dereferenced for RCU safety. Actually, it is very good you caught this eventually, the bug was so _disgusting_ that it was "forgotten" all the time, waiting for someone who will point out that the king is naked. :-) Alexey From mst at dev.mellanox.co.il Sun Mar 18 12:46:38 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 21:46:38 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318191238.GA20518@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> Message-ID: <20070318194638.GA11078@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: dst_ifdown breaks infiniband? > > Hello! > > > This is not new code, and should have triggered long time ago, > > so I am not sure how come we are triggering this only now, > > but somehow this did not lead to crashes in 2.6.20 > > I see. I guess this was plain luck. Hmm. Something I don't understand: does the code in question not run on *each* device unregister? Why do I only see this under stress? -- MST From mst at dev.mellanox.co.il Sun Mar 18 12:53:55 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 21:53:55 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318191238.GA20518@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> Message-ID: <20070318195355.GB11078@mellanox.co.il> Quoting Alexey Kuznetsov : Subject: Re: dst_ifdown breaks infiniband? > > Hello! > > > This is not new code, and should have triggered long time ago, > > so I am not sure how come we are triggering this only now, > > but somehow this did not lead to crashes in 2.6.20 > > I see. I guess this was plain luck. > > > > Why is neighbour->dev changed here? > > It holds reference to device and prevents its destruction. > If dst is held somewhere, we cannot destroy the device and deadlock > while unregister. > > We could not invalidate dst->neighbour but it looked safe to invalidate > neigh->dev after quiescent state. Obviosuly, it is not and it never was safe. > Was supposed to be repaired asap, but this did not happen. :-( > > > Can dst->neighbour be changed to point to NULL instead, and the neighbour > > released? > > It should be cleared and we should be sure it will not be destroyed > before quiescent state. I'm confused. didn't you say dst_ifdown is called after quiescent state? > Seems, this is the only correct solution, but to do this we have > to audit all the places where dst->neighbour is dereferenced for > RCU safety. > > Actually, it is very good you caught this eventually, the bug was > so _disgusting_ that it was "forgotten" all the time, waiting for > someone who will point out that the king is naked. :-) > > Alexey This does not sound like something that's likely to be accepted in 2.6.21, right? Any simpler ideas? -- MST From kuznet at ms2.inr.ac.ru Sun Mar 18 12:55:58 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Sun, 18 Mar 2007 22:55:58 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318194638.GA11078@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318194638.GA11078@mellanox.co.il> Message-ID: <20070318195558.GA27004@ms2.inr.ac.ru> Hello! > Hmm. Something I don't understand: does the code > in question not run on *each* device unregister? It does. > Why do I only see this under stress? You should have some referenced destination entries to trigger bad path. This should happen not only under stress. F.e. just try to ssh to something via this device. And unregister it. Seems, the crash is inevitable. If you do not see crash, I will be puzzled. Alexey From kuznet at ms2.inr.ac.ru Sun Mar 18 13:18:26 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Sun, 18 Mar 2007 23:18:26 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318195355.GB11078@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> Message-ID: <20070318201826.GB27004@ms2.inr.ac.ru> Hello! > > It should be cleared and we should be sure it will not be destroyed > > before quiescent state. > > I'm confused. didn't you say dst_ifdown is called after quiescent state? Quiescent state should happen after dst->neighbour is invalidated. And this implies that all the users of dst->neighbour check validity after dereference and do not use it after quiescent state. > This does not sound like something that's likely to be accepted in 2.6.21, right? > > Any simpler ideas? Well, if inifiniband destructor really needs to take that lock... no. Right now I do not see. Alexey From mst at dev.mellanox.co.il Sun Mar 18 13:24:44 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 22:24:44 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318195558.GA27004@ms2.inr.ac.ru>
References: <20070318155532.GG7958@mellanox.co.il>
	<20070318191238.GA20518@ms2.inr.ac.ru>
	<20070318194638.GA11078@mellanox.co.il>
	<20070318195558.GA27004@ms2.inr.ac.ru>
Message-ID: <20070318202444.GC11078@mellanox.co.il>

> Quoting Alexey Kuznetsov :
> Subject: Re: dst_ifdown breaks infiniband?
>
> Hello!
>
> > Hmm. Something I don't understand: does the code
> > in question not run on *each* device unregister?
>
> It does.
>
>
> > Why do I only see this under stress?
>
> You should have some referenced destination entries to trigger bad path.
> This should happen not only under stress.
>
> F.e. just try to ssh to something via this device. And unregister it.
> Seems, the crash is inevitable. If you do not see crash, I will be puzzled.

I did this. What happens is:

neigh_setup is called
dst_ifdown changes the neigh->dev to loopback device

But the funny thing is that this neighbour can conceivably hang around
indefinitely now, and if it does, the destructor won't be called and there
won't be a crash.

To trigger a crash, I did simply
ifconfig lo down; ifconfig lo 127.0.0.1
and sure enough it crashes in drivers/infiniband/ulp/ipoib/ipoib_main.c.
-- MST From greg.lindahl at qlogic.com Sun Mar 18 13:24:58 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Sun, 18 Mar 2007 13:24:58 -0700 Subject: [ewg] Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317213516.GC4466@mellanox.co.il> <20070317221316.GF4466@mellanox.co.il> <4621D678-4F8A-421F-B042-855F4F753E86@cisco.com> Message-ID: <20070318202458.GA3437@localhost.localdomain> On Sun, Mar 18, 2007 at 07:01:38AM -0400, Jeff Squyres wrote: > But Cisco, for example, supports > only a specific set of distros/versions and we distribute binaries > for them. I believe that others may be doing the same...? For our InfiniPath stack (which is all of OFED plus a little extra), we build binary rpms for the various distros that we support. For Red Hat, we install with a prefix of /usr, because that's "the Red Hat way to do it". For SLES, we do it the SLES way. This is what our customers expect. And as a compiler company, we know that having end-users always build the binaries themselves is just asking for bugs. gcc can't build our compiler correctly at -O2, but the bugs it inserts are subtle. And compiler test suites are much, much better than OFED's tests. So users are much better off using the same binary rpms as everyone else -- the actual binaries that were tested -- unless they have a good reason to rebuild. -- greg From mst at dev.mellanox.co.il Sun Mar 18 13:25:59 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 22:25:59 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318191238.GA20518@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> Message-ID: <20070318202559.GD11078@mellanox.co.il> > > Why is neighbour->dev changed here? > > It holds reference to device and prevents its destruction. > If dst is held somewhere, we cannot destroy the device and deadlock > while unregister. BTW, can this ever happen for the loopback device itself? Is it ever unregistered? -- MST From mst at dev.mellanox.co.il Sun Mar 18 13:29:10 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 22:29:10 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318201826.GB27004@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> Message-ID: <20070318202910.GE11078@mellanox.co.il> > > > It should be cleared and we should be sure it will not be destroyed > > > before quiescent state. > > > > I'm confused. didn't you say dst_ifdown is called after quiescent state? > > Quiescent state should happen after dst->neighbour is invalidated. > And this implies that all the users of dst->neighbour check validity > after dereference and do not use it after quiescent state. > > > > This does not sound like something that's likely to be accepted in 2.6.21, right? > > > > Any simpler ideas? > > Well, if inifiniband destructor really needs to take that lock... no. > Right now I do not see. OK then. If you post some patches I'll test them. -- MST From mst at dev.mellanox.co.il Sun Mar 18 13:33:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 22:33:45 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318191238.GA20518@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> Message-ID: <20070318203345.GF11078@mellanox.co.il> Quoting Alexey Kuznetsov : Subject: Re: dst_ifdown breaks infiniband? > > Can dst->neighbour be changed to point to NULL instead, and the neighbour > > released? > > It should be cleared and we should be sure it will not be destroyed > before quiescent state. > > Seems, this is the only correct solution, but to do this we have > to audit all the places where dst->neighbour is dereferenced for > RCU safety. > > Actually, it is very good you caught this eventually, the bug was > so _disgusting_ that it was "forgotten" all the time, waiting for > someone who will point out that the king is naked. :-) Actually that might not be too bad: $grep -rIi 'dst->neighbour' net/ | wc -l 36 I'll try to do it. -- MST From dledford at redhat.com Sun Mar 18 13:50:33 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 18 Mar 2007 16:50:33 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> Message-ID: <1174251033.4673.133.camel@athlon-x2.xsintricity.com> On Sun, 2007-03-18 at 14:06 -0400, Jeff Squyres wrote: > I think that Doug is trying to say that our default location should > be /usr (not /usr/local/ofed). 
That would seem to solve several issues: > > - will automatically generate conflicts with the RHEL OFED RPMs > - less muckery with finding libraries and header files > - can claim to be FHS complaint > - user *can* change the default location to elsewhere if they want to > > If it's a simple issue to change our default, If it's *not* a simple issue to change the default (and in truth, it does take some forethought and planning to get it right), then the much more important question is why would you expect the user to do that work themselves? > is the only reason > *not* to do it the historical precedent of prior community OFED > versions? If so, that argument is somewhat diluted because a) we (as > a community) are encouraging users to upgrade, and b) RH started is > already shipping OFED RPMs that live in /usr. c) like every other initially /usr/local package, there comes a time to grow up. If historical precedent meant anything, I'm sure X would still be in /usr/local. > > > > On Mar 18, 2007, at 11:42 AM, Doug Ledford wrote: > > > On Sun, 2007-03-18 at 13:51 +0200, Michael S. Tsirkin wrote: > >>> On Mar 18, 2007, at 7:15 AM, Michael S. Tsirkin wrote: > >>> > >>>>> I think you're missing Doug's point. There is currently no > >>>>> mechanism > >>>>> for the user to know that they're installing 2 potentially > >>>>> conflicting versions of the same software (OFED). > >>>> > >>>> That's a good point, although not entirely correct. AFAIK OFED > >>>> installer > >>>> currently attempts to detect and warn about conflicting > >>>> libraries, this > >>>> logic probably can be improved. 
> >>> > >>> > >>> > >>> Quoting Jeff Squyres : > >>> Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > >>> > >>> That seems like chasing our tail: adding more logic/work to > >>> replicate > >>> a mechanism that is already available (*and* making sure that we > >>> keep > >>> this logic up-to-date with all the OFED distributions out there -- > >>> which seems like a losing proposition). RPM can detect this kind of > >>> conflict and prevent it. Why aren't we using it? > >>> > >>> Oh, right, because we're doing several kinds of non-standard things > >>> that preclude us from doing so. :-) > >> > >> Right. But the user *can* the prefix to /usr, and RPM will detect > >> conflicts > >> then, isn't that right? > > > > This is a joke, right? You can't *really* be serious. If you are, > > then > > I suggest the EWG change the acronym for OFED to Open Fabrics > > Experimental Distribution because no enterprise customer I know of > > would > > accept the above suggestion that they change the spec file and > > recompile > > just to get RPM to do its job as reasonable for an enterprise software > > package. > > > > -- > > Doug Ledford > > GPG KeyID: CFBFF194 > > http://people.redhat.com/dledford > > > > Infiniband specific RPMs available at > > http://people.redhat.com/dledford/Infiniband > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at dev.mellanox.co.il Sun Mar 18 14:06:17 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 23:06:17 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318203345.GF11078@mellanox.co.il>
References: <20070318155532.GG7958@mellanox.co.il>
	<20070318191238.GA20518@ms2.inr.ac.ru>
	<20070318203345.GF11078@mellanox.co.il>
Message-ID: <20070318210616.GG11078@mellanox.co.il>

> Quoting Michael S. Tsirkin :
> Subject: Re: dst_ifdown breaks infiniband?
>
> Quoting Alexey Kuznetsov :
> Subject: Re: dst_ifdown breaks infiniband?
> > > Can dst->neighbour be changed to point to NULL instead, and the neighbour
> > > released?
> >
> > It should be cleared and we should be sure it will not be destroyed
> > before quiescent state.
> >
> > Seems, this is the only correct solution, but to do this we have
> > to audit all the places where dst->neighbour is dereferenced for
> > RCU safety.
> >
> > Actually, it is very good you caught this eventually, the bug was
> > so _disgusting_ that it was "forgotten" all the time, waiting for
> > someone who will point out that the king is naked. :-)
>
> Actually that might not be too bad:
> $grep -rIi 'dst->neighbour' net/ | wc -l
> 36
>
> I'll try to do it.

Here's the list. Looks OK to me. What do you think?

$grep -rIi 'dst->neighbour' net/
./atm/clip.c:395: if (!skb->dst->neighbour) {
./atm/clip.c:397: skb->dst->neighbour = clip_find_neighbour(skb->dst, 1);
./atm/clip.c:398: if (!skb->dst->neighbour) {
./atm/clip.c:409: entry = NEIGH2ENTRY(skb->dst->neighbour);
./atm/clip.c:426: DPRINTK("using neighbour %p, vcc %p\n", skb->dst->neighbour, vcc);

The above are all in hard_start_xmit - output routine so should be OK (atomic)
wrt RCU

./core/dst.c:186: neigh = dst->neighbour;
./core/dst.c:195: dst->neighbour = NULL;

Looks OK.

./core/dst.c:252: if (dst->neighbour && dst->neighbour->dev == dev) {
./core/dst.c:253: dst->neighbour->dev = &loopback_dev;

This is our boy.

./core/neighbour.c:1045: /* On shaper/eql skb->dst->neighbour != neigh :( */
./core/neighbour.c:1046: if (skb->dst && skb->dst->neighbour)
./core/neighbour.c:1047: n1 = skb->dst->neighbour;

neigh_update - seems to be always called after neigh_lookup
so there is a reference to neighbour.

./core/neighbour.c:1144: if (!dst || !(neigh = dst->neighbour))

neigh_resolve_output - looks safe

./core/neighbour.c:1174: dst, dst ? dst->neighbour : NULL);

merely prints a pointer

./core/neighbour.c:1187: struct neighbour *neigh = dst->neighbour;

neigh_connected_output - looks safe

./decnet/dn_neigh.c:208: struct neighbour *neigh = dst->neighbour;
./decnet/dn_neigh.c:226: struct neighbour *neigh = dst->neighbour;
./decnet/dn_neigh.c:272: struct neighbour *neigh = dst->neighbour;
./decnet/dn_neigh.c:315: struct neighbour *neigh = dst->neighbour;
./decnet/dn_route.c:228: struct dn_dev *dn = dst->neighbour ?
./decnet/dn_route.c:229: (struct dn_dev *)dst->neighbour->dev->dn_ptr : NULL;
./decnet/dn_route.c:693: if ((neigh = dst->neighbour) == NULL)
./decnet/dn_route.c:727: struct neighbour *neigh = dst->neighbour;

output routines, except line 228 is dn_dst_update_pmtu, which looks OK as well.

./ipv4/arp.c:445: * It is very UGLY routine: it DOES NOT use skb->dst->neighbour,
./ipv4/arp.c:508: struct neighbour *n = dst->neighbour;
./ipv4/arp.c:523: dst->neighbour = n;

Looks safe.

./ipv4/ip_gre.c:714: struct neighbour *neigh = skb->dst->neighbour;
./ipv4/ip_output.c:186: else if (dst->neighbour)
./ipv4/ip_output.c:187: return dst->neighbour->output(skb);
./ipv6/ip6_output.c:79: else if (dst->neighbour)
./ipv6/ip6_output.c:80: return dst->neighbour->output(skb);
./ipv6/ip6_output.c:431: if (skb->dev == dst->dev && dst->neighbour && opt->srcrt == 0) {
./ipv6/ip6_output.c:434: struct neighbour *n = dst->neighbour;
./ipv6/sit.c:459: neigh = skb->dst->neighbour;

These are all output routines

./sched/sch_teql.c:235: struct neighbour *mn = skb->dst->neighbour;

Looks ok - takes reference on the neighbour.

./sched/sch_teql.c:269: skb->dst->neighbour == NULL)

Looks ok.

-- 
MST

From mst at dev.mellanox.co.il Sun Mar 18 14:11:18 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 18 Mar 2007 23:11:18 +0200
Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary
In-Reply-To: <1174251033.4673.133.camel@athlon-x2.xsintricity.com>
References: <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com>
	<1174053927.4673.60.camel@athlon-x2.xsintricity.com>
	<20070317222552.GG4466@mellanox.co.il>
	<20070318111533.GF2862@mellanox.co.il>
	<76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com>
	<20070318115143.GH2862@mellanox.co.il>
	<1174232565.4673.111.camel@athlon-x2.xsintricity.com>
	<1174251033.4673.133.camel@athlon-x2.xsintricity.com>
Message-ID: <20070318211118.GH11078@mellanox.co.il>

> Quoting Doug Ledford :
> Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary
>
> On Sun, 2007-03-18 at 14:06 -0400, Jeff Squyres wrote:
> > I think that Doug is trying to say that our default location should
> > be /usr (not /usr/local/ofed).
That would seem to solve several issues: > > > > - will automatically generate conflicts with the RHEL OFED RPMs > > - less muckery with finding libraries and header files > > - can claim to be FHS complaint > > - user *can* change the default location to elsewhere if they want to > > > > If it's a simple issue to change our default, > > If it's *not* a simple issue to change the default (and in truth, it > does take some forethought and planning to get it right), then the much > more important question is why would you expect the user to do that work > themselves? Good point :). > > is the only reason > > *not* to do it the historical precedent of prior community OFED > > versions? If so, that argument is somewhat diluted because a) we (as > > a community) are encouraging users to upgrade, and b) RH started is > > already shipping OFED RPMs that live in /usr. > > c) like every other initially /usr/local package, there comes a time to > grow up. If historical precedent meant anything, I'm sure X would still > be in /usr/local. Sounds OK. Does EWG vote on this, or something? -- MST From mst at dev.mellanox.co.il Sun Mar 18 14:20:46 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Mar 2007 23:20:46 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318210616.GG11078@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318203345.GF11078@mellanox.co.il> <20070318210616.GG11078@mellanox.co.il> Message-ID: <20070318212046.GI11078@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: dst_ifdown breaks infiniband? > > > Quoting Michael S. Tsirkin : > > Subject: Re: dst_ifdown breaks infiniband? > > > > Quoting Alexey Kuznetsov : > > Subject: Re: dst_ifdown breaks infiniband? > > > > Can dst->neighbour be changed to point to NULL instead, and the neighbour > > > > released? 
> > >
> > > It should be cleared and we should be sure it will not be destroyed
> > > before quiescent state.
> > >
> > > Seems, this is the only correct solution, but to do this we have
> > > to audit all the places where dst->neighbour is dereferenced for
> > > RCU safety.
> > >
> > > Actually, it is very good you caught this eventually, the bug was
> > > so _disgusting_ that it was "forgotten" all the time, waiting for
> > > someone who will point out that the king is naked. :-)
> >
> > Actually that might not be too bad:
> > $grep -rIi 'dst->neighbour' net/ | wc -l
> > 36
> >
> > I'll try to do it.
>
> Here's the list. Looks OK to me. What do you think?
>

So Alexey, how does the following (lightly tested) patch look?
Is this what you had in mind?

-----------------------------

Fix dst_ifdown for infiniband.

Changing dst->neighbour->dev is unsafe because neigh->parms callbacks
are set up for specific device. We should drop the dst->neighbour
reference instead.

Signed-off-by: Michael S. Tsirkin

---

diff --git a/net/core/dst.c b/net/core/dst.c
index 764bccb..27091a5 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -235,6 +236,8 @@ again:
 static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 			      int unregister)
 {
+	struct neighbour *neigh;
+
 	if (dst->ops->ifdown)
 		dst->ops->ifdown(dst, dev, unregister);
 
@@ -245,13 +248,13 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 		dst->input = dst_discard_in;
 		dst->output = dst_discard_out;
 	} else {
+		neigh = dst->neighbour;
 		dst->dev = &loopback_dev;
 		dev_hold(&loopback_dev);
 		dev_put(dev);
-		if (dst->neighbour && dst->neighbour->dev == dev) {
-			dst->neighbour->dev = &loopback_dev;
-			dev_put(dev);
-			dev_hold(&loopback_dev);
+		if (neigh && neigh->dev == dev) {
+			dst->neighbour = NULL;
+			neigh_release(neigh);
 		}
 	}
 }

-- 
MST

From mst at dev.mellanox.co.il Sun Mar 18 14:52:59 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 18 Mar 2007 23:52:59 +0200
Subject: Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary
In-Reply-To: <20070318202458.GA3437@localhost.localdomain>
References: <45E58D3A.8060906@mellanox.co.il>
	<1172685419.4777.145.camel@fc6.xsintricity.com>
	<9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com>
	<1174053927.4673.60.camel@athlon-x2.xsintricity.com>
	<20070317213516.GC4466@mellanox.co.il>
	<20070317221316.GF4466@mellanox.co.il>
	<4621D678-4F8A-421F-B042-855F4F753E86@cisco.com>
	<20070318202458.GA3437@localhost.localdomain>
Message-ID: <20070318215259.GK11078@mellanox.co.il>

> Quoting Greg Lindahl :
> Subject: Re: Fwd: [ofa-general] OFED 1.2 Feb-26 meeting summary
>
> gcc can't build our
> compiler correctly at -O2, but the bugs it inserts are subtle. And
> compiler test suites are much, much better than OFED's tests.

I have to say I have not observed this with the verbs library yet.
Is it possible that it is simply much more straigh-forward, or written in a more portable way, than your compiler? -- MST From jgunthorpe at obsidianresearch.com Sun Mar 18 14:55:03 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sun, 18 Mar 2007 15:55:03 -0600 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318211118.GH11078@mellanox.co.il> References: <1174053927.4673.60.camel@athlon-x2.xsintricity.com> <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> Message-ID: <20070318215503.GA5740@obsidianresearch.com> On Sun, Mar 18, 2007 at 11:11:18PM +0200, Michael S. Tsirkin wrote: > > > is the only reason > > > *not* to do it the historical precedent of prior community OFED > > > versions? If so, that argument is somewhat diluted because a) we (as > > > a community) are encouraging users to upgrade, and b) RH started is > > > already shipping OFED RPMs that live in /usr. > > > > c) like every other initially /usr/local package, there comes a time to > > grow up. If historical precedent meant anything, I'm sure X would still > > be in /usr/local. > > Sounds OK. Does EWG vote on this, or something? Along the lines of growing up.. Now that distributors are shipping openfabrics components I think the expectation of OFED will change a little bit. Generally people are happiest if upgrades to things included in their distribution look, act and feel like the original thing. If the distributions are not going to provide feature upgrades as OF keeps evolving for their past releases then there is going to be a larger need for distribution specific binary rpm based upgrades to newer OF stuff. 
Basically, I wonder if the usefulness of a primarily source OFED distribution is shrinking? Maybe expanding the program to provide RH/SuSE compatible source and binary upgrade RPMs is better? Jason From mst at dev.mellanox.co.il Sun Mar 18 15:10:39 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 00:10:39 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318215503.GA5740@obsidianresearch.com> References: <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> Message-ID: <20070318221039.GL11078@mellanox.co.il> > Quoting Jason Gunthorpe : > > Basically, I wonder if the usefulness of a primarily source OFED > distribution is shrinking? My laptop does not run either RHEL or SLES so I don't think so :). > Maybe expanding the program to provide > RH/SuSE compatible source and binary upgrade RPMs is better? Can't distributions do that? Why not? -- MST From mrl at eskimo.com Sun Mar 18 15:17:39 2007 From: mrl at eskimo.com (mrl at eskimo.com) Date: Sun, 18 Mar 2007 15:17:39 -0700 (PDT) Subject: [ofa-general] OFED-1.2-20070318-0600 build failure - qlvnictools Message-ID: <200703182217.PAA08838@eskimo.com> Using SLES10 and OFED-1.2-20070318-0600, a build.sh/install.sh fails in userland in qlvnictools with: make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmto ols' cd . && /bin/sh /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtools/config/m issing --run automake-1.9 --foreign configure.in:9: version mismatch. This is Automake 1.9.6, configure.in:9: but the definition used by this AM_INIT_AUTOMAKE configure.in:9: comes from Automake 1.9.2. 
You should recreate configure.in:9: aclocal.m4 with aclocal and run automake again. make[1]: *** [Makefile.in] Error 1 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtoo ls' make: *** [qlvnictools] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.9704 (%install) Leaving out qlvnictools and doing a rpmbuild of userspace by hand and all else works. /usr/bin/automake --version automake (GNU automake) 1.9.6 Is there an easy (or not so easy) way around this, folks? Mostyn From mst at dev.mellanox.co.il Sun Mar 18 15:25:09 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 00:25:09 +0200 Subject: [ofa-general] Re: OFED-1.2-20070318-0600 build failure - qlvnictools In-Reply-To: <200703182217.PAA08838@eskimo.com> References: <200703182217.PAA08838@eskimo.com> Message-ID: <20070318222509.GM11078@mellanox.co.il> > Quoting mrl at eskimo.com : > Subject: OFED-1.2-20070318-0600 build failure - qlvnictools > > Using SLES10 and OFED-1.2-20070318-0600, a build.sh/install.sh fails in userland in > qlvnictools with: > > make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmto > ols' > cd . && /bin/sh /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtools/config/m > issing --run automake-1.9 --foreign > configure.in:9: version mismatch. This is Automake 1.9.6, > configure.in:9: but the definition used by this AM_INIT_AUTOMAKE > configure.in:9: comes from Automake 1.9.2. You should recreate > configure.in:9: aclocal.m4 with aclocal and run automake again. > make[1]: *** [Makefile.in] Error 1 Looks like make dist didn't run properly there. -- MST From ebiederman at lnxi.com Sun Mar 18 15:24:23 2007 From: ebiederman at lnxi.com (Eric W. Biederman) Date: Sun, 18 Mar 2007 16:24:23 -0600 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318202559.GD11078@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sun, 18 Mar 2007 22:25:59 +0200") References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il> Message-ID: "Michael S. Tsirkin" writes: >> > Why is neighbour->dev changed here? >> >> It holds reference to device and prevents its destruction. >> If dst is held somewhere, we cannot destroy the device and deadlock >> while unregister. > > BTW, can this ever happen for the loopback device itself? > Is it ever unregistered? Well I don't think the loopback device is currently but as soon as we get network namespace support we will have multiple loopback devices and they will get unregistered when we remove the network namespace. Eric From dledford at redhat.com Sun Mar 18 15:26:29 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 18 Mar 2007 18:26:29 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318221039.GL11078@mellanox.co.il> References: <20070317222552.GG4466@mellanox.co.il> <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> <20070318221039.GL11078@mellanox.co.il> Message-ID: <1174256789.4673.163.camel@athlon-x2.xsintricity.com> On Mon, 2007-03-19 at 00:10 +0200, Michael S. Tsirkin wrote: > > Quoting Jason Gunthorpe : > > > > Basically, I wonder if the usefulness of a primarily source OFED > > distribution is shrinking? > > My laptop does not run either RHEL or SLES so I don't think so :). > > > Maybe expanding the program to provide > > RH/SuSE compatible source and binary upgrade RPMs is better? > > Can't distributions do that? Why not? Although not weekly or similarly frequent, RHEL4 and RHEL5 will both get updates to the OFED sources at each scheduled update. 
We won't be freezing our OFED support with the initial supported release. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From mst at dev.mellanox.co.il Sun Mar 18 15:34:00 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 00:34:00 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <1174256789.4673.163.camel@athlon-x2.xsintricity.com> References: <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> <20070318221039.GL11078@mellanox.co.il> <1174256789.4673.163.camel@athlon-x2.xsintricity.com> Message-ID: <20070318223400.GN11078@mellanox.co.il> > Quoting Doug Ledford : > Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > On Mon, 2007-03-19 at 00:10 +0200, Michael S. Tsirkin wrote: > > > Quoting Jason Gunthorpe : > > > > > > Basically, I wonder if the usefulness of a primarily source OFED > > > distribution is shrinking? > > > > My laptop does not run either RHEL or SLES so I don't think so :). > > > > > Maybe expanding the program to provide > > > RH/SuSE compatible source and binary upgrade RPMs is better? > > > > Can't distributions do that? Why not? > > Although not weekly or similarly frequent, RHEL4 and RHEL5 will both get > updates to the OFED sources at each scheduled update. We won't be > freezing our OFED support with the initial supported release. That's great.
BTW, something I'd like to learn how to do is figure out what code RHEL's infiniband support is based on. For example, I gather RHEL5 basically has OFED 1.1, right? Does this include patches from the support page?

Thanks,
MST

--
MST

From mst at dev.mellanox.co.il Sun Mar 18 15:36:53 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 00:36:53 +0200
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To:
References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il>
Message-ID: <20070318223653.GO11078@mellanox.co.il>

> Quoting Eric W. Biederman :
> Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband?
>
> "Michael S. Tsirkin" writes:
>
> >> > Why is neighbour->dev changed here?
> >>
> >> It holds reference to device and prevents its destruction.
> >> If dst is held somewhere, we cannot destroy the device and deadlock
> >> while unregister.
> >
> > BTW, can this ever happen for the loopback device itself?
> > Is it ever unregistered?
>
> Well I don't think the loopback device is currently but as soon
> as we get network namespace support we will have multiple loopback
> devices and they will get unregistered when we remove the network
> namespace.

Hmm. Then the code moving dst->dev to point to the loopback device will have to be fixed too. I'll post a patch a bit later.

--
MST

From mst at dev.mellanox.co.il Sun Mar 18 15:42:34 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 00:42:34 +0200
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To: <20070318223653.GO11078@mellanox.co.il>
References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il> <20070318223653.GO11078@mellanox.co.il>
Message-ID: <20070318224234.GP11078@mellanox.co.il>

> Quoting Michael S.
Tsirkin : > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > Quoting Eric W. Biederman : > > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > > "Michael S. Tsirkin" writes: > > > > >> > Why is neighbour->dev changed here? > > >> > > >> It holds reference to device and prevents its destruction. > > >> If dst is held somewhere, we cannot destroy the device and deadlock > > >> while unregister. > > > > > > BTW, can this ever happen for the loopback device itself? > > > Is it ever unregistered? > > > > Well I don't think the loopback device is currently but as soon > > as we get network namespace support we will have multiple loopback > > devices and they will get unregistered when we remove the network > > namespace. > > Hmm. Then the code moving dst->dev to point to the loopback > device will have to be fixed too. I'll post a patch a bit later. Does this look sane (untested)? Signed-off-by: Michael S. Tsirkin diff --git a/net/core/dst.c b/net/core/dst.c index 764bccb..8283158 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -235,6 +236,8 @@ again: static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, int unregister) { + struct neighbour *neigh; + if (dst->ops->ifdown) dst->ops->ifdown(dst, dev, unregister); @@ -245,14 +248,13 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, dst->input = dst_discard_in; dst->output = dst_discard_out; } else { - dst->dev = &loopback_dev; - dev_hold(&loopback_dev); - dev_put(dev); - if (dst->neighbour && dst->neighbour->dev == dev) { - dst->neighbour->dev = &loopback_dev; - dev_put(dev); - dev_hold(&loopback_dev); + neigh = dst->neighbour; + if (neigh && neigh->dev == dev) { + dst->neighbour = NULL; + neigh_release(neigh); } + dst->dev = NULL; + dev_put(dev); } } -- MST From ebiederman at lnxi.com Sun Mar 18 16:03:37 2007 From: ebiederman at lnxi.com (Eric W. 
Biederman)
Date: Sun, 18 Mar 2007 17:03:37 -0600
Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary
In-Reply-To: <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> (Jeff Squyres's message of "Fri, 2 Mar 2007 20:42:22 -0500")
References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com>
Message-ID:

"Jeff Squyres" writes:

> To be totally clear, there are three issues:
>
> 1. *NOT AN MPI ISSUE*: base location of the stack. Doug has repeatedly
> mentioned that /usr/local/ofed is not good. This is a group issue to decide.

If you are really supporting Linux you have two choices: /usr or /opt/ofed/ (assuming you register /opt/ofed with LANANA); anything else is wrong by definition.

How hard is this to change? If it isn't too bad this should probably be changed as soon as possible. ABI issues suck and should be fixed as soon as you can. The install path and config file path are an ABI issue.

> 2. *NOT AN MPI ISSUE*: how the RPMs are built is Bad(tm). Not deleting the
> buildroot is Bad; munging %build into %install is Bad; ...etc. This needs to
> change. 4 choices jump to mind:
>
> a. Keep the same scheme. Ick.
> b. Install while we build (i.e., the normal way to build a pile of
> interdependent RPMs)
> c. Use chroot (Red Hat does this in their internal setup, for example)
> d. Only distribute binary RPMs for supported platforms; source is available
> for those who want it.

e. Give up on building RPMs and let the distributions do it.

I think e is the most common solution and what distributions are doing now. The only problem with it is that ofed may be evolving too fast to reasonably expect the distributions to keep up.

Of course the ideal build scenario looks something like:

    for source in *.tgz ; do rpmbuild -tb $source ; rpm -i ? ; done

Where the source tarballs have a spec file in them that can be used to build an rpm.
Sorting this out needs to happen, but the ugly bits are likely something that can be lived with.

> 3. Doug's final point about allowing multiple MPI's to play harmoniously on a
> single system is obviously an MPI issue. The /etc/alternatives mechanism is
> not really good enough (IMHO) for this -- /etc/alternatives is about choosing
> one implementation and making everyone use it. The problem is that when
> multiple MPI's are installed on a single system, people need all of them (some
> users prefer one over the other, but much more important, some apps are only
> certified with one MPI or another). The mpi-selector tool we introduced in
> OFED 1.2 will likely be "good enough" for this purpose, but we can also work on
> integrating the /etc/alternatives stuff if desired, particularly for those who
> only need/want one MPI implementation.

Agreed.

There are a few other issues here as well. There seems to be no agreement on a fortran ABI for linux. Even little things like f77 and g90 are incompatible. Or was there an ABI agreement recently and I missed it?

When providing MPI fortran bindings that is a problem because you need to compile separately for each different installed compiler, possibly even for different versions of the same compiler. Which makes even an /opt solution not quite good enough because you need multiple versions of the same mpi built with different compilers.

Eric

From davem at davemloft.net Sun Mar 18 17:13:37 2007
From: davem at davemloft.net (David Miller)
Date: Sun, 18 Mar 2007 17:13:37 -0700 (PDT)
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To: <20070318224234.GP11078@mellanox.co.il>
References: <20070318223653.GO11078@mellanox.co.il> <20070318224234.GP11078@mellanox.co.il>
Message-ID: <20070318.171337.112622504.davem@davemloft.net>

From: "Michael S. Tsirkin"
Date: Mon, 19 Mar 2007 00:42:34 +0200

> > Quoting Michael S. Tsirkin :
> > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband?
> >
> > > Quoting Eric W.
Biederman : > > > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > > > > "Michael S. Tsirkin" writes: > > > > > > >> > Why is neighbour->dev changed here? > > > >> > > > >> It holds reference to device and prevents its destruction. > > > >> If dst is held somewhere, we cannot destroy the device and deadlock > > > >> while unregister. > > > > > > > > BTW, can this ever happen for the loopback device itself? > > > > Is it ever unregistered? > > > > > > Well I don't think the loopback device is currently but as soon > > > as we get network namespace support we will have multiple loopback > > > devices and they will get unregistered when we remove the network > > > namespace. > > > > Hmm. Then the code moving dst->dev to point to the loopback > > device will have to be fixed too. I'll post a patch a bit later. > > Does this look sane (untested)? > > Signed-off-by: Michael S. Tsirkin You can't point it at NULL, we don't point it at loopback just for fun. There can be asynchronous paths elsewhere in the networking still referencing the neigh or dst and they will (correctly) feel free to derefence whatever device is hanging there. So transitioning to NULL is invalid. You guys will need to come up with a better solution for this silly situation with network namespaces. Loopback is always available to point dead routes and neighbour entries at, and this assumption is massively rooted in the networking. From mst at dev.mellanox.co.il Sun Mar 18 22:15:20 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 07:15:20 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318212046.GI11078@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318203345.GF11078@mellanox.co.il> <20070318210616.GG11078@mellanox.co.il> <20070318212046.GI11078@mellanox.co.il> Message-ID: <20070319051520.GR11078@mellanox.co.il> > Quoting Michael S. 
Tsirkin : > Subject: Re: dst_ifdown breaks infiniband? > > > Quoting Michael S. Tsirkin : > > Subject: Re: dst_ifdown breaks infiniband? > > > > > Quoting Michael S. Tsirkin : > > > Subject: Re: dst_ifdown breaks infiniband? > > > > > > Quoting Alexey Kuznetsov : > > > Subject: Re: dst_ifdown breaks infiniband? > > > > > Can dst->neighbour be changed to point to NULL instead, and the neighbour > > > > > released? > > > > > > > > It should be cleared and we should be sure it will not be destroyed > > > > before quiescent state. > > > > > > > > Seems, this is the only correct solution, but to do this we have > > > > to audit all the places where dst->neighbour is dereferenced for > > > > RCU safety. > > > > > > > > Actually, it is very good you caught this eventually, the bug was > > > > so _disgusting_ that it was "forgotten" all the time, waiting for > > > > someone who will point out that the king is naked. :-) > > > > > > Actually that might not be too bad: > > > $grep -rIi 'dst->neighbour' net/ | wc -l > > > 36 > > > > > > I'll try to do it. > > > > Here's the list. Looks OK to me. What do you think? > > > > So Alexey, how does the following (lightly tested) patch look? > Is this what you had in mind? > > ----------------------------- > > Fix dst_ifdown for infiniband. > > Changing dst->neighbour->dev is unsafe because neigh->parms callbacks > are set up for specific device. > We should drop the dst->neighbour reference instead. > > Signed-off-by: Michael S. Tsirkin Ugh, looked again and this looks obviously broken. Note to self - stop writing code at 23:00. -- MST From mst at dev.mellanox.co.il Sun Mar 18 22:19:34 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 07:19:34 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318.171337.112622504.davem@davemloft.net> References: <20070318223653.GO11078@mellanox.co.il> <20070318224234.GP11078@mellanox.co.il> <20070318.171337.112622504.davem@davemloft.net> Message-ID: <20070319051934.GS11078@mellanox.co.il> > Quoting David Miller : > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > From: "Michael S. Tsirkin" > Date: Mon, 19 Mar 2007 00:42:34 +0200 > > > > Quoting Michael S. Tsirkin : > > > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > > > > > Quoting Eric W. Biederman : > > > > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > > > > > > "Michael S. Tsirkin" writes: > > > > > > > > >> > Why is neighbour->dev changed here? > > > > >> > > > > >> It holds reference to device and prevents its destruction. > > > > >> If dst is held somewhere, we cannot destroy the device and deadlock > > > > >> while unregister. > > > > > > > > > > BTW, can this ever happen for the loopback device itself? > > > > > Is it ever unregistered? > > > > > > > > Well I don't think the loopback device is currently but as soon > > > > as we get network namespace support we will have multiple loopback > > > > devices and they will get unregistered when we remove the network > > > > namespace. > > > > > > Hmm. Then the code moving dst->dev to point to the loopback > > > device will have to be fixed too. I'll post a patch a bit later. > > > > Does this look sane (untested)? > > > > Signed-off-by: Michael S. Tsirkin > > You can't point it at NULL, we don't point it at loopback > just for fun. > > There can be asynchronous paths elsewhere in the networking still > referencing the neigh or dst and they will (correctly) feel free to > derefence whatever device is hanging there. So transitioning > to NULL is invalid. > > You guys will need to come up with a better solution for this silly > situation with network namespaces. 
Loopback is always available to > point dead routes and neighbour entries at, and this assumption is > massively rooted in the networking. Yes, I see this now. I guess it's best to focus on the original problem with dst_ifdown breaking infiniband for now. For that, we have to audit all the places where dst->neighbour is dereferenced for RCU safety, and this is already a massive task. -- MST From ebiederman at lnxi.com Sun Mar 18 22:30:39 2007 From: ebiederman at lnxi.com (Eric W. Biederman) Date: Sun, 18 Mar 2007 23:30:39 -0600 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318.171337.112622504.davem@davemloft.net> (David Miller's message of "Sun, 18 Mar 2007 17:13:37 -0700 (PDT)") References: <20070318223653.GO11078@mellanox.co.il> <20070318224234.GP11078@mellanox.co.il> <20070318.171337.112622504.davem@davemloft.net> Message-ID: David Miller writes: > From: "Michael S. Tsirkin" > Date: Mon, 19 Mar 2007 00:42:34 +0200 >> > Hmm. Then the code moving dst->dev to point to the loopback >> > device will have to be fixed too. I'll post a patch a bit later. >> >> Does this look sane (untested)? >> >> Signed-off-by: Michael S. Tsirkin > > You can't point it at NULL, we don't point it at loopback > just for fun. > > There can be asynchronous paths elsewhere in the networking still > referencing the neigh or dst and they will (correctly) feel free to > derefence whatever device is hanging there. So transitioning > to NULL is invalid. > > You guys will need to come up with a better solution for this silly > situation with network namespaces. Loopback is always available to > point dead routes and neighbour entries at, and this assumption is > massively rooted in the networking. Sure. In the network namespace case I think the careful ordering of the shutdown handles that case. Even with per network namespace lo unregistered it still existed until the network namespace actually exited. And it only happened on exit. 
So while there may be a tiny race there it hasn't been an issue yet in practice. I wasn't proposing that we fix it this way. I was simply saying that there was the possibility for the case to exist. The existence of a per network namespace loopback device is fairly fundamental to the network namespace concept. Heck I think Herbert has been looking at it for vserver which almost totally socket isolation. Eric From davem at davemloft.net Sun Mar 18 23:13:16 2007 From: davem at davemloft.net (David Miller) Date: Sun, 18 Mar 2007 23:13:16 -0700 (PDT) Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: References: <20070318224234.GP11078@mellanox.co.il> <20070318.171337.112622504.davem@davemloft.net> Message-ID: <20070318.231316.59470365.davem@davemloft.net> From: ebiederman at lnxi.com (Eric W. Biederman) Date: Sun, 18 Mar 2007 23:30:39 -0600 > Sure. In the network namespace case I think the careful ordering of the > shutdown handles that case. Even with per network namespace lo > unregistered it still existed until the network namespace actually > exited. And it only happened on exit. > > So while there may be a tiny race there it hasn't been an issue yet > in practice. I think the thing to do is to just leave the loopback references in place, try to unregister the per-namespace loopback device, and that will safely wait for all the references to go away. If you do it that way, you should need absolutely no changes to the other code in this area. As per Herbert, I think he works on Xen rather than vserver :-) Perhaps you're thinking of Alexey Kuznetsov or another one of the vserver guys. From kuznet at ms2.inr.ac.ru Mon Mar 19 02:20:09 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Mon, 19 Mar 2007 12:20:09 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To:
References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il>
Message-ID: <20070319092009.GA9387@ms2.inr.ac.ru>

Hello!

> Well I don't think the loopback device is currently but as soon
> as we get network namespace support we will have multiple loopback
> devices and they will get unregistered when we remove the network
> namespace.

There is no logical difference. At the moment when namespace is gone there is nobody who can hold unrevokable references to this loopback.

Alexey

From bunk at stusta.de Mon Mar 19 02:23:10 2007
From: bunk at stusta.de (Adrian Bunk)
Date: Mon, 19 Mar 2007 10:23:10 +0100
Subject: [ofa-general] drivers/infiniband/ulp/ipoib/ipoib_main.c: use-after-free
Message-ID: <20070319092310.GJ752@stusta.de>

The Coverity checker spotted the following code introduced by commit 839fcaba355abaffb7b44f0f4504093acb0b11cf:

<-- snip -->

...
static void path_rec_completion(int status,
                                struct ib_sa_path_rec *pathrec,
                                void *path_ptr)
{
...
        list_for_each_entry(neigh, &path->neigh_list, list) {
                kref_get(&path->ah->ref);
                neigh->ah = path->ah;
                memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
                       sizeof(union ib_gid));

                if (ipoib_cm_enabled(dev, neigh->neighbour)) {
                        if (!ipoib_cm_get(neigh))
                                ipoib_cm_set(neigh, ipoib_cm_create_tx(dev,
                                                                       path,
                                                                       neigh));
                        if (!ipoib_cm_get(neigh)) {
                                list_del(&neigh->list);
                                if (neigh->ah)
                                        ipoib_put_ah(neigh->ah);
                                ipoib_neigh_free(dev, neigh);
                                continue;
                        }
                }

                while ((skb = __skb_dequeue(&neigh->queue)))
                        __skb_queue_tail(&skqueue, skb);
        }
...

<-- snip -->

Notice that before the continue "neigh" gets freed, but the list_for_each_entry() for() loop uses it.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S.
Buck - Dragon Seed From kuznet at ms2.inr.ac.ru Mon Mar 19 02:24:22 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Mon, 19 Mar 2007 12:24:22 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318224234.GP11078@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il> <20070318223653.GO11078@mellanox.co.il> <20070318224234.GP11078@mellanox.co.il> Message-ID: <20070319092422.GB9387@ms2.inr.ac.ru> Hello! > Does this look sane (untested)? It does not, unfortunately. Instead of regular crash in infiniband you will get numerous random NULL pointer dereferences both due to dst->neighbour and due to dst->dev. Alexey From bunk at stusta.de Mon Mar 19 02:26:55 2007 From: bunk at stusta.de (Adrian Bunk) Date: Mon, 19 Mar 2007 10:26:55 +0100 Subject: [ofa-general] drivers/infiniband/hw/cxgb3/iwch_provider.c: uninitialized variable used Message-ID: <20070319092655.GR752@stusta.de> The Coverity checker spotted that "npages" will be used uninitialized in the following code if !(mr_rereg_mask & IB_MR_REREG_TRANS): <-- snip --> ... 
static int iwch_reregister_phys_mem(struct ib_mr *mr,
                                    int mr_rereg_mask,
                                    struct ib_pd *pd,
                                    struct ib_phys_buf *buffer_list,
                                    int num_phys_buf,
                                    int acc, u64 * iova_start)
{
        struct iwch_mr mh, *mhp;
        struct iwch_pd *php;
        struct iwch_dev *rhp;
        __be64 *page_list = NULL;
        int shift = 0;
        u64 total_size;
        int npages;
        int ret;

        PDBG("%s ib_mr %p ib_pd %p\n", __FUNCTION__, mr, pd);

        /* There can be no memory windows */
        if (atomic_read(&mr->usecnt))
                return -EINVAL;

        mhp = to_iwch_mr(mr);
        rhp = mhp->rhp;
        php = to_iwch_pd(mr->pd);

        /* make sure we are on the same adapter */
        if (rhp != php->rhp)
                return -EINVAL;

        memcpy(&mh, mhp, sizeof *mhp);

        if (mr_rereg_mask & IB_MR_REREG_PD)
                php = to_iwch_pd(pd);
        if (mr_rereg_mask & IB_MR_REREG_ACCESS)
                mh.attr.perms = iwch_ib_to_tpt_access(acc);
        if (mr_rereg_mask & IB_MR_REREG_TRANS)
                ret = build_phys_page_list(buffer_list, num_phys_buf,
                                           iova_start,
                                           &total_size, &npages,
                                           &shift, &page_list);

        ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages);
...

<-- snip -->

Looking at the code, it also seems some originally planned error handling code for the build_phys_page_list() call was forgotten ("ret" is never checked before it's overwritten again).

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

From mst at dev.mellanox.co.il Mon Mar 19 02:33:15 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 11:33:15 +0200
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To: <20070319092422.GB9387@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318202559.GD11078@mellanox.co.il> <20070318223653.GO11078@mellanox.co.il> <20070318224234.GP11078@mellanox.co.il> <20070319092422.GB9387@ms2.inr.ac.ru> Message-ID: <20070319093315.GA15909@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: [ofa-general] Re: dst_ifdown breaks infiniband? > > > Does this look sane (untested)? > > It does not, unfortunately. > > Instead of regular crash in infiniband you will get numerous > random NULL pointer dereferences both due to dst->neighbour > and due to dst->dev. Right, I saw this clearly in the morning. Thanks. -- MST From kuznet at ms2.inr.ac.ru Mon Mar 19 02:34:43 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Mon, 19 Mar 2007 12:34:43 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318.231316.59470365.davem@davemloft.net> References: <20070318224234.GP11078@mellanox.co.il> <20070318.171337.112622504.davem@davemloft.net> <20070318.231316.59470365.davem@davemloft.net> Message-ID: <20070319093443.GC9387@ms2.inr.ac.ru> Hello! > I think the thing to do is to just leave the loopback references > in place, try to unregister the per-namespace loopback device, > and that will safely wait for all the references to go away. Yes, it is exactly how it works in openvz. All the sockets are killed, queues are cleared, nobody holds references and virtual loopback can be unregistered just like another device. Alexey From mst at dev.mellanox.co.il Mon Mar 19 02:36:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 11:36:32 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? 
In-Reply-To: <20070318201826.GB27004@ms2.inr.ac.ru>
References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru>
Message-ID: <20070319093632.GB8386@mellanox.co.il>

> > Any simpler ideas?
>
> Well, if infiniband destructor really needs to take that lock... no.
> Right now I do not see.

OK, this is actually not hard to fix - for infiniband, we can just look at neighbour->dev->type or compare neighbour->dev and neighbour->parms->dev - if they are different, the device is being unregistered, so we do not need to do anything in the destructor. I'll send a patch to openfabrics shortly.

However, after implementing this fix, I hit what could be a use after free at module unloading. Dave, Alexey, Roland, could you take a look at the following please?

Works fine for me (survived a couple of hours of crazy device loading/unloading/up/down/hotplug + link data and state activity) and seems to fix the issue.

---------

If a device driver sets neigh_destructor in neigh_parms, this could get called after the device has been unregistered and the driver module removed.

This is an old bug, but apparently, started to get triggered more with infiniband after recent multicast and connected mode changes.

Fix this by delaying dev_put until the neigh_parms object is removed.

Signed-off-by: Michael S.
Tsirkin diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 3183142..cb34f1a 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1313,8 +1313,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) *p = parms->next; parms->dead = 1; write_unlock_bh(&tbl->lock); - if (parms->dev) - dev_put(parms->dev); call_rcu(&parms->rcu_head, neigh_rcu_free_parms); return; } @@ -1325,6 +1323,8 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) void neigh_parms_destroy(struct neigh_parms *parms) { + if (parms->dev) + dev_put(parms->dev); kfree(parms); } -- MST From vlad at lists.openfabrics.org Mon Mar 19 02:35:30 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Mon, 19 Mar 2007 02:35:30 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070319-0200 daily build status Message-ID: <20070319093530.AB8C8E60837@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.19 Passed on 
powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:

From mst at dev.mellanox.co.il  Mon Mar 19 02:46:19 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 11:46:19 +0200
Subject: [ofa-general] drivers/infiniband/ulp/ipoib/ipoib_main.c: use-after-free
In-Reply-To: <20070319092310.GJ752@stusta.de>
References: <20070319092310.GJ752@stusta.de>
Message-ID: <20070319094619.GE8386@mellanox.co.il>

> Quoting Adrian Bunk :
> Subject: [ofa-general] drivers/infiniband/ulp/ipoib/ipoib_main.c: use-after-free
>
> The Coverity checker spotted the following code introduced by
> commit 839fcaba355abaffb7b44f0f4504093acb0b11cf:
>
> <-- snip -->
>
> ...
> static void path_rec_completion(int status,
> 				struct ib_sa_path_rec *pathrec,
> 				void *path_ptr)
> {
> ...
> 	list_for_each_entry(neigh, &path->neigh_list, list) {
> 		kref_get(&path->ah->ref);
> 		neigh->ah = path->ah;
> 		memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
> 		       sizeof(union ib_gid));
>
> 		if (ipoib_cm_enabled(dev, neigh->neighbour)) {
> 			if (!ipoib_cm_get(neigh))
> 				ipoib_cm_set(neigh, ipoib_cm_create_tx(dev,
> 								       path,
> 								       neigh));
> 			if (!ipoib_cm_get(neigh)) {
> 				list_del(&neigh->list);
> 				if (neigh->ah)
> 					ipoib_put_ah(neigh->ah);
> 				ipoib_neigh_free(dev, neigh);
> 				continue;
> 			}
> 		}
>
> 		while ((skb = __skb_dequeue(&neigh->queue)))
> 			__skb_queue_tail(&skqueue, skb);
> 	}
> ...
>
> <-- snip -->
>
> Notice that before the continue "neigh" gets freed, but the
> list_for_each_entry() for() loop uses it.

Something like this then? Untested.

Signed-off-by: Michael S. Tsirkin

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 12b528b..706eb88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -380,7 +380,7 @@ static void path_rec_completion(int status,
 	struct net_device *dev = path->dev;
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_ah *ah = NULL;
-	struct ipoib_neigh *neigh;
+	struct ipoib_neigh *neigh, *t;
 	struct sk_buff_head skqueue;
 	struct sk_buff *skb;
 	unsigned long flags;
@@ -418,7 +418,7 @@ static void path_rec_completion(int status,
 	while ((skb = __skb_dequeue(&path->queue)))
 		__skb_queue_tail(&skqueue, skb);

-	list_for_each_entry(neigh, &path->neigh_list, list) {
+	list_for_each_entry_safe(neigh, t, &path->neigh_list, list) {
 		kref_get(&path->ah->ref);
 		neigh->ah = path->ah;
 		memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,

-- 
MST

From mst at dev.mellanox.co.il  Mon Mar 19 02:55:45 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 11:55:45 +0200
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To: <20070319093632.GB8386@mellanox.co.il>
References: <20070318155532.GG7958@mellanox.co.il>
	<20070318191238.GA20518@ms2.inr.ac.ru>
	<20070318195355.GB11078@mellanox.co.il>
	<20070318201826.GB27004@ms2.inr.ac.ru>
	<20070319093632.GB8386@mellanox.co.il>
Message-ID: <20070319095545.GF8386@mellanox.co.il>

> Quoting Michael S. Tsirkin :
> Subject: Re: dst_ifdown breaks infiniband?
>
> > > Any simpler ideas?
> >
> > Well, if infiniband destructor really needs to take that lock... no.
> > Right now I do not see.
>
> OK, this is actually not hard to fix - for infiniband, we can just look at
> neighbour->dev->type or compare neighbour->dev and
> neighbour->parms->dev - if they are different, device is being unregistered,
> so we do not need to do anything in the destructor.
>
> I'll send a patch to openfabrics shortly.
>
> However, after implementing this fix, I hit what could be a use after
> free at module unloading. Dave, Alexey, Roland, could you take a look at
> the following please?
>
> Works fine for me (survived a couple of hours of crazy device
> loading/unloading/up/down/hotplug + link data and state activity)
> and seems to fix the issue.
>
> ---------
>
> If a device driver sets neigh_destructor in neigh_params, this could
> get called after the device has been unregistered and the driver module
> removed.
>
> This is an old bug, but apparently it started to get triggered more by
> infiniband after recent multicast and connected mode changes.
>
> Fix this by delaying dev_put until the neigh_params object is removed.
>
> Signed-off-by: Michael S. Tsirkin

The problem seems real enough but the fix seems no good - device unregister
gets blocked with

unregister_netdevice: waiting for ib0 to become free. Usage count = 1

It seems the parms object can survive indefinitely after device is removed.

How about creating a new parms object in dst_ifdown, and pointing neighbour
to this? Would that work?
The advantage of this approach is that neigh->parms is already protected by RCU.

-- 
MST

From bugzilla-daemon at lists.openfabrics.org  Mon Mar 19 03:22:27 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 19 Mar 2007 03:22:27 -0700 (PDT)
Subject: [ofa-general] [Bug 469] New: there may be a memory leak in case of ibv_fork_init failure
Message-ID:

https://bugs.openfabrics.org/show_bug.cgi?id=469

           Summary: there may be a memory leak in case of ibv_fork_init failure
           Product: OpenFabrics Linux
           Version: 1.2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Verbs
        AssignedTo: bugzilla at openib.org
        ReportedBy: dotanb at mellanox.co.il

When the madvise function call fails, there may be a memory leak.
A patch was sent to the mailing list to fix this issue.

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From dotanb at dev.mellanox.co.il  Mon Mar 19 03:23:15 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 19 Mar 2007 12:23:15 +0200
Subject: [ofa-general] [PATCH] libibverbs: fix memory leak in case of error flow
Message-ID: <1174299795.3215.1.camel@mtldesk014.lab.mtl.com>

Don't leak memory when madvise fails.
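
A minimal sketch of the pattern this patch applies, written as a standalone helper rather than the libibverbs code itself: the result of the two madvise() calls is recorded first, the scratch page is freed on every path, and only then is the error returned. The function name probe_fork_madvise and the use of sysconf() to obtain the page size are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE	/* for MADV_DONTFORK / MADV_DOFORK on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libibverbs: probe whether the kernel
 * accepts MADV_DONTFORK and MADV_DOFORK on a scratch page.
 * Returns 0 on success, ENOMEM or ENOSYS on failure.
 */
int probe_fork_madvise(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *tmp;
	int ret;

	assert(page_size > 0);

	if (posix_memalign(&tmp, (size_t)page_size, (size_t)page_size))
		return ENOMEM;

	/* Record the outcome; do not return while tmp is still owned. */
	ret = madvise(tmp, (size_t)page_size, MADV_DONTFORK) ||
	      madvise(tmp, (size_t)page_size, MADV_DOFORK);

	/* Single release point, reached on both success and failure. */
	free(tmp);

	return ret ? ENOSYS : 0;
}
```

Keeping one free() ahead of the error check avoids duplicating the cleanup in each failure branch, which is the same choice the patch makes.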
Signed-off-by: Dotan Barak

---

diff --git a/src/memory.c b/src/memory.c
index 2af7021..70ef713 100644
--- a/src/memory.c
+++ b/src/memory.c
@@ -73,6 +73,7 @@ static int too_late;
 int ibv_fork_init(void)
 {
 	void *tmp;
+	int ret;

 	if (mm_root)
 		return 0;
@@ -87,12 +88,14 @@ int ibv_fork_init(void)
 	if (posix_memalign(&tmp, page_size, page_size))
 		return ENOMEM;

-	if (madvise(tmp, page_size, MADV_DONTFORK) ||
-	    madvise(tmp, page_size, MADV_DOFORK))
-		return ENOSYS;
+	ret = madvise(tmp, page_size, MADV_DONTFORK) ||
+	      madvise(tmp, page_size, MADV_DOFORK);

 	free(tmp);

+	if (ret)
+		return ENOSYS;
+
 	mm_root = malloc(sizeof *mm_root);
 	if (!mm_root)
 		return ENOMEM;

From tziporet at mellanox.co.il  Mon Mar 19 03:39:24 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 19 Mar 2007 12:39:24 +0200
Subject: [ofa-general] FW: weekly update - OFA Sonoma agenda planning
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0E065@mtlexch01.mtl.com>

Hi All,

Please review the proposed agenda for the Sonoma workshop and send
comments and suggestions to Jeff.

Tziporet

________________________________

From: Jeffrey Scott [mailto:jeff at splitrockpr.com]
Sent: Wednesday, March 14, 2007 11:31 PM
To: Thad Omura
Cc: Tziporet Koren; Ryan, Jim; Sujal Das; John Hagerman; Bill Boas; Bob Woolery; Chet Mehta; Roland Dreier; Gilad Shainer; mlleinin at hpcn.ca.sandia.gov; Asaf Somekh; Phamdo, Tuan; jriotto at cisco.com; paul.grun at intel.com; Brian Sparks; Arkady.Kanevsky at netapp.com; Dror Goldenberg; seager at llnl.gov; christyl at voltaire.com; ogerlitz at voltaire.com
Subject: weekly update - OFA Sonoma agenda planning

Session Owners-

Attached is this week's Sonoma agenda update. We're up to 30 confirmed
sessions. Thanks for everyone's support.

We still have a lot of ground to cover. If you have unconfirmed sessions,
please lock those down as soon as possible. I am now receiving many
inquiries about the agenda as potential attendees weigh the decision to
register for the event.

PLEASE REMEMBER these two important items:

1. Session owners or presenters should submit presentation drafts to me by
   April 6. Your presentations won't be edited. We simply want to ensure
   that all presentations are in the spirit of the workshop (i.e., we don't
   want sessions to be used as product or company promotions).

2. Register for the workshop!!!!!! Every attendee, including presenters and
   session owners, must register at this link ...
   http://www.acteva.com/booking.cfm?bevaid=125720

Thanks for working hard to make the Sonoma Workshop a great success.

Regards,

Jeff
Office (408) 884-4017
Mobile (202) 903-6057

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Sonoma Agenda Planner 3-14-07.xls
Type: application/vnd.ms-excel
Size: 36864 bytes
Desc: Sonoma Agenda Planner 3-14-07.xls

From kliteyn at dev.mellanox.co.il  Mon Mar 19 04:05:59 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 19 Mar 2007 13:05:59 +0200
Subject: [ofa-general] [PATCH] osm: Clearing lid matrices before rebuilding them
Message-ID: <45FE6E97.1080105@dev.mellanox.co.il>

Hal,

This patch fixes a bug in the lid matrices creation:

The lid matrices were not cleared, which caused OSM routing
to crash when routing nonexisting (disconnected) lids.

Please apply to ofed_1_2.

I'm not sure about the trunk though.
Sasha,
Can you please check that your latest improvements to the
routing don't have this problem?

Thanks.
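
As a rough illustration of the clearing step described above (not the OpenSM code): every allocated row of a per-switch hop matrix is reset to a "no path" marker before routes are recomputed, so stale entries for disconnected LIDs cannot be picked up. The struct toy_switch type, the toy_switch_clear_hops name and the 0xFF value of TOY_NO_PATH are assumptions made for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Assumed "unreachable" marker, standing in for OSM_NO_PATH. */
#define TOY_NO_PATH 0xFF

/* Hypothetical, simplified stand-in for a switch's min-hop table. */
struct toy_switch {
	uint8_t **hops;		/* hops[lid][port]; a row may be NULL */
	uint16_t num_hops;	/* number of rows (LIDs) */
	uint8_t num_ports;	/* entries per row */
};

/* Reset every allocated row so no stale hop counts survive a rebuild. */
void toy_switch_clear_hops(struct toy_switch *sw)
{
	uint16_t i;

	for (i = 0; i < sw->num_hops; i++)
		if (sw->hops[i])
			memset(sw->hops[i], TOY_NO_PATH, sw->num_ports);
}
```

Rows that were never allocated are skipped rather than created, which keeps the clearing pass free of allocation failures.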

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik
---
 osm/opensm/osm_ucast_mgr.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index ee6b3f9..05e07d5 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -1196,12 +1196,21 @@ ucast_mgr_setup_all_switches(osm_subn_t
 	for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl);
 	     p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl);
 	     p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item))
-		if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
-			osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
-				"ucast_mgr_setup_all_switches: ERR 3A0B: "
-				"cannot setup switch 0x%016" PRIx64 "\n",
-				cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
-			return -1;
+	{
+		if (osm_switch_prepare_path_rebuild(p_sw, lids))
+		{
+			osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
+				"ucast_mgr_setup_all_switches: ERR 3A0B: "
+				"cannot setup switch 0x%016" PRIx64 "\n",
+				cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
+			return -1;
+		}
+
+		/* Clear the LID matrix of the switch */
+		for ( i = 0; i < p_sw->num_hops; i++ )
+			if (p_sw->hops[i])
+				memset(p_sw->hops[i], OSM_NO_PATH, p_sw->num_ports);
+
 	}

 	return 0;
-- 
1.4.4.1.GIT

From ogerlitz at voltaire.com  Mon Mar 19 04:07:18 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 19 Mar 2007 13:07:18 +0200
Subject: [ofa-general] FW: OFA Sonoma agenda planning
Message-ID: <3857BB049D83424D9DB82753D37CEA550EEEDF@taurus.voltaire.com>

For some reason this email was not sent to the developer mailing lists,
so here it is. I understand that Jeffrey Scott
[mailto:jeff at splitrockpr.com] is the individual coordinating the agenda.

Jeffrey: please add the two developer mailing lists
general at lists.openfabrics.org and ewg at lists.openfabrics.org to any
further posting regarding the workshop. Specifically, to the one mentioned
below by Thad so we can make sure the developer related sessions are
serialized.

Thad Omura wrote:
> Please go ahead and forward the list of sessions to the developer
> mailing list and the EWG and any other list you'd like, there is no
> intention to keep the session planning closed.
>
> FYI: Late Friday, members of the marketing working group got together
> to map the sessions into the 2 1/2 days of time we have for the
> workshop. We'll release this as quickly as we can. We do have some
> room for more sessions.

> Thad Omura | VP of Product Marketing | Mellanox Technologies, Inc.
> thad at mellanox.com | Tel 408-916-0020 | Cell 408-750-6236

________________________________

From: Jeffrey Scott [mailto:jeff at splitrockpr.com]
Sent: Wednesday, March 14, 2007 11:31 PM
To: Thad Omura
Cc: tziporet at mellanox.co.il; Ryan, Jim; Sujal Das; John Hagerman; Bill Boas; Bob Woolery; Chet Mehta; Roland Dreier; Gilad Shainer; mlleinin at hpcn.ca.sandia.gov; Asaf Somekh; Phamdo, Tuan; jriotto at cisco.com; paul.grun at intel.com; Brian at mellanox.com; Arkady.Kanevsky at netapp.com; gdror at mellanox.co.il; seager at llnl.gov; Christy Lynch; Or Gerlitz
Subject: weekly update - OFA Sonoma agenda planning

Session Owners-

Attached is this week's Sonoma agenda update. We're up to 30 confirmed
sessions. Thanks for everyone's support.

We still have a lot of ground to cover. If you have unconfirmed sessions,
please lock those down as soon as possible. I am now receiving many
inquiries about the agenda as potential attendees weigh the decision to
register for the event.

PLEASE REMEMBER these two important items:

1. Session owners or presenters should submit presentation drafts to me by
   April 6. Your presentations won't be edited. We simply want to ensure
   that all presentations are in the spirit of the workshop (i.e., we don't
   want sessions to be used as product or company promotions).

2. Register for the workshop!!!!!! Every attendee, including presenters and
   session owners, must register at this link ...
   http://www.acteva.com/booking.cfm?bevaid=125720

Thanks for working hard to make the Sonoma Workshop a great success.

Regards,

Jeff
Office (408) 884-4017
Mobile (202) 903-6057

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Sonoma Agenda Planner 3-14-07.xls
Type: application/vnd.ms-excel
Size: 36864 bytes
Desc: Sonoma Agenda Planner 3-14-07.xls

From kliteyn at dev.mellanox.co.il  Mon Mar 19 04:42:56 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 19 Mar 2007 13:42:56 +0200
Subject: [ofa-general] [PATCHv2] osm: Clearing lid matrices before rebuilding them
Message-ID: <45FE7740.2080308@dev.mellanox.co.il>

Hi Hal,

[V2 of the patch]

This patch fixes a bug in the lid matrices creation:

The lid matrices were not cleared, which caused OSM routing
to crash when routing nonexisting (disconnected) lids.

Please apply to ofed_1_2.

I'm not sure about the trunk though.
Sasha,
Can you please check that your latest improvements to the
routing don't have this problem?

Thanks.

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik
---
 osm/opensm/osm_ucast_mgr.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index ee6b3f9..8643754 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -1189,6 +1189,7 @@ ucast_mgr_setup_all_switches(osm_subn_t
 {
 	osm_switch_t *p_sw;
 	uint16_t lids;
+	uint16_t i;

 	lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl);
 	lids = lids ? lids - 1 : 0;
@@ -1196,12 +1197,19 @@ ucast_mgr_setup_all_switches(osm_subn_t
 	for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl);
 	     p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl);
 	     p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item))
-		if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
-			osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
-				"ucast_mgr_setup_all_switches: ERR 3A0B: "
-				"cannot setup switch 0x%016" PRIx64 "\n",
-				cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
-			return -1;
+	{
+		if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
+			osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
+				"ucast_mgr_setup_all_switches: ERR 3A0B: "
+				"cannot setup switch 0x%016" PRIx64 "\n",
+				cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
+			return -1;
+		}
+
+		/* Clear the LID matrix of the switch */
+		for ( i = 0; i < p_sw->num_hops; i++ )
+			if (p_sw->hops[i])
+				memset(p_sw->hops[i], OSM_NO_PATH, p_sw->num_ports);
 	}

 	return 0;
-- 
1.4.4.1.GIT

From kuznet at ms2.inr.ac.ru  Mon Mar 19 05:05:34 2007
From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov)
Date: Mon, 19 Mar 2007 15:05:34 +0300
Subject: [ofa-general] Re: dst_ifdown breaks infiniband?
In-Reply-To: <20070319093632.GB8386@mellanox.co.il>
References: <20070318155532.GG7958@mellanox.co.il>
	<20070318191238.GA20518@ms2.inr.ac.ru>
	<20070318195355.GB11078@mellanox.co.il>
	<20070318201826.GB27004@ms2.inr.ac.ru>
	<20070319093632.GB8386@mellanox.co.il>
Message-ID: <20070319120534.GA28187@ms2.inr.ac.ru>

Hello!

> If a device driver sets neigh_destructor in neigh_params, this could
> get called after the device has been unregistered and the driver module
> removed.

It is the same problem: if dst->neighbour holds neighbour, it should
not hold device. parms->dev is not supposed to be used after
neigh_parms_release(). F.e. set parms->dev to NULL to catch bad references.

Do you search for a way to find real infiniband device in
ipoib_neigh_destructor()? I guess you will not be able.
The problem is logical: if destructor needs device, neighbour entry _somehow_ have to hold reference to the device (via neigh->dev, neigh->parms, whatever). Hence, if we hold neighbour entry, unregister cannot be completed. Therefore, destructor cannot refer to device. Q.E.D. :-) Seems, releasing dst->neighbour is inevitable. Alexey From jsquyres at cisco.com Mon Mar 19 05:11:27 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 19 Mar 2007 08:11:27 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> Message-ID: On Mar 18, 2007, at 7:03 PM, Eric W. Biederman wrote: > There seems to be no agreement on a fortran ABI for linux. Even > little > things like f77 and g90 are incompatible. Or was their an ABI > agreement > recently and I missed it? Don't forget C++, too. I long ago went through denial/anger/bargaining/depression/acceptance that you need a separate MPI installation for each compiler suite that you want to support. The mpi-selector tool will help with this, but we can't really solve it here. It's a much larger issue than OFED or the MPI implementations can solve. -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Mon Mar 19 05:12:48 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 14:12:48 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319120534.GA28187@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> Message-ID: <20070319121248.GD18497@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: dst_ifdown breaks infiniband? > > Hello! 
> > > If a device driver sets neigh_destructor in neigh_params, this could > > get called after the device has been unregistered and the driver module > > removed. > > It is the same problem: if dst->neighbour holds neighbour, it should > not hold device. parms->dev is not supposed to be used after > neigh_parms_release(). F.e. set parms->dev to NULL to catch bad references. Yes. I fixed that - simply checking that neighbour->dev is a loopback device is sufficient to detect the fact that the device is being unregistered. > Do you search for a way to find real inifiniband device in > ipoib_neigh_destructor()? No, not anymore. > I guess you will not be able. I agree it's not possible. > The problem is logical: if destructor needs device, neighbour entry > _somehow_ have to hold reference to the device (via neigh->dev, neigh->parms, > whatever). Hence, if we hold neighbour entry, unregister cannot be completed. > Therefore, destructor cannot refer to device. Q.E.D. :-) > > Seems, releasing dst->neighbour is inevitable. infiniband sets parm->neigh_destructor, and I search for a way to prevent this destructor from being called after the module has been unloaded. Ideas? -- MST From mst at dev.mellanox.co.il Mon Mar 19 05:13:10 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 14:13:10 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319120534.GA28187@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> Message-ID: <20070319121310.GE18497@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: dst_ifdown breaks infiniband? > > Hello! > > > If a device driver sets neigh_destructor in neigh_params, this could > > get called after the device has been unregistered and the driver module > > removed. 
> > It is the same problem: if dst->neighbour holds neighbour, it should > not hold device. parms->dev is not supposed to be used after > neigh_parms_release(). F.e. set parms->dev to NULL to catch bad references. Yes. I fixed that - simply checking that neighbour->dev is a loopback device is sufficient to detect the fact that the device is being unregistered. > Do you search for a way to find real inifiniband device in > ipoib_neigh_destructor()? No, not anymore. > I guess you will not be able. I agree it's not possible. > The problem is logical: if destructor needs device, neighbour entry > _somehow_ have to hold reference to the device (via neigh->dev, neigh->parms, > whatever). Hence, if we hold neighbour entry, unregister cannot be completed. > Therefore, destructor cannot refer to device. Q.E.D. :-) > > Seems, releasing dst->neighbour is inevitable. infiniband sets parm->neigh_destructor, and I search for a way to prevent this destructor from being called after the module has been unloaded. Ideas? -- MST From kuznet at ms2.inr.ac.ru Mon Mar 19 05:59:19 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Mon, 19 Mar 2007 15:59:19 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319121248.GD18497@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> <20070319121248.GD18497@mellanox.co.il> Message-ID: <20070319125919.GA4239@ms2.inr.ac.ru> Hello! > infiniband sets parm->neigh_destructor, and I search for a way to prevent > this destructor from being called after the module has been unloaded. > Ideas? It must be called in any case to update/release internal ipoib structures. The idea is to move call of parm->neigh_destructor from neighbour destructor to the moment when it is unhashed, right after n->dead is set. 
infiniband is the only user (atm clip uses it too, but that use is obviously
dummy), so that nobody will be harmed.

But ipoib will have to check for validity of skb->dst->neighbour
before attempt to reinitialize private data on dead (n->dead != 0) neighbour.

Alexey

From sashak at voltaire.com  Mon Mar 19 07:18:29 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 19 Mar 2007 16:18:29 +0200
Subject: [ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them
In-Reply-To: <45FE7740.2080308@dev.mellanox.co.il>
References: <45FE7740.2080308@dev.mellanox.co.il>
Message-ID: <1174313910.13051.25.camel@localhost>

On Mon, 2007-03-19 at 13:42 +0200, Yevgeny Kliteynik wrote:
> Hi Hal,
>
> [V2 of the patch]
>
> This patch fixes a bug in the lid matrices creation:
>
> The lid matrices were not cleared, which caused OSM routing
> to crash when routing nonexisting (disconnected) lids.

Where does the crash happen? Could you clarify?

Anyway I agree that there is no point in trying to generate routes to
disconnected lids, so cleanup is ok.

Now about the patch itself - I would prefer to place min hop tables
cleanup in osm_switch_prepare_path_rebuild(), just something like this:

diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c
index 7c36a54..9273459 100644
--- a/osm/opensm/osm_switch.c
+++ b/osm/opensm/osm_switch.c
@@ -516,6 +516,9 @@ osm_switch_prepare_path_rebuild(
 	for ( i = 0; i < p_sw->num_ports; i++ )
 		osm_port_prof_construct( &p_sw->p_prof[i] );
+
+	osm_switch_clear_hops(p_sw);
+
 	if (!p_sw->hops) {
 		hops = malloc((max_lids + 1)*sizeof(hops[0]));

> Please apply to ofed_1_2.
>
> I'm not sure about the trunk though.
> Sasha,
> Can you please check that you latest improvements to the
> routing don't have this problem?

With disconnecting switches the behavior should be similar, I guess.

Sasha

>
> Thanks.
> > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik > --- > osm/opensm/osm_ucast_mgr.c | 20 ++++++++++++++------ > 1 files changed, 14 insertions(+), 6 deletions(-) > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > index ee6b3f9..8643754 100644 > --- a/osm/opensm/osm_ucast_mgr.c > +++ b/osm/opensm/osm_ucast_mgr.c > @@ -1189,6 +1189,7 @@ ucast_mgr_setup_all_switches(osm_subn_t > { > osm_switch_t *p_sw; > uint16_t lids; > + uint16_t i; > > lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl); > lids = lids ? lids - 1 : 0; > @@ -1196,12 +1197,19 @@ ucast_mgr_setup_all_switches(osm_subn_t > for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl); > p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl); > p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item)) > - if (osm_switch_prepare_path_rebuild(p_sw, lids)) { > - osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR, > - "ucast_mgr_setup_all_switches: ERR 3A0B: " > - "cannot setup switch 0x%016" PRIx64 "\n", > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); > - return -1; > + { > + if (osm_switch_prepare_path_rebuild(p_sw, lids)) { > + osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR, > + "ucast_mgr_setup_all_switches: ERR 3A0B: " > + "cannot setup switch 0x%016" PRIx64 "\n", > + cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); > + return -1; > + } > + > + /* Clear the LID matrix of the switch */ > + for ( i = 0; i < p_sw->num_hops; i++ ) > + if (p_sw->hops[i]) > + memset(p_sw->hops[i], OSM_NO_PATH, p_sw->num_ports); > } > > return 0; From dledford at redhat.com Mon Mar 19 06:59:47 2007 From: dledford at redhat.com (Doug Ledford) Date: Mon, 19 Mar 2007 09:59:47 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070318223400.GN11078@mellanox.co.il> References: <20070318111533.GF2862@mellanox.co.il> <76FDA6C6-C4FC-494D-BBF1-9001A15C3E8C@cisco.com> <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> 
<1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> <20070318221039.GL11078@mellanox.co.il> <1174256789.4673.163.camel@athlon-x2.xsintricity.com> <20070318223400.GN11078@mellanox.co.il> Message-ID: <1174312787.4673.201.camel@athlon-x2.xsintricity.com> On Mon, 2007-03-19 at 00:34 +0200, Michael S. Tsirkin wrote: > > Quoting Doug Ledford : > > Subject: Re: [ofa-general] OFED 1.2 Feb-26 meeting summary > > > > On Mon, 2007-03-19 at 00:10 +0200, Michael S. Tsirkin wrote: > > > > Quoting Jason Gunthorpe : > > > > > > > > Basically, I wonder if the usefulness of a primarily source OFED > > > > distribution is shrinking? > > > > > > My laptop does not run either RHEL or SLES so I don't think so :). > > > > > > > Maybe expanding the program to provide > > > > RH/SuSE compatible source and binary upgrade RPMs is better? > > > > > > Can't distributions do that? Why not? > > > > Although not weekly or similarly frequent, RHEL4 and RHEL5 will both get > > updates to the OFED sources at each scheduled update. We won't be > > freezing our OFED support with the initial supported release. > > That's great. > BTW, something that's I'd like to learn how to do, is a way to figure out what > code is RHEL infiniband support based on. > > For example, I gather RHEL5 basically has OFED 1.1, right? > Does this include patches from the support page? For RHEL4 and RHEL5, the base package is called openib. The version, aka openib-1.1, denotes the OFED distribution used to create the package. In fact, the name of the RPM and version are intended to exactly match the name of the tarball I pulled out of the OFED distribution. As far as support patches go, I have included my own patches to OFED to make it work reasonably given a /usr location, etc. I have not downloaded any patches from the site that weren't part of the OFED 1.1 tarball. 
I did, however, apply all the fix patches that *were* part of the OFED 1.1 tarball via the configure option to do so. The kernel is handled separately. For the kernel, I started with the OFED 1.1 tarball, pulled the kernel sources, hand sorted through all the fix patches that were labeled as appropriate for the given kernel, then applied whatever fixups were needed to work with non-standard patches that were in our kernel (like the inode-diet patch, which required changes in the OFED kernel code, and Mike Christie, who handles our iSCSI stack, took care of iSER specific fixups). The kernel code will always match the openib package. When I update one, I update the other. So, you always know the base by looking at the openib package version, and then you can see any additional patches I've applied to user space by looking at the openib.spec file, and you can see additional patches to the kernel by looking for the main OFED patch (in RHEL4, it's patch 2700, in RHEL5 it's patch 2600), and immediately after or before the main update patch will be the individual change patches that we've applied. However, keep in mind that in some cases, like in the RHEL5 case, the ofed-1_1 update patch is a pre-munged patch, meaning I applied other patches to my working tree, then did a mondo diff between the working tree and the source tree, so the individual patch information is not complete there (fortunately, there wasn't much to patch at the time since when we froze on 2.6.18, OFED 1.1 was running off that as a starting base anyway). For future releases, I want to start getting the libraries and such separated out into different packages completely, including source. So, for RHEL6, I plan by then to be using totally separate packages each pulled from a release version of that particular package, or if no such tarball exists, from a release branch in the library's git repo. 
-- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Mon Mar 19 07:05:32 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 19 Mar 2007 09:05:32 -0500 Subject: [ofa-general] Re: drivers/infiniband/hw/cxgb3/iwch_provider.c: uninitialized variable used In-Reply-To: <20070319092655.GR752@stusta.de> References: <20070319092655.GR752@stusta.de> Message-ID: <1174313132.8747.0.camel@stevo-desktop> Thanks Adrian, I'll address this... Steve. On Mon, 2007-03-19 at 10:26 +0100, Adrian Bunk wrote: > The Coverity checker spotted that "npages" will be used uninitialized in > the following code if !(mr_rereg_mask & IB_MR_REREG_TRANS): > > <-- snip --> > > ... > static int iwch_reregister_phys_mem(struct ib_mr *mr, > int mr_rereg_mask, > struct ib_pd *pd, > struct ib_phys_buf *buffer_list, > int num_phys_buf, > int acc, u64 * iova_start) > { > > struct iwch_mr mh, *mhp; > struct iwch_pd *php; > struct iwch_dev *rhp; > __be64 *page_list = NULL; > int shift = 0; > u64 total_size; > int npages; > int ret; > > PDBG("%s ib_mr %p ib_pd %p\n", __FUNCTION__, mr, pd); > > /* There can be no memory windows */ > if (atomic_read(&mr->usecnt)) > return -EINVAL; > > mhp = to_iwch_mr(mr); > rhp = mhp->rhp; > php = to_iwch_pd(mr->pd); > > /* make sure we are on the same adapter */ > if (rhp != php->rhp) > return -EINVAL; > > memcpy(&mh, mhp, sizeof *mhp); > > if (mr_rereg_mask & IB_MR_REREG_PD) > php = to_iwch_pd(pd); > if (mr_rereg_mask & IB_MR_REREG_ACCESS) > mh.attr.perms = iwch_ib_to_tpt_access(acc); > if (mr_rereg_mask & IB_MR_REREG_TRANS) > ret = build_phys_page_list(buffer_list, num_phys_buf, > iova_start, > &total_size, &npages, > &shift, 
&page_list); > > ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages); > ... > > <-- snip --> > > Looking at the code, it also seems some orignally planned error handling > code for the build_phys_page_list() call was forgotten ("ret" is never > checked before it's overwritten again). > > cu > Adrian > From tziporet at dev.mellanox.co.il Mon Mar 19 07:31:19 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 19 Mar 2007 16:31:19 +0200 Subject: [ofa-general] OFED install issues Message-ID: <45FE9EB7.9010206@mellanox.co.il> Hi All, There was a long discussion regarding the OFED installation process. There were several issues raised: 1. Prefix and the distro default: My suggestion is that we will change the default of the install according to the distro that is used. 1. For Redhat the default will be /usr 2. For SLES I don't know what should be the default - Moiz can you send me a contact person that will educate us on SLES preferred prefix. 3. Debian - Roland do you know what should be the default In case someone wish to stay with another prefix (or change it back to /usr/local/ofed) it can be easily done during installation: * In an interactive installation there is a specific question: "Please enter the OFED installation directory [/usr/local/ofed]: " * In a non-attended installation only need to change this line in the conf file: STACK_PREFIX=/usr/local/ofed Please reply if this is OK and we will do this change this week. In this way we will have enough time to test it before RC1. (Note that in Mellanox we always test both /usr/local/ofed and /usr/local prefixes) 2. SPEC files - Change this now is too risky for 1.2 - I suggest we do it immediately after OFED 1.2 so we have enough time to stabilize this change. In any case we may want to have a teleconference to define what we want. 
Also if anyone want to "raise the glove" - change and test the build & install scripts and send patches it will be great Tziporet -------------- next part -------------- An HTML attachment was scrubbed... URL: From kliteyn at dev.mellanox.co.il Mon Mar 19 07:50:56 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 19 Mar 2007 16:50:56 +0200 Subject: [ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them In-Reply-To: <1174313910.13051.25.camel@localhost> References: <45FE7740.2080308@dev.mellanox.co.il> <1174313910.13051.25.camel@localhost> Message-ID: <45FEA350.7070605@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On Mon, 2007-03-19 at 13:42 +0200, Yevgeny Kliteynik wrote: >> Hi Hal, >> >> [V2 of the patch] >> >> This patch fixes a bug in the lid matrices creation: >> >> The lid matrices were not cleared, which caused OSM routing >> to crash when routing nonexisting (disconnected) lids. > > Where the crash happens? Could you clarify? In __osm_ucast_mgr_process_neighbor(), there is the following assertion: CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho, port_num ) ); This assertion fails, since the hop count becomes inconsistent. > Anyway I agree that there is no point to try to generate routes to > disconnected lids, so cleanup is ok. Now about the patch itself - I > would prefer to place min hop tables cleanup in > osm_switch_prepare_path_rebuild(), just something like this: > > diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c > index 7c36a54..9273459 100644 > --- a/osm/opensm/osm_switch.c > +++ b/osm/opensm/osm_switch.c > @@ -516,6 +516,9 @@ osm_switch_prepare_path_rebuild( > > for ( i = 0; i < p_sw->num_ports; i++ ) > osm_port_prof_construct( &p_sw->p_prof[i] ); > + > + osm_switch_clear_hops(p_sw); > + > if (!p_sw->hops) > { > hops = malloc((max_lids + 1)*sizeof(hops[0])); No problem. >> Please apply to ofed_1_2. >> >> I'm not sure about the trunk though. 
>> Sasha, >> Can you please check that you latest improvements to the >> routing don't have this problem? > > With disconnecting switches should be similar behavior I guess. Right, I checked it - same problem. I'll issue new patch. -- Yevgeny. > Sasha > >> Thanks. >> >> -- Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik >> --- >> osm/opensm/osm_ucast_mgr.c | 20 ++++++++++++++------ >> 1 files changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c >> index ee6b3f9..8643754 100644 >> --- a/osm/opensm/osm_ucast_mgr.c >> +++ b/osm/opensm/osm_ucast_mgr.c >> @@ -1189,6 +1189,7 @@ ucast_mgr_setup_all_switches(osm_subn_t >> { >> osm_switch_t *p_sw; >> uint16_t lids; >> + uint16_t i; >> >> lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl); >> lids = lids ? lids - 1 : 0; >> @@ -1196,12 +1197,19 @@ ucast_mgr_setup_all_switches(osm_subn_t >> for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl); >> p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl); >> p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item)) >> - if (osm_switch_prepare_path_rebuild(p_sw, lids)) { >> - osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR, >> - "ucast_mgr_setup_all_switches: ERR 3A0B: " >> - "cannot setup switch 0x%016" PRIx64 "\n", >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); >> - return -1; >> + { >> + if (osm_switch_prepare_path_rebuild(p_sw, lids)) { >> + osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR, >> + "ucast_mgr_setup_all_switches: ERR 3A0B: " >> + "cannot setup switch 0x%016" PRIx64 "\n", >> + cl_ntoh64(osm_node_get_node_guid(p_sw->p_node))); >> + return -1; >> + } >> + >> + /* Clear the LID matrix of the switch */ >> + for ( i = 0; i < p_sw->num_hops; i++ ) >> + if (p_sw->hops[i]) >> + memset(p_sw->hops[i], OSM_NO_PATH, p_sw->num_ports); >> } >> >> return 0; > From kliteyn at dev.mellanox.co.il Mon Mar 19 07:58:01 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 19 Mar 2007 16:58:01 
+0200 Subject: [ofa-general] [PATCHv3] osm: Clearing lid matrices before rebuilding them Message-ID: <45FEA4F9.4060708@dev.mellanox.co.il> Hi Hal, [V3 of the patch] This patch fixes a bug in the lid matrices creation: The lid matrices were not cleared, which caused OSM routing to crash when routing nonexisting (disconnected) lids. Please apply both to ofed_1_2 and to master. Thanks. -- Yevgeny Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_switch.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index 913f34b..86d0153 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -499,6 +499,9 @@ osm_switch_prepare_path_rebuild( for ( i = 0; i < p_sw->num_ports; i++ ) osm_port_prof_construct( &p_sw->p_prof[i] ); + + osm_switch_clear_hops(p_sw); + if (!p_sw->hops) { hops = malloc((max_lids + 1)*sizeof(hops[0])); -- 1.4.4.1.GIT From ebiederm at xmission.com Mon Mar 19 08:10:16 2007 From: ebiederm at xmission.com (Eric W. Biederman) Date: Mon, 19 Mar 2007 09:10:16 -0600 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070318.231316.59470365.davem@davemloft.net> (David Miller's message of "Sun, 18 Mar 2007 23:13:16 -0700 (PDT)") References: <20070318224234.GP11078@mellanox.co.il> <20070318.171337.112622504.davem@davemloft.net> <20070318.231316.59470365.davem@davemloft.net> Message-ID: David Miller writes: > I think the thing to do is to just leave the loopback references > in place, try to unregister the per-namespace loopback device, > and that will safely wait for all the references to go away. Right. The only thing I have found that needs to be changed so far in this area is specifying which loopback device I want to replace it with. > If you do it that way, you should need absolutely no changes to > the other code in this area. 
> > As per Herbert, I think he works on Xen rather than vserver :-) > Perhaps you're thinking of Alexey Kuznetsov or another one of the > vserver guys. I think you are thinking of a different Herbert. I was thinking of Herbert Poetzl the vserver maintainer. Alexey works on OpenVZ. Until we get the basic architecture merged they are rival projects. Eric From mst at dev.mellanox.co.il Mon Mar 19 08:13:36 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Mar 2007 17:13:36 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319125919.GA4239@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> <20070319121248.GD18497@mellanox.co.il> <20070319125919.GA4239@ms2.inr.ac.ru> Message-ID: <20070319151336.GA24225@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: dst_ifdown breaks infiniband? > > Hello! > > > infiniband sets parm->neigh_destructor, and I search for a way to prevent > > this destructor from being called after the module has been unloaded. > > Ideas? > > It must be called in any case to update/release internal ipoib structures. I don't think there's a problem. All we do in destructor is release the ipoib_neigh resource. And on device unregister we release all resources anyway. > The idea is to move call of parm->neigh_destructor from neighbour destructor > to the moment when it is unhashed, right after n->dead is set. > > infiniband is the only user (atm clip uses it too, but that use is obviously > dummy), so that nobody will be harmed. This might work. Could you post a patch to better show what you mean to do? > But ipoib will have to check for validity of skb->dst->neighbour before > attempt to reinitialize private data on dead (n->dead != 0) neighbour. 
We set a flag before unregister_netdev and test it in start_xmit so that's covered I think. -- MST From dledford at redhat.com Mon Mar 19 08:13:05 2007 From: dledford at redhat.com (Doug Ledford) Date: Mon, 19 Mar 2007 11:13:05 -0400 Subject: [ofa-general] Re: OFED install issues In-Reply-To: <45FE9EB7.9010206@mellanox.co.il> References: <45FE9EB7.9010206@mellanox.co.il> Message-ID: <1174317185.4673.212.camel@athlon-x2.xsintricity.com> On Mon, 2007-03-19 at 16:31 +0200, Tziporet Koren wrote: > Hi All, > There was a long discussion regarding the OFED installation process. > There were several issues raised: > 1. Prefix and the distro default: > My suggestion is that we will change the default of the install > according to the distro that is used. > 1. For Redhat the default will be /usr > 2. For SLES I don't know what should be the default - Moiz can > you send me a contact person that will educate us on SLES > preferred prefix. > 3. Debian - Roland do you know what should be the default The prefix is pretty much determined by LFHS and LSB compliance. The distro doesn't really matter much unless the distro simply has no interest in being compliant with these standards. That being said, it's pretty much /usr for all of us. The only possible exception is the MPI stacks, but if they want to go into /opt, they need to register with LANNA for that. > In case someone wish to stay with another prefix (or change it back > to /usr/local/ofed) it can be easily done during installation: > * In an interactive installation there is a specific question: > "Please enter the OFED installation directory > [/usr/local/ofed]: " > * In a non-attended installation only need to change this line > in the conf file: STACK_PREFIX=/usr/local/ofed > Please reply if this is OK and we will do this change this week. In > this way we will have enough time to test it before RC1. (Note that in > Mellanox we always test both /usr/local/ofed and /usr/local prefixes) > > 2. 
SPEC files - > Change this now is too risky for 1.2 - I suggest we do it immediately > after OFED 1.2 so we have enough time to stabilize this change. > In any case we may want to have a teleconference to define what we > want. > Also if anyone want to "raise the glove" - change and test the build & > install scripts and send patches it will be great I would suggest the spec file/install changes go hand in hand. Besides, I've done them twice now already, so it's pretty easy for me to do. /me raises hand > Tziporet > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From jsquyres at cisco.com Mon Mar 19 08:19:50 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 19 Mar 2007 11:19:50 -0400 Subject: [ofa-general] OFED install issues In-Reply-To: <45FE9EB7.9010206@mellanox.co.il> References: <45FE9EB7.9010206@mellanox.co.il> Message-ID: <24597033-8429-4F4D-ACE5-CCCB0934FE2F@cisco.com> On Mar 19, 2007, at 10:31 AM, Tziporet Koren wrote: > There was a long discussion regarding the OFED installation process. > There were several issues raised: > > 1. Prefix and the distro default: > My suggestion is that we will change the default of the install > according to the distro that is used. > > 1. For Redhat the default will be /usr > 2. For SLES I don't know what should be the default - Moiz can > you send me a contact person that will educate us on SLES preferred > prefix. > 3.
Debian - Roland do you know what should be the default > > In case someone wish to stay with another prefix (or change it back > to /usr/local/ofed) it can be easily done during installation: > * In an interactive installation there is a specific question: > "Please enter the OFED installation directory [/usr/local/ofed]: " > * In a non-attended installation only need to change this line > in the conf file: STACK_PREFIX=/usr/local/ofed > > Please reply if this is OK and we will do this change this week. In > this way we will have enough time to test it before RC1. (Note that > in Mellanox we always test both /usr/local/ofed and /usr/local > prefixes) I think that this sounds fine (and will be quite helpful for all the reasons Doug has specified), but I'd like to hear from the testers. > 2. SPEC files - > Change this now is too risky for 1.2 - I suggest we do it > immediately after OFED 1.2 so we have enough time to stabilize this > change. > In any case we may want to have a teleconference to define what we > want. I will happily participate in this effort. I think that it's [unfortunately] far too late to do anything about this for v1.2, so I propose that we table the discussion until after 1.2 is out. Then let's re-start the discussion shortly/immediately after 1.2 is released, decide what we want, come up with a way to move forward, etc. > Also if anyone want to "raise the glove" - change and test the > build & install scripts and send patches it will be great I think that this is very much in the spirit of open source, but given that there is so much apathy and confusion about the issue :-), I think we should probably have a discussion about exactly what we *want* before someone tries to code this up. Just my $0.000000001. -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Mon Mar 19 08:20:56 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 19 Mar 2007 17:20:56 +0200 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <1174312787.4673.201.camel@athlon-x2.xsintricity.com> References: <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> <20070318221039.GL11078@mellanox.co.il> <1174256789.4673.163.camel@athlon-x2.xsintricity.com> <20070318223400.GN11078@mellanox.co.il> <1174312787.4673.201.camel@athlon-x2.xsintricity.com> Message-ID: <20070319152056.GB24225@mellanox.co.il> > The kernel code will > always match the openib package. When I update one, I update the other. > So, you always know the base by looking at the openib package version, > and then you can see any additional patches I've applied to user space > by looking at the openib.spec file, and you can see additional patches > to the kernel by looking for the main OFED patch (in RHEL4, it's patch > 2700, in RHEL5 it's patch 2600), and immediately after or before the > main update patch will be the individual change patches that we've > applied. You lost me here. Example? The OFED support page https://wiki.openfabrics.org/tiki-index.php?page=OFED+Support mentions patches for two critical bugs in kernel code: IPoIB kernel oops, and mthca off-by-one. Are these two applied? How to find out? -- MST From sweitzen at cisco.com Mon Mar 19 08:32:35 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 19 Mar 2007 08:32:35 -0700 Subject: [ofa-general] OFED install issues In-Reply-To: <45FE9EB7.9010206@mellanox.co.il> References: <45FE9EB7.9010206@mellanox.co.il> Message-ID: I defer to Doug L and Jeff S. If they like it, I have no objections. 
Scott ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Monday, March 19, 2007 7:31 AM To: Doug Ledford; Moiz Kohari; EWG; Roland Dreier (rdreier) Cc: OPENIB Subject: [ofa-general] OFED install issues Hi All, There was a long discussion regarding the OFED installation process. There were several issues raised: 1. Prefix and the distro default: My suggestion is that we will change the default of the install according to the distro that is used. 1. For Redhat the default will be /usr 2. For SLES I don't know what should be the default - Moiz can you send me a contact person that will educate us on SLES preferred prefix. 3. Debian - Roland do you know what should be the default In case someone wish to stay with another prefix (or change it back to /usr/local/ofed) it can be easily done during installation: * In an interactive installation there is a specific question: "Please enter the OFED installation directory [/usr/local/ofed]: " * In a non-attended installation only need to change this line in the conf file: STACK_PREFIX=/usr/local/ofed Please reply if this is OK and we will do this change this week. In this way we will have enough time to test it before RC1. (Note that in Mellanox we always test both /usr/local/ofed and /usr/local prefixes) 2. SPEC files - Change this now is too risky for 1.2 - I suggest we do it immediately after OFED 1.2 so we have enough time to stabilize this change. In any case we may want to have a teleconference to define what we want. Also if anyone want to "raise the glove" - change and test the build & install scripts and send patches it will be great Tziporet
From jgunthorpe at obsidianresearch.com Mon Mar 19 09:08:20 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 19 Mar 2007 10:08:20 -0600 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> Message-ID: <20070319160820.GE5740@obsidianresearch.com> On Mon, Mar 19, 2007 at 08:11:27AM -0400, Jeff Squyres wrote: > I long ago went through denial/anger/bargaining/depression/acceptance > that you need a separate MPI installation for each compiler suite > that you want to support. FWIW, a neat trick to help with this is to encode part of the soname of libstdc++ in the soname of your library when you build it. The main observation is that any ABI/compiler/etc differences must be already taken care of by libstdc++'s soname versioning policy. Then you can have parallel installations of MPI libraries without too much trouble. Jason From rdreier at cisco.com Mon Mar 19 09:10:39 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 09:10:39 -0700 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070319160820.GE5740@obsidianresearch.com> (Jason Gunthorpe's message of "Mon, 19 Mar 2007 10:08:20 -0600") References: <45E58D3A.8060906@mellanox.co.il> <1172685419.4777.145.camel@fc6.xsintricity.com> <9EFD229F-252C-423D-A0F2-1A3AD214A2B4@cisco.com> <20070319160820.GE5740@obsidianresearch.com> Message-ID: > FWIW, a neat trick to help with this is to encode part of the soname > of libstdc++ in the soname of your library when you build it. The main > observation is that any ABI/compiler/etc differences must be already > taken care of by libstdc++'s soname versioning policy. Then you can > have parallel installations of MPI libraries without too much trouble. But there's also the fortran ABI mess to worry about...
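The soname trick quoted above could be sketched roughly as follows. This is an illustration only: the "-stdcxx<N>" naming scheme and the function name tagged_soname are made up for the example, and the thread does not say how any MPI package actually names its libraries. It also does nothing about the Fortran ABI concern raised in the reply.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a soname for `base` that carries the major
 * version found in a libstdc++ soname, e.g.
 *   ("libfoo", 1, "libstdc++.so.6") -> "libfoo-stdcxx6.so.1"
 * Returns 0 on success, -1 if stdcxx_soname has no ".so.<digits>" part
 * or if the result does not fit in out.
 */
int tagged_soname(char *out, size_t out_len, const char *base,
                  unsigned base_major, const char *stdcxx_soname)
{
    const char *ver = strstr(stdcxx_soname, ".so.");
    size_t digits;
    int n;

    if (!ver)
        return -1;
    ver += 4;                               /* skip ".so." */
    digits = strspn(ver, "0123456789");     /* length of the major version */
    if (digits == 0)
        return -1;

    n = snprintf(out, out_len, "%s-stdcxx%.*s.so.%u",
                 base, (int)digits, ver, base_major);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With GNU ld the resulting string would be handed to the linker at build time (for example via -Wl,-soname,<name>), so that builds made against different libstdc++ versions end up with different sonames and can be installed side by side.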
From sweitzen at cisco.com Mon Mar 19 09:19:07 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 19 Mar 2007 09:19:07 -0700 Subject: [ofa-general] OFED 1.2 beta on RHEL5 Message-ID: I have created a "RHEL5" OS version in bugzilla. I have been able to compile OFED-1.2-20070314-0600 on RHEL5 x86_64 and run IPoIB/SDP/SRP/MPI. I opened bugs 466 and 467 regarding compilation issues I saw. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From swise at opengridcomputing.com Mon Mar 19 09:22:34 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 19 Mar 2007 11:22:34 -0500 Subject: [ofa-general] FW: weekly update - OFA Sonoma agenda planning In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9A0E065@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9A0E065@mtlexch01.mtl.com> Message-ID: <1174321354.8747.36.camel@stevo-desktop> I will not be attending and thus won't be presenting on iWARP status... Steve. On Mon, 2007-03-19 at 12:39 +0200, Tziporet Koren wrote: > Hi All, > > Please review the proposed agenda for the Sonoma. > Please review and send comments and suggestions to Jeff > > Tziporet > > > ______________________________________________________________________ > From: Jeffrey Scott [mailto:jeff at splitrockpr.com] > Sent: Wednesday, March 14, 2007 11:31 PM > To: Thad Omura > Cc: Tziporet Koren; Ryan, Jim; Sujal Das; John Hagerman; Bill Boas; > Bob Woolery; Chet Mehta; Roland Dreier; Gilad Shainer; > mlleinin at hpcn.ca.sandia.gov; Asaf Somekh; Phamdo, Tuan; > jriotto at cisco.com; paul.grun at intel.com; Brian Sparks; > Arkady.Kanevsky at netapp.com; Dror Goldenberg; seager at llnl.gov; > christyl at voltaire.com; ogerlitz at voltaire.com > Subject: weekly update - OFA Sonoma agenda planning > > > > Session Owners- > Attached is this week's Sonoma agenda update. We're up to 30 > confirmed sessions.
Thanks for everyone's support. We still have a > lot of ground to cover. If you have unconfirmed sessions, please lock > those down as soon as possible. I am now receiving many inquiries > about the agenda as potential attendees weigh the decision to register > for the event. > > PLEASE REMEMBER these two important items: > > 1. Session owners or presenters should submit presentation drafts to > me by April 6. Your presentations won't be edited. We simply want to > ensure that all presentations are in the spirit of the workshop (i.e., > we don't want sessions to used as product or company promotions). > 2. Register for the workshop!!!!!! Every attendee, including > presenters and session owners, must register at this link ... > http://www.acteva.com/booking.cfm?bevaid=125720 > > Thanks for working hard to make the Sonoma Workshop a great success. > > Regards, > Jeff > > Office (408) 884-4017 > Mobile (202) 903-6057 > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From wombat2 at us.ibm.com Mon Mar 19 09:52:51 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Mon, 19 Mar 2007 12:52:51 -0400 Subject: [ofa-general] IPoIB-CM Performance In-Reply-To: <20070319162049.3BE4BE6083C@openfabrics.org> Message-ID: Michael, When you posted the first patch to add IPoIB-CM to OFED 1.2 you posted a unidirectional performance of 891 MB/s using Netperf. We don't have SRQ adapters at the moment but are we still getting 891 MB/s with all the changes that went into the various patches you have posted since then? Thanks. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. 
So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner From sashak at voltaire.com Mon Mar 19 11:16:48 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 19 Mar 2007 20:16:48 +0200 Subject: [ofa-general] Re: [PATCHv3] osm: Clearing lid matrices before rebuilding them In-Reply-To: <45FEA4F9.4060708@dev.mellanox.co.il> References: <45FEA4F9.4060708@dev.mellanox.co.il> Message-ID: <20070319181648.GM19999@sashak.voltaire.com> On 16:58 Mon 19 Mar , Yevgeny Kliteynik wrote: > Hi Hal, > > [V3 of the patch] > > This patch fixes a bug in the lid matrices creation: > > The lid matrices were not cleared, which caused OSM routing > to crash when routing nonexisting (disconnected) lids. > > Please apply both to ofed_1_2 and to master. > > Thanks. > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Looks correct for me. Sasha > --- > osm/opensm/osm_switch.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c > index 913f34b..86d0153 100644 > --- a/osm/opensm/osm_switch.c > +++ b/osm/opensm/osm_switch.c > @@ -499,6 +499,9 @@ osm_switch_prepare_path_rebuild( > > for ( i = 0; i < p_sw->num_ports; i++ ) > osm_port_prof_construct( &p_sw->p_prof[i] ); > + > + osm_switch_clear_hops(p_sw); > + > if (!p_sw->hops) > { > hops = malloc((max_lids + 1)*sizeof(hops[0])); > -- > 1.4.4.1.GIT > From halr at voltaire.com Mon Mar 19 12:30:22 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Mar 2007 14:30:22 -0500 Subject: [ofa-general] Re: [PATCHv3] osm: Clearing lid matrices before rebuilding them In-Reply-To: <45FEA4F9.4060708@dev.mellanox.co.il> References: <45FEA4F9.4060708@dev.mellanox.co.il> Message-ID: <1174332608.4684.393690.camel@hal.voltaire.com> On Mon, 2007-03-19 at 09:58, Yevgeny Kliteynik wrote: > Hi Hal, > > [V3 of the patch] > > This patch fixes a
bug in the lid matrices creation: > > The lid matrices were not cleared, which caused OSM routing > to crash when routing nonexisting (disconnected) lids. > > Please apply both to ofed_1_2 and to master. > > Thanks. > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied (to both master and ofed_1_2). -- Hal From sashak at voltaire.com Mon Mar 19 11:55:31 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 19 Mar 2007 20:55:31 +0200 Subject: [ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them In-Reply-To: <45FEA350.7070605@dev.mellanox.co.il> References: <45FE7740.2080308@dev.mellanox.co.il> <1174313910.13051.25.camel@localhost> <45FEA350.7070605@dev.mellanox.co.il> Message-ID: <20070319185531.GN19999@sashak.voltaire.com> On 16:50 Mon 19 Mar , Yevgeny Kliteynik wrote: > > In __osm_ucast_mgr_process_neighbor(), there is the following assertion: > > CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho, > port_num ) ); > > This assertion fails, since the hop count becomes inconsistent. This is not big problem IMO, we just need to not deal with non-existing LIDs there (so __osm_ucast_mgr_process_neighbor() code should be improved in this direction and this assertion removed). And the LFTs generation code doesn't try to build entries for non-existing LIDs, so "old" min hop vectors will be ignored there. But I think we could have a problem when the port (switch with master) is reconnected at different location. Then old/invalid hop counts will be counted again and if it "wins" we can get not expected routing paths. So obviously hop matrix cleanup is simplest fix - Agreed. > >>I'm not sure about the trunk though. > >>Sasha, > >>Can you please check that you latest improvements to the > >>routing don't have this problem? > > > >With disconnecting switches should be similar behavior I guess. > > Right, I checked it - same problem. Interesting. 
This function is different in the master and doesn't scan LIDs from 1 up to max anymore, instead it scans only switches existing at the moment. Could you provide more details about the master? Do you able to see the problem with just switch disconnections? What is the test case? Sasha From swise at opengridcomputing.com Mon Mar 19 12:13:51 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 19 Mar 2007 14:13:51 -0500 Subject: [ofa-general] [PATCH ofed_1_2] iw_cxgb3: Reserve the pages of dma coherent memory for older kernels. Message-ID: <1174331631.8747.55.camel@stevo-desktop> Hey Vlad, This change, along with a libcxgb3 fix resolves bug 353. You can pull this ofed_1_2 change directly from: git://staging.openfabrics.org/~swise/ofed_1_2 ofed_1_2 Thanks, Steve. --------------------------- Reserve the pages of dma coherent memory for older kernels. Signed-off-by: Steve Wise --- .../2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch | 127 +++++++++++++++++++++++ .../backport/2.6.9_U2/cxio_hal_to_2.6.14.patch | 127 +++++++++++++++++++++++ .../2.6.9_U2/iwch_provider_to_2.6.9_U4.patch | 16 +++ .../backport/2.6.9_U3/cxio_hal_to_2.6.14.patch | 127 +++++++++++++++++++++++ .../2.6.9_U3/iwch_provider_to_2.6.9_U4.patch | 16 +++ .../backport/2.6.9_U4/cxio_hal_to_2.6.14.patch | 127 +++++++++++++++++++++++ 6 files changed, 540 insertions(+), 0 deletions(-) diff --git a/kernel_patches/backport/2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch b/kernel_patches/backport/2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch new file mode 100644 index 0000000..34556bb --- /dev/null +++ b/kernel_patches/backport/2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch @@ -0,0 +1,127 @@ +Reserve pages to support userspace mapping in older kernels. + +From: Steve Wise + +This is needed for kernels prior to 2.6.15 to correctly map kernel +memory into userspace. 
+ +Signed-off-by: Steve Wise +--- + + drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 53 +++++++++++++++++++-------- + 1 files changed, 38 insertions(+), 15 deletions(-) + +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +index 229edd5..067fe46 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c ++++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +@@ -170,10 +170,30 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); + } + ++static void reserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ SetPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ ++static void unreserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ ClearPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ + int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + struct rdma_cq_setup setup; +- int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); + + cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); + if (!cq->cqid) +@@ -181,16 +201,15 @@ int cxio_create_cq(struct cxio_rdev *rde + cq->sw_queue = kzalloc(size, GFP_KERNEL); + if (!cq->sw_queue) + return -ENOMEM; +- cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) * +- sizeof(struct t3_cqe), +- &(cq->dma_addr), GFP_KERNEL); ++ cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(cq->dma_addr), GFP_KERNEL); + if (!cq->queue) { + kfree(cq->sw_queue); + return -ENOMEM; + } + pci_unmap_addr_set(cq, mapping, cq->dma_addr); + memset(cq->queue, 0, size); ++ reserve_pages(cq->queue, size); + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; +@@ -288,6 +307,7 @@ int cxio_create_qp(struct cxio_rdev *rde + { + int depth = 1UL << wq->size_log2; + int 
rqsize = 1UL << wq->rq_size_log2; ++ int size = PAGE_ALIGN(depth * sizeof(union t3_wr)); + + wq->qpid = get_qpid(rdev_p, uctx); + if (!wq->qpid) +@@ -305,14 +325,15 @@ int cxio_create_qp(struct cxio_rdev *rde + if (!wq->sq) + goto err3; + +- wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- depth * sizeof(union t3_wr), +- &(wq->dma_addr), GFP_KERNEL); ++ wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(wq->dma_addr), GFP_KERNEL); + if (!wq->queue) + goto err4; + +- memset(wq->queue, 0, depth * sizeof(union t3_wr)); + pci_unmap_addr_set(wq, mapping, wq->dma_addr); ++ memset(wq->queue, 0, size); ++ reserve_pages(wq->queue, size); ++ + wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + if (!kernel_domain) + wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + +@@ -334,11 +355,12 @@ err1: + int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + int err; ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); ++ + err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); + kfree(cq->sw_queue); +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) +- * sizeof(struct t3_cqe), cq->queue, ++ unreserve_pages(cq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, cq->queue, + pci_unmap_addr(cq, mapping)); + cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); + return err; +@@ -347,9 +369,10 @@ int cxio_destroy_cq(struct cxio_rdev *rd + int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq, + struct cxio_ucontext *uctx) + { +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (wq->size_log2)) +- * sizeof(union t3_wr), wq->queue, ++ int size = PAGE_ALIGN((1UL << (wq->size_log2)) * sizeof(union t3_wr)); ++ ++ unreserve_pages(wq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, wq->queue, + pci_unmap_addr(wq, mapping)); + kfree(wq->sq); + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); diff --git 
a/kernel_patches/backport/2.6.9_U2/cxio_hal_to_2.6.14.patch b/kernel_patches/backport/2.6.9_U2/cxio_hal_to_2.6.14.patch new file mode 100644 index 0000000..34556bb --- /dev/null +++ b/kernel_patches/backport/2.6.9_U2/cxio_hal_to_2.6.14.patch @@ -0,0 +1,127 @@ +Reserve pages to support userspace mapping in older kernels. + +From: Steve Wise + +This is needed for kernels prior to 2.6.15 to correctly map kernel +memory into userspace. + +Signed-off-by: Steve Wise +--- + + drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 53 +++++++++++++++++++-------- + 1 files changed, 38 insertions(+), 15 deletions(-) + +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +index 229edd5..067fe46 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c ++++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +@@ -170,10 +170,30 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); + } + ++static void reserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ SetPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ ++static void unreserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ ClearPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ + int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + struct rdma_cq_setup setup; +- int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); + + cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); + if (!cq->cqid) +@@ -181,16 +201,15 @@ int cxio_create_cq(struct cxio_rdev *rde + cq->sw_queue = kzalloc(size, GFP_KERNEL); + if (!cq->sw_queue) + return -ENOMEM; +- cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) * +- sizeof(struct t3_cqe), +- &(cq->dma_addr), GFP_KERNEL); ++ cq->queue = 
dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(cq->dma_addr), GFP_KERNEL); + if (!cq->queue) { + kfree(cq->sw_queue); + return -ENOMEM; + } + pci_unmap_addr_set(cq, mapping, cq->dma_addr); + memset(cq->queue, 0, size); ++ reserve_pages(cq->queue, size); + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; +@@ -288,6 +307,7 @@ int cxio_create_qp(struct cxio_rdev *rde + { + int depth = 1UL << wq->size_log2; + int rqsize = 1UL << wq->rq_size_log2; ++ int size = PAGE_ALIGN(depth * sizeof(union t3_wr)); + + wq->qpid = get_qpid(rdev_p, uctx); + if (!wq->qpid) +@@ -305,14 +325,15 @@ int cxio_create_qp(struct cxio_rdev *rde + if (!wq->sq) + goto err3; + +- wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- depth * sizeof(union t3_wr), +- &(wq->dma_addr), GFP_KERNEL); ++ wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(wq->dma_addr), GFP_KERNEL); + if (!wq->queue) + goto err4; + +- memset(wq->queue, 0, depth * sizeof(union t3_wr)); + pci_unmap_addr_set(wq, mapping, wq->dma_addr); ++ memset(wq->queue, 0, size); ++ reserve_pages(wq->queue, size); ++ + wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + if (!kernel_domain) + wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + +@@ -334,11 +355,12 @@ err1: + int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + int err; ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); ++ + err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); + kfree(cq->sw_queue); +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) +- * sizeof(struct t3_cqe), cq->queue, ++ unreserve_pages(cq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, cq->queue, + pci_unmap_addr(cq, mapping)); + cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); + return err; +@@ -347,9 +369,10 @@ int cxio_destroy_cq(struct cxio_rdev *rd + int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq 
*wq, + struct cxio_ucontext *uctx) + { +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (wq->size_log2)) +- * sizeof(union t3_wr), wq->queue, ++ int size = PAGE_ALIGN((1UL << (wq->size_log2)) * sizeof(union t3_wr)); ++ ++ unreserve_pages(wq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, wq->queue, + pci_unmap_addr(wq, mapping)); + kfree(wq->sq); + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); diff --git a/kernel_patches/backport/2.6.9_U2/iwch_provider_to_2.6.9_U4.patch b/kernel_patches/backport/2.6.9_U2/iwch_provider_to_2.6.9_U4.patch new file mode 100644 index 0000000..1fbc717 --- /dev/null +++ b/kernel_patches/backport/2.6.9_U2/iwch_provider_to_2.6.9_U4.patch @@ -0,0 +1,16 @@ +--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-01-17 09:22:39.000000000 -0600 ++++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-01-22 17:46:16.000000000 -0600 +@@ -337,13 +337,6 @@ static int iwch_mmap(struct ib_ucontext + (pgaddr < (rdev_p->rnic_info.udbell_physbase + + rdev_p->rnic_info.udbell_len))) { + +- /* +- * Map T3 DB register. +- */ +- if (vma->vm_flags & VM_READ) { +- return -EPERM; +- } +- + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; + vma->vm_flags &= ~VM_MAYREAD; diff --git a/kernel_patches/backport/2.6.9_U3/cxio_hal_to_2.6.14.patch b/kernel_patches/backport/2.6.9_U3/cxio_hal_to_2.6.14.patch new file mode 100644 index 0000000..34556bb --- /dev/null +++ b/kernel_patches/backport/2.6.9_U3/cxio_hal_to_2.6.14.patch @@ -0,0 +1,127 @@ +Reserve pages to support userspace mapping in older kernels. + +From: Steve Wise + +This is needed for kernels prior to 2.6.15 to correctly map kernel +memory into userspace. 
+ +Signed-off-by: Steve Wise +--- + + drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 53 +++++++++++++++++++-------- + 1 files changed, 38 insertions(+), 15 deletions(-) + +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +index 229edd5..067fe46 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c ++++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +@@ -170,10 +170,30 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); + } + ++static void reserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ SetPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ ++static void unreserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ ClearPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ + int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + struct rdma_cq_setup setup; +- int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); + + cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); + if (!cq->cqid) +@@ -181,16 +201,15 @@ int cxio_create_cq(struct cxio_rdev *rde + cq->sw_queue = kzalloc(size, GFP_KERNEL); + if (!cq->sw_queue) + return -ENOMEM; +- cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) * +- sizeof(struct t3_cqe), +- &(cq->dma_addr), GFP_KERNEL); ++ cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(cq->dma_addr), GFP_KERNEL); + if (!cq->queue) { + kfree(cq->sw_queue); + return -ENOMEM; + } + pci_unmap_addr_set(cq, mapping, cq->dma_addr); + memset(cq->queue, 0, size); ++ reserve_pages(cq->queue, size); + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; +@@ -288,6 +307,7 @@ int cxio_create_qp(struct cxio_rdev *rde + { + int depth = 1UL << wq->size_log2; + int 
rqsize = 1UL << wq->rq_size_log2; ++ int size = PAGE_ALIGN(depth * sizeof(union t3_wr)); + + wq->qpid = get_qpid(rdev_p, uctx); + if (!wq->qpid) +@@ -305,14 +325,15 @@ int cxio_create_qp(struct cxio_rdev *rde + if (!wq->sq) + goto err3; + +- wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- depth * sizeof(union t3_wr), +- &(wq->dma_addr), GFP_KERNEL); ++ wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(wq->dma_addr), GFP_KERNEL); + if (!wq->queue) + goto err4; + +- memset(wq->queue, 0, depth * sizeof(union t3_wr)); + pci_unmap_addr_set(wq, mapping, wq->dma_addr); ++ memset(wq->queue, 0, size); ++ reserve_pages(wq->queue, size); ++ + wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + if (!kernel_domain) + wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + +@@ -334,11 +355,12 @@ err1: + int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + int err; ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); ++ + err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); + kfree(cq->sw_queue); +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) +- * sizeof(struct t3_cqe), cq->queue, ++ unreserve_pages(cq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, cq->queue, + pci_unmap_addr(cq, mapping)); + cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); + return err; +@@ -347,9 +369,10 @@ int cxio_destroy_cq(struct cxio_rdev *rd + int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq, + struct cxio_ucontext *uctx) + { +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (wq->size_log2)) +- * sizeof(union t3_wr), wq->queue, ++ int size = PAGE_ALIGN((1UL << (wq->size_log2)) * sizeof(union t3_wr)); ++ ++ unreserve_pages(wq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, wq->queue, + pci_unmap_addr(wq, mapping)); + kfree(wq->sq); + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); diff --git 
a/kernel_patches/backport/2.6.9_U3/iwch_provider_to_2.6.9_U4.patch b/kernel_patches/backport/2.6.9_U3/iwch_provider_to_2.6.9_U4.patch new file mode 100644 index 0000000..1fbc717 --- /dev/null +++ b/kernel_patches/backport/2.6.9_U3/iwch_provider_to_2.6.9_U4.patch @@ -0,0 +1,16 @@ +--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-01-17 09:22:39.000000000 -0600 ++++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c 2007-01-22 17:46:16.000000000 -0600 +@@ -337,13 +337,6 @@ static int iwch_mmap(struct ib_ucontext + (pgaddr < (rdev_p->rnic_info.udbell_physbase + + rdev_p->rnic_info.udbell_len))) { + +- /* +- * Map T3 DB register. +- */ +- if (vma->vm_flags & VM_READ) { +- return -EPERM; +- } +- + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; + vma->vm_flags &= ~VM_MAYREAD; diff --git a/kernel_patches/backport/2.6.9_U4/cxio_hal_to_2.6.14.patch b/kernel_patches/backport/2.6.9_U4/cxio_hal_to_2.6.14.patch new file mode 100644 index 0000000..34556bb --- /dev/null +++ b/kernel_patches/backport/2.6.9_U4/cxio_hal_to_2.6.14.patch @@ -0,0 +1,127 @@ +Reserve pages to support userspace mapping in older kernels. + +From: Steve Wise + +This is needed for kernels prior to 2.6.15 to correctly map kernel +memory into userspace. 
+ +Signed-off-by: Steve Wise +--- + + drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 53 +++++++++++++++++++-------- + 1 files changed, 38 insertions(+), 15 deletions(-) + +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +index 229edd5..067fe46 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c ++++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +@@ -170,10 +170,30 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd + return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); + } + ++static void reserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ SetPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ ++static void unreserve_pages(void *p, int size) ++{ ++ while (size > 0) { ++ ClearPageReserved(virt_to_page(p)); ++ p += PAGE_SIZE; ++ size -= PAGE_SIZE; ++ } ++ BUG_ON(size < 0); ++} ++ + int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + struct rdma_cq_setup setup; +- int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe); ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); + + cq->cqid = cxio_hal_get_cqid(rdev_p->rscp); + if (!cq->cqid) +@@ -181,16 +201,15 @@ int cxio_create_cq(struct cxio_rdev *rde + cq->sw_queue = kzalloc(size, GFP_KERNEL); + if (!cq->sw_queue) + return -ENOMEM; +- cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) * +- sizeof(struct t3_cqe), +- &(cq->dma_addr), GFP_KERNEL); ++ cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(cq->dma_addr), GFP_KERNEL); + if (!cq->queue) { + kfree(cq->sw_queue); + return -ENOMEM; + } + pci_unmap_addr_set(cq, mapping, cq->dma_addr); + memset(cq->queue, 0, size); ++ reserve_pages(cq->queue, size); + setup.id = cq->cqid; + setup.base_addr = (u64) (cq->dma_addr); + setup.size = 1UL << cq->size_log2; +@@ -288,6 +307,7 @@ int cxio_create_qp(struct cxio_rdev *rde + { + int depth = 1UL << wq->size_log2; + int 
rqsize = 1UL << wq->rq_size_log2; ++ int size = PAGE_ALIGN(depth * sizeof(union t3_wr)); + + wq->qpid = get_qpid(rdev_p, uctx); + if (!wq->qpid) +@@ -305,14 +325,15 @@ int cxio_create_qp(struct cxio_rdev *rde + if (!wq->sq) + goto err3; + +- wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), +- depth * sizeof(union t3_wr), +- &(wq->dma_addr), GFP_KERNEL); ++ wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev), size, ++ &(wq->dma_addr), GFP_KERNEL); + if (!wq->queue) + goto err4; + +- memset(wq->queue, 0, depth * sizeof(union t3_wr)); + pci_unmap_addr_set(wq, mapping, wq->dma_addr); ++ memset(wq->queue, 0, size); ++ reserve_pages(wq->queue, size); ++ + wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr; + if (!kernel_domain) + wq->udb = (u64)rdev_p->rnic_info.udbell_physbase + +@@ -334,11 +355,12 @@ err1: + int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq) + { + int err; ++ int size = PAGE_ALIGN((1UL << (cq->size_log2)) * sizeof(struct t3_cqe)); ++ + err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid); + kfree(cq->sw_queue); +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (cq->size_log2)) +- * sizeof(struct t3_cqe), cq->queue, ++ unreserve_pages(cq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, cq->queue, + pci_unmap_addr(cq, mapping)); + cxio_hal_put_cqid(rdev_p->rscp, cq->cqid); + return err; +@@ -347,9 +369,10 @@ int cxio_destroy_cq(struct cxio_rdev *rd + int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq, + struct cxio_ucontext *uctx) + { +- dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), +- (1UL << (wq->size_log2)) +- * sizeof(union t3_wr), wq->queue, ++ int size = PAGE_ALIGN((1UL << (wq->size_log2)) * sizeof(union t3_wr)); ++ ++ unreserve_pages(wq->queue, size); ++ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev), size, wq->queue, + pci_unmap_addr(wq, mapping)); + kfree(wq->sq); + cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2)); From rdreier at 
cisco.com Mon Mar 19 12:16:04 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Mar 2007 12:16:04 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ) patch for review
In-Reply-To: (Pradeep Satyanarayana's message of "Mon, 12 Mar 2007 12:13:36 -0700")
References:
Message-ID:

> I dug through the spec and found that ib_query_device() tells one if the
> HCA supports SRQ or not. Is that what you had in mind?

Yes, something like that. To see why the test has to be at runtime
rather than a compile time decision, just think about a distribution
kernel -- how can they ship IPoIB CM that works for adapters that
don't support SRQ if that disables CM for adapters that do
support SRQ?

> > Not to mention the fact that basically mixing together two different
> > implementations with a liberal sprinkling of #ifdef IPOIB_CM_NOSRQ
> > makes the code basically unreadable and unmaintainable.
>
> One way to alleviate this problem would be to duplicate mainly the
> receive side functions and name them something like xyz_nosrq().
> However, there will still be many instances of
>
> if(SRQ)
>     xyz();
> else
>     xyz_nosrq();
>
> Is that a better solution than the #ifdef IPOIB_CM_NOSRQ? On the
> other hand, this will add duplicate code and may pose some
> maintainability issues in the future. I would like to understand as
> to which one is the preferred approach.

Well, anything is better than the #ifdef stuff, since in addition to
being out of the question for the reasons I outlined above, it also
has the problem that testing changes requires two builds to see if
anything broke, etc.

> > This seems crazy -- you do a linear search through a list of QPs
> > (which potentially has 100s of entries) for every receive completion!
> > Just the spinlock alone is something we would want to avoid in the hot
> > performance path.
>
> I envisaged the NOSRQ case for small clusters only. Otherwise, this
> may be a memory hog and affect (other) application performance.
There has to be a better way. You have 64 bits of work request ID to
work with, surely you can avoid a linear search and a spinlock to do
this lookup.

 - R.

From rdreier at cisco.com Mon Mar 19 12:28:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Mar 2007 12:28:44 -0700
Subject: [ofa-general] Re: Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: <20070312083523.GB4928@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 12 Mar 2007 10:35:23 +0200")
References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com>
    <1172845045.21241.0.camel@stevo-desktop>
    <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net>
    <1172854585.21241.14.camel@stevo-desktop>
    <1172854873.21241.19.camel@stevo-desktop>
    <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net>
    <1172856154.21241.34.camel@stevo-desktop>
    <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net>
    <20070312083523.GB4928@mellanox.co.il>
Message-ID:

> Should we add a size parameter for event channels?
> And, we might need to add "event channel overrun" flag as well.
>
> If we want to address the problem in this way, we need to do this before
> libibverbs 1.1 freezes I think.

I think since no one is hitting this in practice we can take our time
here and get it right.

From mst at dev.mellanox.co.il Mon Mar 19 12:39:39 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 21:39:39 +0200
Subject: [ofa-general] Re: IPoIB-CM Performance
In-Reply-To:
References: <20070319162049.3BE4BE6083C@openfabrics.org>
Message-ID: <20070319193938.GA5068@mellanox.co.il>

Yes, pretty much.

Quoting Bernard King-Smith :
Subject: IPoIB-CM Performance

Michael,

When you posted the first patch to add IPoIB-CM to OFED 1.2 you posted a
unidirectional performance of 891 MB/s using Netperf. We don't have SRQ
adapters at the moment but are we still getting 891 MB/s with all the
changes that went into the various patches you have posted since then?

Thanks.
Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2 at us.ibm.com (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the
world we leave when we die. So we have to accept what has gone before us
and work to change the only thing we can,
-- The Future." William Shatner

-- 
MST

From mst at dev.mellanox.co.il Mon Mar 19 12:44:51 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 19 Mar 2007 21:44:51 +0200
Subject: [ofa-general] Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy
In-Reply-To:
References:
Message-ID: <20070319194451.GB5068@mellanox.co.il>

> > Here's a (compile tested only) patch for merging async/command queues.
>
> OK, can you test that and my patch with the test that started this
> thread? If it looks good we can merge it for 2.6.21.

Works OK here. Pls merge it up.

-- 
MST

From arkady at netapp.com Mon Mar 19 13:01:58 2007
From: arkady at netapp.com (Arkady Kanevsky)
Date: Mon, 19 Mar 2007 15:01:58 -0500
Subject: [ofa-general] broken links
Message-ID: <200703191501.59020.arkady@netapp.com>

The README for diagnostics building and running is located here:
https://openib.org/svn/gen2/trunk/src/userspace/management/README

and

A more complete description and command syntax of the diagnostic tools can be
found as:
https://openib.org/svn/gen2/trunk/src/userspace/management/doc/diagtools.txt

are broken.
What are the correct links?
Thanks,

From rdreier at cisco.com Mon Mar 19 13:02:59 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Mar 2007 13:02:59 -0700
Subject: [ofa-general] Re: [PATCH] libibverbs: fix memory leak in case of error flow
In-Reply-To: <1174299795.3215.1.camel@mtldesk014.lab.mtl.com> (Dotan Barak's message of "Mon, 19 Mar 2007 12:23:15 +0200")
References: <1174299795.3215.1.camel@mtldesk014.lab.mtl.com>
Message-ID:

Thanks, good catch. Applied and pushed out.
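
[Archive note on the "IPOIB CM (NOSRQ) patch for review" thread above.
Roland's remark that the 64 bits of work request ID leave room to avoid a
linear search and a spinlock can be illustrated with a minimal sketch. It
assumes, purely for illustration, that connections live in a table indexed
by a small integer and that a receive ring has at most 65536 slots. The
names wrid_encode, wrid_conn_index and wrid_slot and the 16-bit split are
hypothetical; they are not taken from the IPoIB sources or from any patch
in this thread.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical wr_id layout (an assumption, not the IPoIB driver's):
 *   bits  0..15  receive ring slot
 *   bits 16..47  index of the connection in a per-device table
 */
#define WRID_SLOT_BITS 16
#define WRID_SLOT_MASK ((UINT64_C(1) << WRID_SLOT_BITS) - 1)

/* Pack a connection index and a ring slot into one 64-bit wr_id. */
static inline uint64_t wrid_encode(uint32_t conn_index, uint32_t slot)
{
    assert(slot <= WRID_SLOT_MASK); /* slot must fit in the low bits */
    return ((uint64_t)conn_index << WRID_SLOT_BITS) | slot;
}

/* Recover the connection index: one shift, no list walk, no lock. */
static inline uint32_t wrid_conn_index(uint64_t wr_id)
{
    return (uint32_t)(wr_id >> WRID_SLOT_BITS);
}

/* Recover the ring slot: one mask. */
static inline uint32_t wrid_slot(uint64_t wr_id)
{
    return (uint32_t)(wr_id & WRID_SLOT_MASK);
}
```

[In a receive completion handler, the wr_id reported in the work
completion would be passed to wrid_conn_index() to index the connection
table directly, and to wrid_slot() to find the buffer in that
connection's ring.]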
From halr at voltaire.com Mon Mar 19 14:06:32 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Mar 2007 16:06:32 -0500
Subject: [ofa-general] broken links
In-Reply-To: <200703191501.59020.arkady@netapp.com>
References: <200703191501.59020.arkady@netapp.com>
Message-ID: <1174338389.4684.399719.camel@hal.voltaire.com>

On Mon, 2007-03-19 at 15:01, Arkady Kanevsky wrote:
> The README for diagnostics building and running is located here:
> https://openib.org/svn/gen2/trunk/src/userspace/management/README
>
> and
>
> A more complete description and command syntax of the diagnostic tools can be
> found as:
> https://openib.org/svn/gen2/trunk/src/userspace/management/doc/diagtools.txt
>
> are broken.
> What are the correct links?

You would need to go into the management git repository via gitweb or
git. Not sure how to make up a constant link using gitweb. It appears to
have some magic numbers in the URLs.

-- Hal

> Thanks,
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From jsquyres at cisco.com Mon Mar 19 13:12:18 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 19 Mar 2007 16:12:18 -0400
Subject: [ofa-general] broken links
In-Reply-To: <200703191501.59020.arkady@netapp.com>
References: <200703191501.59020.arkady@netapp.com>
Message-ID: <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>

1. The SVN server moved to svn.openfabrics.org
2. The base for the SVN URLs move to /svn/openib/

For example, your first URL should be:

https://svn.openfabrics.org/svn/openib/gen2/trunk/src/userspace/
management/README

However, you'll still get a 404 because just about the only thing
left in the OFA SVN is OFED 1.1; everything else has been "svn
rm"'ed.
It's all there in the SVN history if you want it; see
https://svn.openfabrics.org/svn/openib/README.txt for the file "Where
in the world did all the OpenFabrics sources go?"

Hope that helps...

On Mar 19, 2007, at 4:01 PM, Arkady Kanevsky wrote:

> The README for diagnostics building and running is located here:
> https://openib.org/svn/gen2/trunk/src/userspace/management/README
>
> and
>
> A more complete description and command syntax of the diagnostic
> tools can be
> found as:
> https://openib.org/svn/gen2/trunk/src/userspace/management/doc/
> diagtools.txt
>
> are broken.
> What are the correct links?
> Thanks,
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/
> openib-general

-- 
Jeff Squyres
Cisco Systems

From halr at voltaire.com Mon Mar 19 14:15:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Mar 2007 16:15:09 -0500
Subject: [ofa-general] broken links
In-Reply-To: <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>
References: <200703191501.59020.arkady@netapp.com>
    <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>
Message-ID: <1174338907.4684.400325.camel@hal.voltaire.com>

On Mon, 2007-03-19 at 15:12, Jeff Squyres wrote:
> 1. The SVN server moved to svn.openfabrics.org
> 2. The base for the SVN URLs move to /svn/openib/
>
> For example, your first URL should be:
>
> https://svn.openfabrics.org/svn/openib/gen2/trunk/src/userspace/
> management/README
>
> However, you'll still get a 404 because just about the only thing
> left in the OFA SVN is OFED 1.1; everything else has been "svn
> rm"'ed. It's all there in the SVN history if you want it; see
> https://svn.openfabrics.org/svn/openib/README.txt for the file "Where
> in the world did all the OpenFabrics sources go?"
>
> Hope that helps...

Thanks, but those may be out of date and these should really point into
the "active" ones in the new git trees as they could be updated with
more recent info.

-- Hal

> On Mar 19, 2007, at 4:01 PM, Arkady Kanevsky wrote:
>
> > The README for diagnostics building and running is located here:
> > https://openib.org/svn/gen2/trunk/src/userspace/management/README
> >
> > and
> >
> > A more complete description and command syntax of the diagnostic
> > tools can be
> > found as:
> > https://openib.org/svn/gen2/trunk/src/userspace/management/doc/
> > diagtools.txt
> >
> > are broken.
> > What are the correct links?
> > Thanks,
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/
> > openib-general
>

From Arkady.Kanevsky at netapp.com Mon Mar 19 13:22:11 2007
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Mon, 19 Mar 2007 16:22:11 -0400
Subject: [ofa-general] broken links
In-Reply-To: <1174338907.4684.400325.camel@hal.voltaire.com>
References: <200703191501.59020.arkady@netapp.com><3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>
    <1174338907.4684.400325.camel@hal.voltaire.com>
Message-ID:

Thanks.

Can we at least remove the broken links from the WIKI?
Hal,
are you saying that we can not link document files in git
from WIKI?
Thanks,

Arkady Kanevsky email: arkady at netapp.com
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Monday, March 19, 2007 5:15 PM
> To: Jeff Squyres (jsquyres)
> Cc: Kanevsky, Arkady; general at lists.openfabrics.org
> Subject: Re: [ofa-general] broken links
>
> On Mon, 2007-03-19 at 15:12, Jeff Squyres wrote:
> > 1. The SVN server moved to svn.openfabrics.org 2.
The base
> for the SVN
> > URLs move to /svn/openib/
> >
> > For example, your first URL should be:
> >
> > https://svn.openfabrics.org/svn/openib/gen2/trunk/src/userspace/
> > management/README
> >
> > However, you'll still get a 404 because just about the only
> thing left
> > in the OFA SVN is OFED 1.1; everything else has been "svn rm"'ed.
> > It's all there in the SVN history if you want it; see
> > https://svn.openfabrics.org/svn/openib/README.txt for the
> file "Where
> > in the world did all the OpenFabrics sources go?"
> >
> > Hope that helps...
>
> Thanks, but those may be out of date and these should really
> point into the "active" ones in the new git trees as they
> could be updated with more recent info.
>
> -- Hal
>
> > On Mar 19, 2007, at 4:01 PM, Arkady Kanevsky wrote:
> >
> > > The README for diagnostics building and running is located here:
> > > https://openib.org/svn/gen2/trunk/src/userspace/management/README
> > >
> > > and
> > >
> > > A more complete description and command syntax of the diagnostic
> > > tools can be found as:
> > > https://openib.org/svn/gen2/trunk/src/userspace/management/doc/
> > > diagtools.txt
> > >
> > > are broken.
> > > What are the correct links?
> > > Thanks,
> > > _______________________________________________
> > > general mailing list
> > > general at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > >
> > > To unsubscribe, please visit http://openib.org/mailman/listinfo/
> > > openib-general
> >
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From rdreier at cisco.com Mon Mar 19 13:26:22 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Mar 2007 13:26:22 -0700
Subject: [ofa-general] broken links
In-Reply-To: (Arkady Kanevsky's message of "Mon, 19 Mar 2007 16:22:11 -0400")
References: <200703191501.59020.arkady@netapp.com>
    <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>
    <1174338907.4684.400325.camel@hal.voltaire.com>
Message-ID:

> Can we at least remove the broken links from the WIKI?

It's a wiki... so go ahead...

From halr at voltaire.com Mon Mar 19 14:28:37 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Mar 2007 16:28:37 -0500
Subject: [ofa-general] broken links
In-Reply-To:
References: <200703191501.59020.arkady@netapp.com>
    <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com>
    <1174338907.4684.400325.camel@hal.voltaire.com>
Message-ID: <1174339714.4684.401188.camel@hal.voltaire.com>

On Mon, 2007-03-19 at 15:22, Kanevsky, Arkady wrote:
> Thanks.
>
> Can we at least remove the broken links from the WIKI?

Sure but let's see if they can be fixed properly first.

> Hal,
> are you saying that we can not link document files in git
> from WIKI?

No; just that I'm not sure right now how to do it. I need to do a little
digging first.

-- Hal

> Thanks,
>
> Arkady Kanevsky email: arkady at netapp.com
> Network Appliance Inc. phone: 781-768-5395
> 1601 Trapelo Rd. - Suite 16.
Fax: 781-895-1195 > Waltham, MA 02451 central phone: 781-768-5300 > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Monday, March 19, 2007 5:15 PM > > To: Jeff Squyres (jsquyres) > > Cc: Kanevsky, Arkady; general at lists.openfabrics.org > > Subject: Re: [ofa-general] broken links > > > > On Mon, 2007-03-19 at 15:12, Jeff Squyres wrote: > > > 1. The SVN server moved to svn.openfabrics.org 2. The base > > for the SVN > > > URLs move to /svn/openib/ > > > > > > For example, your first URL should be: > > > > > > https://svn.openfabrics.org/svn/openib/gen2/trunk/src/userspace/ > > > management/README > > > > > > However, you'll still get a 404 because just about the only > > thing left > > > in the OFA SVN is OFED 1.1; everything else has been "svn rm"'ed. > > > It's all there in the SVN history if you want it; see > > > https://svn.openfabrics.org/svn/openib/README.txt for the > > file "Where > > > in the world did all the OpenFabrics sources go?" > > > > > > Hope that helps... > > > > Thanks, but those may be out of date and these should really > > point into the "active" ones in the new git trees as they > > could be updated with more recent info. > > > > -- Hal > > > > > On Mar 19, 2007, at 4:01 PM, Arkady Kanevsky wrote: > > > > > > > The README for diagnostics building and running is located here: > > > > https://openib.org/svn/gen2/trunk/src/userspace/management/README > > > > > > > > and > > > > > > > > A more complete description and command syntax of the diagnostic > > > > tools can be found as: > > > > https://openib.org/svn/gen2/trunk/src/userspace/management/doc/ > > > > diagtools.txt > > > > > > > > are broken. > > > > What are the correct links? 
> > > > Thanks, > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > > > > openib-general > > > > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > From sashak at voltaire.com Mon Mar 19 13:41:31 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 19 Mar 2007 22:41:31 +0200 Subject: [ofa-general] broken links In-Reply-To: <1174338389.4684.399719.camel@hal.voltaire.com> References: <200703191501.59020.arkady@netapp.com> <1174338389.4684.399719.camel@hal.voltaire.com> Message-ID: <20070319204131.GQ19999@sashak.voltaire.com> On 16:06 Mon 19 Mar , Hal Rosenstock wrote: > On Mon, 2007-03-19 at 15:01, Arkady Kanevsky wrote: > > The README for diagnostics building and running is located here: > > https://openib.org/svn/gen2/trunk/src/userspace/management/README > > > > and > > > > A more complete description and command syntax of the diagnostic tools can be > > found as: > > https://openib.org/svn/gen2/trunk/src/userspace/management/doc/diagtools.txt > > > > are broken. > > What are the correct links? > > You would need to go into the management git repository via gitweb or > git. Not sure how to make up a constant link using gitweb. It appears to > have some magic numbers in the URLs. 
There are "raw" references in gitweb without magic numbers: http://git.openfabrics.org/git/?p=~halr/management.git;a=blob_plain;f=README;hb=HEAD , similar can be done for "blob" view: http://git.openfabrics.org/git/?p=~halr/management.git;a=blob;f=README;hb=HEAD Sasha From jsquyres at cisco.com Mon Mar 19 13:38:54 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 19 Mar 2007 16:38:54 -0400 Subject: [ofa-general] broken links In-Reply-To: <1174338907.4684.400325.camel@hal.voltaire.com> References: <200703191501.59020.arkady@netapp.com> <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com> <1174338907.4684.400325.camel@hal.voltaire.com> Message-ID: <5FEBEC05-5D07-427C-93C1-5A2133844987@cisco.com> On Mar 19, 2007, at 5:15 PM, Hal Rosenstock wrote: >> However, you'll still get a 404 because just about the only thing >> left in the OFA SVN is OFED 1.1; everything else has been "svn >> rm"'ed. It's all there in the SVN history if you want it; see >> https://svn.openfabrics.org/svn/openib/README.txt for the file "Where >> in the world did all the OpenFabrics sources go?" > > Thanks, but those may be out of date and these should really point > into > the "active" ones in the new git trees as they could be updated with > more recent info. I think what's in SVN is ok: 1. The first paragraph of the README.txt says: "The majority of content here in the OpenFabrics Subversion repository has been moved to various git-based repositories. See the OpenFabrics web site (http://www.openfabrics.org/) for details on where the git repositories are located and how they can be accessed." 2. The specific files in question were "svn rm"'ed so that they're not at the SVN HEAD anymore. So even if you get into SVN, you won't find the files you're talking about without diving into the history. I think that if someone ignores the README.txt and goes into the history to find old versions of those files, they deserve what they get. 
:-) -- Jeff Squyres Cisco Systems From halr at voltaire.com Mon Mar 19 14:40:40 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Mar 2007 16:40:40 -0500 Subject: [ofa-general] broken links In-Reply-To: <5FEBEC05-5D07-427C-93C1-5A2133844987@cisco.com> References: <200703191501.59020.arkady@netapp.com> <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com> <1174338907.4684.400325.camel@hal.voltaire.com> <5FEBEC05-5D07-427C-93C1-5A2133844987@cisco.com> Message-ID: <1174340437.4684.401966.camel@hal.voltaire.com> On Mon, 2007-03-19 at 15:38, Jeff Squyres wrote: > On Mar 19, 2007, at 5:15 PM, Hal Rosenstock wrote: > > >> However, you'll still get a 404 because just about the only thing > >> left in the OFA SVN is OFED 1.1; everything else has been "svn > >> rm"'ed. It's all there in the SVN history if you want it; see > >> https://svn.openfabrics.org/svn/openib/README.txt for the file "Where > >> in the world did all the OpenFabrics sources go?" > > > > Thanks, but those may be out of date and these should really point > > into > > the "active" ones in the new git trees as they could be updated with > > more recent info. > > I think what's in SVN is ok: > > 1. The first paragraph of the README.txt says: > > "The majority of content here in the OpenFabrics Subversion repository > has been moved to various git-based repositories. See the OpenFabrics > web site (http://www.openfabrics.org/) for details on where the git > repositories are located and how they can be accessed." > > 2. The specific files in question were "svn rm"'ed so that they're > not at the SVN HEAD anymore. So even if you get into SVN, you won't > find the files you're talking about without diving into the history. > > I think that if someone ignores the README.txt and goes into the > history to find old versions of those files, they deserve what they > get. :-) Sure but all I was trying to say is that the links should point elsewhere now and not into svn. 
-- Hal From jsquyres at cisco.com Mon Mar 19 13:46:41 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 19 Mar 2007 16:46:41 -0400 Subject: [ofa-general] broken links In-Reply-To: <1174340437.4684.401966.camel@hal.voltaire.com> References: <200703191501.59020.arkady@netapp.com> <3BEC1C9A-E20F-4F19-84A6-DCA423EF3F62@cisco.com> <1174338907.4684.400325.camel@hal.voltaire.com> <5FEBEC05-5D07-427C-93C1-5A2133844987@cisco.com> <1174340437.4684.401966.camel@hal.voltaire.com> Message-ID: On Mar 19, 2007, at 5:40 PM, Hal Rosenstock wrote: >> I think that if someone ignores the README.txt and goes into the >> history to find old versions of those files, they deserve what they >> get. :-) > > Sure but all I was trying to say is that the links should point > elsewhere now and not into svn. Agreed -- the wiki links should be updated. -- Jeff Squyres Cisco Systems From changquing.tang at hp.com Mon Mar 19 14:05:16 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 19 Mar 2007 21:05:16 -0000 Subject: [ofa-general] Re: Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com><1172845045.21241.0.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net><1172854585.21241.14.camel@stevo-desktop><1172854873.21241.19.camel@stevo-desktop><349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net><1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403C01D39@G3W0634.americas.hpqcorp.net> Can you have more details how you do it and how APIs are changed ? Thanks. We just want ibv_get_async_event() to be a non-blocking call, you told me to set the file descriptor to non-blocking to make this function non-blocking. But if you can set it by default, that would be best, because everyone can accept this change. 
Also can we make the same change to ibv_get_cq_event() ? I hope all functions in IB are non-blocking. --CQ > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Roland Dreier > Sent: Monday, March 19, 2007 2:29 PM > To: Michael S. Tsirkin > Cc: General at lists.openfabrics.org > Subject: Re: [ofa-general] Re: Re: Is ibv_get_async_event() a > blocking call ? > > > Should we add a size parameter for event channels? > > And, we might need to add "event channel overrun" flag as well. > > > > If we want to address the problem in this way, we need to > do this before > libibverbs 1.1 freezes I think. > > I think since no one is hitting this in practice we can take > our time here and get it right. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Mon Mar 19 14:16:48 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 14:16:48 -0700 Subject: [ofa-general] Re: Re: Is ibv_get_async_event() a blocking call ? 
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA8403C01D39@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Mon, 19 Mar 2007 21:05:16 -0000") References: <000201c75c5e$224e5a70$ff0da8c0@amr.corp.intel.com> <1172845045.21241.0.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il> <349DCDA352EACF42A0C49FA6DCEA8403C01D39@G3W0634.americas.hpqcorp.net> Message-ID: Changqing> Can you have more details how you do it and how APIs Changqing> are changed ? Thanks. I don't think there's any real plan to change the API. Changqing> We just want ibv_get_async_event() to be a non-blocking Changqing> call, you told me to set the file descriptor to Changqing> non-blocking to make this function non-blocking. But if Changqing> you can set it by default, that would be best, because Changqing> everyone can accept this change. Changqing> Also can we make the same change to ibv_get_cq_event() Changqing> ? I hope all functions in IB are non-blocking. I don't see a good reason to change the current behavior here. I think it's less surprising for file descriptors to be blocking by default. And if you really want them to be nonblocking, it just takes one call to fcntl(fd, F_SETFL, O_NONBLOCK) (which is what the library would have to do anyway). - R. 
From sean.hefty at intel.com Mon Mar 19 14:29:25 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Mar 2007 14:29:25 -0700 Subject: [ofa-general] [RFC] host stack IB-to-IB router support Message-ID: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com> Based on previous e-mail threads, this is my plan for implementing IB-to-IB router support in the host stack capable of supporting RC communication. Note that this work is part of the PathForward project aimed at supporting early IB-to-IB router development. It is not intended to define IB router architecture. 1. Extend struct ib_cm_req_param: struct ib_cm_req_param { struct ib_sa_path_rec *primary_path; struct ib_sa_path_rec *alternate_path; + struct ib_sa_path_rec *remote_primary_path; + struct ib_sa_path_rec *remote_alternate_path; The remote path information would be valid only if the provided paths had a hop_limit > 1, but could also be used to support paths where reversible = 0. 2. Add an ib_remote_sa module. This module would be responsible for obtaining remote path information. Because the architecture does not define how this information is obtained, my intent is to encapsulate this functionality into a single module to simplify out of tree maintenance. Its basic operation is: a. Local ib_remote_sa sends query request to remote ib_remote_sa. b. Remote ib_remote_sa queries its local SA. c. Remote ib_remote_sa sends query response to local ib_remote_sa. I expect the ib_remote_sa implementation to be a temporary solution only. It will layer above either the ib_mad or ib_cm services, whichever ends up being easier. 3. Extend the rdma_cm route resolution to include remote route lookup. 
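The condition stated under step 1 (remote path information matters when hop_limit > 1, and may also help when reversible = 0) can be sketched in userspace as follows. Everything here is an assumption for illustration: struct model_path_rec is a stand-in that carries only two fields, not the kernel's struct ib_sa_path_rec, and the fallback to the local record when no remote record is supplied is a guess rather than part of the proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in holding only the two fields used here. */
struct model_path_rec {
    uint8_t hop_limit;   /* > 1: the path may be forwarded by a router */
    bool    reversible;  /* false: the reverse direction needs its own record */
};

/* True when the local record alone cannot describe the passive side's path. */
bool wants_remote_path(const struct model_path_rec *local)
{
    return local->hop_limit > 1 || !local->reversible;
}

/* Choose the record describing the passive side's view of the path:
 * the remote record when one is wanted and supplied, else the local one
 * (the fallback is an assumption of this sketch). */
const struct model_path_rec *
select_remote_view(const struct model_path_rec *local,
                   const struct model_path_rec *remote)
{
    if (wants_remote_path(local) && remote != NULL)
        return remote;
    return local;
}
```

For a path inside one subnet (hop_limit of 1, reversible set) the sketch keeps using the local record, which matches the existing single-subnet behaviour.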
- Sean From rdreier at cisco.com Mon Mar 19 14:48:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 14:48:24 -0700 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> (Dotan Barak's message of "Mon, 12 Mar 2007 12:00:43 +0200") References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> Message-ID: Thanks. However, it seems racy to me to use the mutex as you do here. I think if a consumer calls ibv_create_cq() and ibv_destroy_comp_channel() simultaneously (for the same completion channel obviously), exactly one of the two calls should succeed. As it stands now, you might have a comp channel that starts with a refcnt of 0, have the test in ibv_destroy_comp_channel() succeed, then have ibv_create_cq() use the completion channel and bump the refcnt too late and also return success. From rdreier at cisco.com Mon Mar 19 14:49:22 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 14:49:22 -0700 Subject: [ofa-general] [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: <000301c761de$acd913d0$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Thu, 8 Mar 2007 16:05:41 -0800") References: <000301c761de$acd913d0$ff0da8c0@amr.corp.intel.com> Message-ID: What's the theory here? It's not obvious why moving the call to ib_sa_free_multicast() fixes the race... From sean.hefty at intel.com Mon Mar 19 14:56:06 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Mar 2007 14:56:06 -0700 Subject: [ofa-general] [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: Message-ID: <000401c76a71$653fd7c0$c9d8180a@amr.corp.intel.com> >What's the theory here? It's not obvious why moving the call to >ib_sa_free_multicast() fixes the race... The attach QP only occurs in the context of the multicast callback thread. 
ib_sa_free_multicast() blocks until the callback returns, which ensures that the detach check/call (which is now done after ib_sa_free_multicast) cannot race with the attach call. - Sean From rdreier at cisco.com Mon Mar 19 14:58:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 14:58:37 -0700 Subject: [ofa-general] [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: <000401c76a71$653fd7c0$c9d8180a@amr.corp.intel.com> (Sean Hefty's message of "Mon, 19 Mar 2007 14:56:06 -0700") References: <000401c76a71$653fd7c0$c9d8180a@amr.corp.intel.com> Message-ID: > The attach QP only occurs in the context of the multicast callback thread. > ib_sa_free_multicast() blocks until the callback returns, which ensures that the > detach check/call (which is now done after ib_sa_free_multicast) cannot race > with the attach call. OK, makes sense. Do we have confirmation that this fixes the original problem? If so I'll queue this up... - R. From sean.hefty at intel.com Mon Mar 19 15:07:10 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Mar 2007 15:07:10 -0700 Subject: [ofa-general] [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: Message-ID: <000501c76a72$f173e9b0$c9d8180a@amr.corp.intel.com> >OK, makes sense. Do we have confirmation that this fixes the original >problem? If so I'll queue this up... Yes - the mcast detach error messages in the OFED stack went away with this fix in place. 
I pushed the patch up to: git://git.openfabrics.org/~shefty/rdma-dev.git for-roland - Sean From jgunthorpe at obsidianresearch.com Mon Mar 19 15:16:17 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 19 Mar 2007 16:16:17 -0600 Subject: [ofa-general] Re: [RFC] host stack IB-to-IB router support In-Reply-To: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com> References: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com> Message-ID: <20070319221617.GA2986@obsidianresearch.com> On Mon, Mar 19, 2007 at 02:29:25PM -0700, Sean Hefty wrote: > Based on previous e-mail threads, this is my plan for implementing IB-to-IB > router support in the host stack capable of supporting RC communication. Note > that this work is part of the PathForward project aimed at supporting early > IB-to-IB router development. It is not intended to define IB router > architecture. Would it become part of openfabrics or just as a 3rd party patch that interested parties could apply? > 1. Extend struct ib_cm_req_param: > > struct ib_cm_req_param { > struct ib_sa_path_rec *primary_path; > struct ib_sa_path_rec *alternate_path; > + struct ib_sa_path_rec *remote_primary_path; > + struct ib_sa_path_rec *remote_alternate_path; So the idea is that the CM REQ now uses remote_*_path to set the fields? You intend to go with the notion that IBA specifies the active side sets the passive's LIDs in all cases? > 2. Add an ib_remote_sa module. It looks like there is a new wire protocol to support this? Does it still work if, for instance, a modified active side tries to talk to an arbitrary existing target (such as storage or something)? Broadly, do you describe having the active side send a separate GMP to the passive node to do PR queries and then sending a CM REQ? > 3. Extend the rdma_cm route resolution to include remote route lookup. This means producing an ib_cm_req_param using ib_remote_sa when appropriate, right? 
How does this compare to the idea of using the LIDs from the REQ's LRH on the passive side? Thanks, Jason From Kapil.Dukle at med.ge.com Mon Mar 19 15:36:44 2007 From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare)) Date: Mon, 19 Mar 2007 18:36:44 -0400 Subject: [ofa-general] ibping fails in loopback mode Message-ID: Hi all, I'm having trouble getting ibping to work in loopback node. The original config is to connect first port of Blade 1 with first port of Blade 2. For my loopback test, I have connected the Infiniband cable between the 2 ports on the same Linux blade. Both opensm and ibping (server mode) seem to be running on the blade. Ping succeeds to the first active port, but fails for the second one. See below... Why is the ibping to the second active (and linkup) port failing? Am I missing something? The version is OFED1.0 [root at XXXX]# ps -elf | grep opensm 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? 00:00:09 /usr/sbin/opensm 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 00:00:00 grep opensm [root at XXXX]# ibping -v -S & [1] 5445 [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... [root at XXXX]# ps -elf | grep ibping 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 00:00:00 ibping -v -S 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 00:00:00 grep ibping [root at XXXX]# sminfo sminfo: sm lid 0x3 sm guid 0x3ba00010027d9, activity count 203459 priority 1 state SMINFO_MASTER 3 [root at XXXX]# ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027d8 System image GUID: 0x0003ba00010027db Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 3 LMC: 0 SM lid: 3 Capability mask: 0x02510a6a Port GUID: 0x0003ba00010027d9 Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 1 LMC: 0 SM lid: 3 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027da [root at XXXX]# ibping -v 3 ibwarn: [5452] ibping: Ping.. 
ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.111 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.087 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.069 ms ibwarn: [5452] report: out due signal 2 --- vre.(none) (Lid 0x3) ibping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2320 ms rtt min/avg/max = 0.069/0.089/0.111 ms [root at XXXX]# ibping -v 1 ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. 
ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] report: out due signal 2 --- (Lid 0x1) ibping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 11358 ms rtt min/avg/max = 0.000/0.000/0.000 ms Thanks, Kapil -------------- next part -------------- An HTML attachment was scrubbed... URL: From sashak at voltaire.com Mon Mar 19 15:56:51 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 20 Mar 2007 00:56:51 +0200 Subject: [ofa-general] ibping fails in loopback mode In-Reply-To: References: Message-ID: <20070319225651.GZ19999@sashak.voltaire.com> On 18:36 Mon 19 Mar , Dukle, Kapil (GE Healthcare) wrote: > Hi all, > > I'm having trouble getting ibping to work in loopback node. The original > config is to connect first port of Blade 1 with first port of Blade 2. > For my loopback test, I have connected the Infiniband cable between the > 2 ports on the same Linux blade. > Both opensm and ibping (server mode) seem to be running on the blade. > Ping succeeds to the first active port, but fails > for the second one. See below... > > Why is the ibping to the second active (and linkup) port failing? Am I > missing something? The version is OFED1.0 > > > [root at XXXX]# ps -elf | grep opensm > 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? > 00:00:09 /usr/sbin/opensm > 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 > 00:00:00 grep opensm > > [root at XXXX]# ibping -v -S & > [1] 5445 > [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... I guess you need to start ibping server for both ports (with -C, -P), so each one will answer "ping". 
Sasha > > [root at XXXX]# ps -elf | grep ibping > 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 > 00:00:00 ibping -v -S > 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 > 00:00:00 grep ibping > > [root at XXXX]# sminfo > sminfo: sm lid 0x3 sm guid 0x3ba00010027d9, activity count 203459 > priority 1 state SMINFO_MASTER 3 > > [root at XXXX]# ibstat > CA 'mthca0' > CA type: MT25208 (MT23108 compat mode) > Number of ports: 2 > Firmware version: 4.7.400 > Hardware version: a0 > Node GUID: 0x0003ba00010027d8 > System image GUID: 0x0003ba00010027db > Port 1: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 3 > LMC: 0 > SM lid: 3 > Capability mask: 0x02510a6a > Port GUID: 0x0003ba00010027d9 > Port 2: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 1 > LMC: 0 > SM lid: 3 > Capability mask: 0x02510a68 > Port GUID: 0x0003ba00010027da > > [root at XXXX]# ibping -v 3 > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > Pong from vre.(none) (Lid 0x3): time 0.111 ms > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > Pong from vre.(none) (Lid 0x3): time 0.087 ms > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > Pong from vre.(none) (Lid 0x3): time 0.069 ms > ibwarn: [5452] report: out due signal 2 > > --- vre.(none) (Lid 0x3) ibping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2320 ms > rtt min/avg/max = 0.069/0.089/0.111 ms > > [root at XXXX]# ibping -v 1 > ibwarn: [5449] ibping: Ping.. > ibwarn: [5449] main: ibping to Lid 0x1 failed > ibwarn: [5449] ibping: Ping.. > ibwarn: [5449] main: ibping to Lid 0x1 failed > ibwarn: [5449] ibping: Ping.. 
> ibwarn: [5449] main: ibping to Lid 0x1 failed > ibwarn: [5449] report: out due signal 2 > > --- (Lid 0x1) ibping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 11358 ms > rtt min/avg/max = 0.000/0.000/0.000 ms > > > Thanks, > Kapil > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From kuznet at ms2.inr.ac.ru Mon Mar 19 16:20:43 2007 From: kuznet at ms2.inr.ac.ru (Alexey Kuznetsov) Date: Tue, 20 Mar 2007 02:20:43 +0300 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319151336.GA24225@mellanox.co.il> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> <20070319121248.GD18497@mellanox.co.il> <20070319125919.GA4239@ms2.inr.ac.ru> <20070319151336.GA24225@mellanox.co.il> Message-ID: <20070319232043.GA23359@ms2.inr.ac.ru> Hello! > This might work. Could you post a patch to better show what you mean to do? Here it is. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index f9dbc6f..2b5c297 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -814,7 +814,7 @@ static void ipoib_set_mcast_list(struct queue_work(ipoib_workqueue, &priv->restart_task); } -static void ipoib_neigh_destructor(struct neighbour *n) +static void ipoib_neigh_cleanup(struct neighbour *n) { struct ipoib_neigh *neigh; struct ipoib_dev_priv *priv = netdev_priv(n->dev); @@ -822,7 +822,7 @@ static void ipoib_neigh_destructor(struc struct ipoib_ah *ah = NULL; ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + "neigh_cleanup for %06x " IPOIB_GID_FMT "\n", IPOIB_QPN(n->ha), IPOIB_GID_RAW_ARG(n->ha + 4)); @@ -874,7 +874,7 @@ void ipoib_neigh_free(struct net_device static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) { - parms->neigh_destructor = ipoib_neigh_destructor; + parms->neigh_cleanup = ipoib_neigh_cleanup; return 0; } diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 3725b93..ad7fe11 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -36,7 +36,7 @@ struct neigh_parms struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); - void (*neigh_destructor)(struct neighbour *); + void (*neigh_cleanup)(struct neighbour *); struct neigh_table *tbl; void *sysctl_table; diff --git a/net/atm/clip.c b/net/atm/clip.c index ebb5d0c..8c38258 100644 --- a/net/atm/clip.c +++ b/net/atm/clip.c @@ -261,14 +261,6 @@ static void clip_pop(struct atm_vcc *vcc spin_unlock_irqrestore(&PRIV(dev)->xoff_lock, flags); } -static void clip_neigh_destroy(struct neighbour *neigh) -{ - DPRINTK("clip_neigh_destroy (neigh %p)\n", neigh); - if (NEIGH2ENTRY(neigh)->vccs) - printk(KERN_CRIT "clip_neigh_destroy: vccs != NULL !!!\n"); - NEIGH2ENTRY(neigh)->vccs = (void *) NEIGHBOR_DEAD; -} - static void clip_neigh_solicit(struct 
neighbour *neigh, struct sk_buff *skb) { DPRINTK("clip_neigh_solicit (neigh %p, skb %p)\n", neigh, skb); @@ -342,7 +334,6 @@ static struct neigh_table clip_tbl = { /* parameters are copied from ARP ... */ .parms = { .tbl = &clip_tbl, - .neigh_destructor = clip_neigh_destroy, .base_reachable_time = 30 * HZ, .retrans_time = 1 * HZ, .gc_staletime = 60 * HZ, diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 3183142..cfc6001 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -140,6 +140,8 @@ static int neigh_forced_gc(struct neigh_ n->dead = 1; shrunk = 1; write_unlock(&n->lock); + if (n->parms->neigh_cleanup) + n->parms->neigh_cleanup(n); neigh_release(n); continue; } @@ -211,6 +213,8 @@ static void neigh_flush_dev(struct neigh NEIGH_PRINTK2("neigh %p is stray.\n", n); } write_unlock(&n->lock); + if (n->parms->neigh_cleanup) + n->parms->neigh_cleanup(n); neigh_release(n); } } @@ -582,9 +586,6 @@ void neigh_destroy(struct neighbour *nei kfree(hh); } - if (neigh->parms->neigh_destructor) - (neigh->parms->neigh_destructor)(neigh); - skb_queue_purge(&neigh->arp_queue); dev_put(neigh->dev); @@ -675,6 +676,8 @@ static void neigh_periodic_timer(unsigne *np = n->next; n->dead = 1; write_unlock(&n->lock); + if (n->parms->neigh_cleanup) + n->parms->neigh_cleanup(n); neigh_release(n); continue; } @@ -2088,8 +2091,11 @@ void __neigh_for_each_release(struct nei } else np = &n->next; write_unlock(&n->lock); - if (release) + if (release) { + if (n->parms->neigh_cleanup) + n->parms->neigh_cleanup(n); neigh_release(n); + } } } } From pradeep at us.ibm.com Mon Mar 19 16:39:38 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 19 Mar 2007 16:39:38 -0700 Subject: [ofa-general] Bug in IPOIB CM Message-ID: I see a trivial bug in ipoib_cm_stale_task (). The time_after_eq() replaces elements whose timer has not yet expired. Instead, one must use time_before_eq(). Should I supply a patch for this? 
Pradeep pradeep at us.ibm.com From rdreier at cisco.com Mon Mar 19 16:57:48 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 19 Mar 2007 16:57:48 -0700 Subject: [ofa-general] Bug in IPOIB CM In-Reply-To: (Pradeep Satyanarayana's message of "Mon, 19 Mar 2007 16:39:38 -0700") References: Message-ID: > I see a trivial bug in ipoib_cm_stale_task (). The time_after_eq() > replaces elements whose timer has not yet expired. Instead, one must use > time_before_eq(). Should I supply a patch for this? I assume you mean this code: p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) break; that does look wrong to me, since the time_after_eq() test seems to be true when we would want to free the connection. - R. From sean.hefty at intel.com Mon Mar 19 17:12:42 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Mar 2007 17:12:42 -0700 Subject: [ofa-general] RE: [RFC] host stack IB-to-IB router support In-Reply-To: <20070319221617.GA2986@obsidianresearch.com> Message-ID: <000601c76a84$7aaaf500$c9d8180a@amr.corp.intel.com> >Would it become part of openfabrics or just as a 3rd party >patch that interested parties could apply? Portions of the changes should be suitable for upstream submission. The ib_remote_sa module could be added to an OFED release if there was enough demand. >So the idea is that the CM REQ now uses remote_*_path to set the >fields? You intend to go with the notion that IBA specifies the >active sides sets the passive's LIDs in all cases? Correct - this keeps the CM compliant with what's defined in the spec. >> 2. Add an ib_remote_sa module. > >It looks like there is a new wire protocol to support this? Basically. I would likely just exchange SA GET/GET response formatted MADs between ib_remote_sa's, but use a vendor defined MgmtClass. >Does it still work if, for instance, a modified active side tries to >talk to an arbitary existing target (such as storage or something)? 
The passive side must handle routed requests (include the GRH when forming the AH for the response), but, yes, I would expect this to work. One of my reasons for separating the ib_remote_sa into a separate module is that allows running it on an arbitrary node on the remote subnet. This should support unmodified targets. Additional work would just be needed to locate the remote ib_remote_sa service. (My initial implementation will just expect to find the ib_remote_sa service running at the DGID.) I would consider a slightly different design if this were taken too far, however. For example, issuing the query might belong in the kernel, but responding to the query might belong in userspace. It would depend on the evolution of the architecture. For now I was thinking of combining the functionality simply for ease of implementation. >Broadly, do you describe having the active side send a seperate GMP to >the passive node to do PR queries and then sending a CM REQ? Yes. >> 3. Extend the rdma_cm route resolution to include remote route lookup. > >This means produce a ib_cm_req_param using ib_remote_sa when >appropriate right? Yes >How does this compare to the idea of using the LIDs from the REQ's LRH >on the passive side? If I'm understanding this idea correctly, the LIDs from the REQ's LRH replace the primary_local_lid and primary_remote_lid fields carried in the REQ. Assuming that the other path fields in the REQ are usable, this is a much simpler approach. (This could be done by the ib_cm itself, or a loadable module that snooped CM REQ messages. The latter would avoid changes to the CM code if that mattered.) I can't really evaluate the trade-offs without an idea of how long such a temporary solution will be needed, and how much functionality it needs to have. Replacing LIDs in a REQ is easy enough that I can just do that while working on a more flexible solution. 
My proposal stays spec compliant and could be extended to support unmodified targets, but requires a more defined wire protocol. The drawback to extending it is that we push further into non-architected areas. - Sean From pradeep at us.ibm.com Mon Mar 19 17:30:58 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 19 Mar 2007 17:30:58 -0700 Subject: [ofa-general] Bug in IPOIB CM In-Reply-To: Message-ID: Exactly. time_after_eq() is returning false (when the timer has not yet expired) and hence deleting active connections and destroying qps etc. Pradeep pradeep at us.ibm.com Roland Dreier wrote on 03/19/2007 04:57:48 PM: > > I see a trivial bug in ipoib_cm_stale_task (). The time_after_eq() > > replaces elements whose timer has not yet expired. Instead, one must use > > time_before_eq(). Should I supply a patch for this? > > I assume you mean this code: > > p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); > if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) > break; > > that does look wrong to me, since the time_after_eq() test seems to be > true when we would want to free the connection. > > - R. From jgunthorpe at obsidianresearch.com Mon Mar 19 19:04:43 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 19 Mar 2007 20:04:43 -0600 Subject: [ofa-general] Re: [RFC] host stack IB-to-IB router support In-Reply-To: <000601c76a84$7aaaf500$c9d8180a@amr.corp.intel.com> References: <20070319221617.GA2986@obsidianresearch.com> <000601c76a84$7aaaf500$c9d8180a@amr.corp.intel.com> Message-ID: <20070320020443.GF5740@obsidianresearch.com> On Mon, Mar 19, 2007 at 05:12:42PM -0700, Sean Hefty wrote: > >> 2. Add an ib_remote_sa module. > > > >It looks like there is a new wire protocol to support this? > > Basically. I would likely just exchange SA GET/GET response formatted MADs > between ib_remote_sa's, but use a vendor defined MgmtClass. 
>
> >Does it still work if, for instance, a modified active side tries to
> >talk to an arbitary existing target (such as storage or something)?
>
> The passive side must handle routed requests (include the GRH when forming the
> AH for the response), but, yes, I would expect this to work.

I think this is probably the best reason for doing something like
this. Even if existing targets don't work right out of the box, the
changes to properly make a GRH should be fairly minor.

> One of my reasons for separating the ib_remote_sa into a separate
> module is that allows running it on an arbitrary node on the remote
> subnet. This should support unmodified targets. Additional work
> would just be needed to locate the remote ib_remote_sa service. (My
> initial implementation will just expect to find the ib_remote_sa
> service running at the DGID.)

Hmm. If the goal is to enable router development and experimentation, then
it would be best if the 'ib_remote_sa' server was in user space, dealt
with all 4 path records in one query and was centralized so it could
be made to store routing topology and configuration to solve the
multipath problems. Otherwise I think you are better off just talking
directly to the SA.

Maybe the best thing here is to have a simple ib_remote_sa client
module that just consults a list of servers and makes a normal SA
query. People working on multipath router support could then extend
that to specify a non-SA server and a new 4 path query type.

A list something like:
2001::/64 2001:1 SA
2001::/64 2001:2 SA
2002::/64 2000:1 not-SA <-- On the local subnet.. new 4 PR format

Set via netlink or sysfs.

To start with, no ib_remote_sa server would be needed, just a boot
script to set the expected SA addresses. You could define the MAD
format for a new 4 PR query but not implement a server to handle it.

> If I'm understanding this idea correctly, the LIDs from the REQ's
> LRH replace the primary_local_lid and primary_remote_lid fields
> carried in the REQ. Assuming that the other path fields in the REQ
> are usable, this is a much simpler approach. (This could be done by
> the ib_cm itself, or a loadable module that snooped CM REQ messages.
> The latter would avoid changes to the CM code if that mattered.)

My guess is that having a way for conformant 'unmodified' targets to
work is fairly important for a lot of interesting applications at the
start. Otherwise using the LRH method is probably much better due to
the simplicity. I particularly like it because it can intrinsically
support even the most complex routed environments, although without
APM/etc.

Do you have any idea what the PathForward program expects to do here?

Jason

From sean.hefty at intel.com Mon Mar 19 20:26:22 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 19 Mar 2007 20:26:22 -0700
Subject: [ofa-general] RE: [RFC] host stack IB-to-IB router support
In-Reply-To: <20070320020443.GF5740@obsidianresearch.com>
Message-ID: <000201c76a9f$887e4770$8ffc070a@amr.corp.intel.com>

>Hmm. If the goal is enable router development and experimentation then
>it would be best if the 'ib_remote_sa' server was in user space, delt
>with all 4 path records in one query and was centralized so it could
>be made to store routing topology and configuration to solve the
>multipath problems. Otherwise I think you are better to just talk
>directly to the SA.

Unfortunately, at least opensm cannot respond to SA queries issued from a
remote subnet. I'm not sure how much work this would take to fix, or if other
SAs have this issue. Hal briefly looked at the problems, and I do plan on
trying to fix them. But that still leaves trying to find the remote SA,
handling SA failover, etc. This is why I'm bouncing queries through an
intermediary.

I see two separate pieces that are needed: an interface to query for the path
info, and a mechanism to provide it.
At least the former is needed in the kernel, and I can at least envision that the implementation of this piece could evolve into some final solution. But at this point, the query response mechanism seems like throw-away code. >Maybe the best thing here is to have a simple ib_remote_sa client >module that just consults a list of servers and makes a normal SA >query. People working on multipath router support could then extend >that to specify a non-SA server and a new 4 path query type. > >A list something like: >2001::/64 2001:1 SA >2001::/64 2001:2 SA >2002::/64 2000:1 not-SA <-- On the local subnet.. new 4 PR format > >Set via netlink or sysfs.. > >To start with no ib_remote_sa server would be needed, just a boot >script to set the expected SA addresses. You could define the MAD >format for a new 4 PR query but not implement a server to handle it. Hmm... let me give this more thought. >Do you have any idea what the PathForward program expects to do here? not really... - Sean From sean.hefty at intel.com Mon Mar 19 21:48:53 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 19 Mar 2007 21:48:53 -0700 Subject: [ofa-general] [PATCH] use LIDs from REQ LRH for inter-subnet connections In-Reply-To: <20070319221617.GA2986@obsidianresearch.com> Message-ID: <000301c76aab$0fba54d0$8ffc070a@amr.corp.intel.com> Jason, When you get a chance, can you try out this patch? I tested that it worked for a local subnet connection by commenting out the hop_limit check. So, I'm interested to know if you run into any problems. If you do run into issues, madeye may be able to help. - Sean commit 1f2819006479bd72c6d79bdb80750d4766fad2f3 Author: Sean Hefty Date: Mon Mar 19 21:09:25 2007 -0800 To support inter-subnet connections, the passive endpoint needs to use its subnet local LIDs. The LIDs carried in the REQ are currently the LIDs from the active subnet (SLID and router LID). Replace LIDs in the REQ with subnet local LIDs from LRH. 
Signed-off-by: Sean Hefty diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 842cd0b..6eb4685 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -1379,6 +1379,18 @@ static int cm_req_handler(struct cm_work *work) cm_id_priv->id.service_id = req_msg->service_id; cm_id_priv->id.service_mask = __constant_cpu_to_be64(~0ULL); + /* + * Work-around for inter-subnet connections. We need to override LID + * data using LRH if path came from a remote subnet, since the remote + * subnet doesn't know our local LID values. + */ + if (req_msg->primary_hop_limit > 1) { + req_msg->primary_local_lid = cpu_to_be16(work->mad_recv_wc-> + wc->slid); + req_msg->primary_remote_lid = cpu_to_be16(work->mad_recv_wc-> + wc->dlid_path_bits); + } + cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av); if (ret) { From mst at dev.mellanox.co.il Mon Mar 19 22:52:56 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 07:52:56 +0200 Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.0.5 Message-ID: <20070320055256.GC5068@mellanox.co.il> FYI ----- Forwarded message from Junio C Hamano ----- Subject: [ANNOUNCE] GIT 1.5.0.5 Date: Mon, 19 Mar 2007 03:19:24 +0200 In-Reply-To: <7vlkic2xc8.fsf_-_ at assigned-by-dhcp.cox.net> (Junio C. 
Hamano'smessage of "Sun, 04 Mar 2007 19:17:11 -0800") References: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net><7vabz0j7td.fsf at assigned-by-dhcp.cox.net><7vlkic2xc8.fsf_-_ at assigned-by-dhcp.cox.net> From: Junio C Hamano The latest maintenance release GIT 1.5.0.5 is available at the usual places: http://www.kernel.org/pub/software/scm/git/ git-1.5.0.5.tar.{gz,bz2} (tarball) git-htmldocs-1.5.0.5.tar.{gz,bz2} (preformatted docs) git-manpages-1.5.0.5.tar.{gz,bz2} (preformatted docs) RPMS/$arch/git-*-1.5.0.5-1.$arch.rpm (RPM) I didn't send announcements for 1.5.0.4 for workload and time constraints, but Santi found and fixed a rather embarrasing regression in 1.5.0.4 soon after it was tagged anyway, so here it is. The changelog below is relative to 1.5.0.3. ---------------------------------------------------------------- Changes since v1.5.0.3 are as follows: Alexandre Julliard (2): git.el: Avoid appending a signoff line that is already present. git.el: Retrieve commit log information from .dotest directory. Avi Kivity (1): git-send-email: Document configuration options Brian Gernhardt (1): Fix diff-options references in git-diff and git-format-patch Frank Lichtenheld (1): cvsserver: asciidoc formatting changes J. Bruce Fields (7): glossary: fix overoptimistic automatic linking of defined terms user-manual: fix inconsistent example user-manual: fix inconsistent use of pull and merge user-manual: fix missing colon in git-show example user-manual: fix rendering of history diagrams user-manual: install user manual stylesheet with other web documents git-merge: warn when -m provided on a fast forward Jeff King (2): Documentation: s/seperator/separator/ fast-import: grow tree storage more aggressively Johannes Schindelin (2): Begin SubmittingPatches with a check list make t8001 work on Mac OS X again Junio C Hamano (4): git-commit: cd to top before showing the final stat git-checkout: fix "eval" used for merge labelling. 
GIT 1.5.0.4 GIT 1.5.0.5 Matthias Kestenholz (1): Adjust reflog filemode in shared repository Matthias Lederhofer (1): setup_git_directory_gently: fix off-by-one error Santi Béjar (1): git-merge: finish when git-read-tree fails Shawn O. Pearce (13): git-gui: Relocate the menu/transport menu code. git-gui: Add Reset to the Branch menu. git-gui: Don't create empty (same tree as parent) commits. git-gui: Remove unnecessary /dev/null redirection. fast-import: Avoid infinite loop after reset fast-import: Fail if a non-existant commit is used for merge git-gui: Make 'make' quieter by default Catch write_ref_sha1 failure in receive-pack git-gui: Allow committing empty merges git-gui: Revert "Don't modify CREDITS-FILE if it hasn't changed." git-gui: Revert "git-gui: Display all authors of git-gui." git-gui: Allow 'git gui version' outside of a repository Don't package the git-gui credits file anymore Theodore Ts'o (1): Add definition of to the main git man page. Yann Dirson (1): Clarify doc for git-config --unset-all. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ----- End forwarded message ----- -- MST From mst at dev.mellanox.co.il Mon Mar 19 23:41:30 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 08:41:30 +0200 Subject: [ofa-general] [PATCH] IB/ipoib: fix stale connection handling In-Reply-To: References: Message-ID: <20070320064129.GD5068@mellanox.co.il> time_after_eq() replaces elements whose timer has not yet expired. Instead, one must use time_before_eq(). Noted by Pradeep Signed-off-by: Michael S. Tsirkin --- > I see a trivial bug in ipoib_cm_stale_task (). The time_after_eq() > replaces elements whose timer has not yet expired. Instead, one must use > time_before_eq(). Should I supply a patch for this? > > Pradeep > pradeep at us.ibm.com Good catch, thanks! 
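As an aside on the comparison being fixed here, the following user-space C sketch models the stale-entry scan. It rests on assumptions: the `model_time_*` helpers are stand-ins written in the style of the kernel's wraparound-safe jiffies comparisons, and `count_stale` is a hypothetical function over a plain array, not code from ipoib_cm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for time_after_eq(): true if a is at or after b, even across
 * a counter wrap, because the difference is interpreted as signed. */
static bool model_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Stand-in for time_before_eq(): true if a is at or before b. */
static bool model_time_before_eq(unsigned long a, unsigned long b)
{
    return model_time_after_eq(b, a);
}

/*
 * last_used[] is ordered most recently used first, so the tail holds the
 * oldest entry. Walk from the tail and count the entries that have been
 * idle for longer than `timeout`; stop at the first entry that is still
 * within its timeout, since everything ahead of it is newer.
 */
static size_t count_stale(const unsigned long *last_used, size_t n,
                          unsigned long now, unsigned long timeout)
{
    size_t reaped = 0;

    assert(last_used != NULL || n == 0);
    while (reaped < n) {
        unsigned long t = last_used[n - 1 - reaped];

        if (model_time_before_eq(now, t + timeout))
            break;              /* recently used: stop scanning */
        reaped++;
    }
    return reaped;
}
```

If the loop used `model_time_after_eq(now, t + timeout)` as its stop condition instead, it would stop at the first expired entry and reap only entries that are still fresh, which is the behaviour reported in this thread.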
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 3484e8b..4675640 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -1095,7 +1095,7 @@ static void ipoib_cm_stale_task(struct work_struct *work) /* List if sorted by LRU, start from tail, * stop when we see a recently used entry */ p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); - if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) + if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) break; list_del_init(&p->list); spin_unlock_irqrestore(&priv->lock, flags); -- MST From mst at dev.mellanox.co.il Mon Mar 19 23:52:05 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 08:52:05 +0200 Subject: [ofa-general] Re: [PATCH] ib_ipoib: fix race detaching from mcast group before attaching In-Reply-To: References: <000401c76a71$653fd7c0$c9d8180a@amr.corp.intel.com> Message-ID: <20070320065205.GF5068@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] ib_ipoib: fix race detaching from mcast group before attaching > > > The attach QP only occurs in the context of the multicast callback thread. > > ib_sa_free_multicast() blocks until the callback returns, which ensures that the > > detach check/call (which is now done after ib_sa_free_multicast) cannot race > > with the attach call. > > OK, makes sense. Do we have confirmation that this fixes the original > problem? If so I'll queue this up... Yes, this appears to have fixed the race. -- MST From mst at dev.mellanox.co.il Tue Mar 20 00:25:39 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 09:25:39 +0200 Subject: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ? 
In-Reply-To:
References: <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il>
Message-ID: <20070320072539.GG5068@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: Re: Re: Is ibv_get_async_event() a blocking call ?
>
> > Should we add a size parameter for event channels?
> > And, we might need to add "event channel overrun" flag as well.
> >
> > If we want to address the problem in this way, we need to do this before
> > libibverbs 1.1 freezes I think.
>
> I think since no one is hitting this in practice we can take our time
> here and get it right.

Right.
Here's an idea for how to do this without userspace changes:
upon a completion event, if a completion event for the same
CQ is outstanding, we can discard the new one as it does not
add new information. Similar tricks can be done with port state
events etc.

Now we can have a limit on the number of outstanding events per context,
and if some context violates this limit, queue new events in the kernel
without using potentially infinite memory.

--
MST

From chevchenkovic at gmail.com Tue Mar 20 00:52:22 2007
From: chevchenkovic at gmail.com (Chevchenkovic Chevchenkovic)
Date: Tue, 20 Mar 2007 00:52:22 -0700
Subject: [ofa-general] [0] Abort: Not enough port is in active state
Message-ID: <1c16cdf90703200052g729add3x2d767378697f94f2@mail.gmail.com>

Hi,
I am getting the error:
[0] Abort: Not enough port is in active state
while using mvapich2-0.9.8.
I do not get a similar error when using mvapich-0.9.7-mlx2.1.0.
Any specific reasons for this behaviour?
How do I get rid of this error?
Awaiting your reply, -Chev From ogerlitz at voltaire.com Tue Mar 20 02:08:41 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 20 Mar 2007 11:08:41 +0200 Subject: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ? In-Reply-To: <20070320072539.GG5068@mellanox.co.il> References: <349DCDA352EACF42A0C49FA6DCEA840396171A@G3W0634.americas.hpqcorp.net> <1172854585.21241.14.camel@stevo-desktop> <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il> <20070320072539.GG5068@mellanox.co.il> Message-ID: <45FFA499.9060001@voltaire.com> Michael S. Tsirkin wrote: > upon a completion event, if a completion event for the same > CQ is outstanding, we can discard the new one as it does not > add new information. I think you have replaced completion events with async events and event channel (queue) with completion queue. There is no issue with CQ getting bigger and bigger etc. Or. 
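To illustrate the coalescing idea raised in this thread (at most one outstanding completion event per CQ in a channel), here is a minimal user-space sketch in C. It is a model under stated assumptions, not the uverbs implementation: the `model_*` names, the fixed `MODEL_MAX_CQS` limit and the array-backed FIFO are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CQS 8   /* hypothetical number of CQs on one channel */

struct model_event_channel {
    bool pending[MODEL_MAX_CQS];        /* is an event already queued for this CQ? */
    unsigned int queue[MODEL_MAX_CQS];  /* FIFO of CQ ids awaiting delivery */
    size_t head;                        /* index of the oldest queued event */
    size_t count;                       /* number of queued events */
};

/* Queue a completion event for `cq`. Returns false if an event for the
 * same CQ is already outstanding, in which case the new one is dropped
 * because it carries no additional information. */
static bool model_channel_post(struct model_event_channel *ch, unsigned int cq)
{
    assert(cq < MODEL_MAX_CQS);
    if (ch->pending[cq])
        return false;
    ch->pending[cq] = true;
    ch->queue[(ch->head + ch->count) % MODEL_MAX_CQS] = cq;
    ch->count++;
    return true;
}

/* Deliver the oldest queued event. Returns false if the channel is empty. */
static bool model_channel_get(struct model_event_channel *ch, unsigned int *cq)
{
    if (ch->count == 0)
        return false;
    *cq = ch->queue[ch->head];
    ch->head = (ch->head + 1) % MODEL_MAX_CQS;
    ch->count--;
    ch->pending[*cq] = false;
    return true;
}
```

Because each CQ can have at most one queued event, the FIFO never needs more than `MODEL_MAX_CQS` slots, which is the bounded-memory property the proposal is after.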
From vlad at lists.openfabrics.org Tue Mar 20 02:35:51 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Tue, 20 Mar 2007 02:35:51 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070320-0200 daily build status Message-ID: <20070320093551.723AAE60829@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on ppc64 
with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: From kliteyn at dev.mellanox.co.il Tue Mar 20 02:52:04 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 20 Mar 2007 11:52:04 +0200 Subject: [ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them In-Reply-To: <20070319185531.GN19999@sashak.voltaire.com> References: <45FE7740.2080308@dev.mellanox.co.il> <1174313910.13051.25.camel@localhost> <45FEA350.7070605@dev.mellanox.co.il> <20070319185531.GN19999@sashak.voltaire.com> Message-ID: <45FFAEC4.1040300@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 16:50 Mon 19 Mar , Yevgeny Kliteynik wrote: >> In __osm_ucast_mgr_process_neighbor(), there is the following assertion: >> >> CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho, >> port_num ) ); >> >> This assertion fails, since the hop count becomes inconsistent. > > This is not big problem IMO, we just need to not deal with non-existing > LIDs there (so __osm_ucast_mgr_process_neighbor() code should be > improved in this direction and this assertion removed). And the LFTs > generation code doesn't try to build entries for non-existing LIDs, so > "old" min hop vectors will be ignored there. > > But I think we could have a problem when the port (switch with master) > is reconnected at different location. Then old/invalid hop counts will > be counted again and if it "wins" we can get not expected routing paths. > So obviously hop matrix cleanup is simplest fix - Agreed. > >>>> I'm not sure about the trunk though. >>>> Sasha, >>>> Can you please check that you latest improvements to the >>>> routing don't have this problem? >>> With disconnecting switches should be similar behavior I guess. 
>> Right, I checked it - same problem.
>
> Interesting. This function is different in the master and doesn't scan
> LIDs from 1 up to max anymore, instead it scans only switches existing at
> the moment.
>
> Could you provide more details about the master? Do you able to see the
> problem with just switch disconnections? What is the test case?

I had this problem on some copy of master that wasn't updated.
After updating it I can't see this problem happening again.
But the hop count is not cleared there either, so even if I can't
recreate this problem (or even if the new flow solves this
particular bug), I think we do agree that it would be better
to clear the hop count anyway.

-- Yevgeny

> Sasha
>

From mst at dev.mellanox.co.il Tue Mar 20 02:58:00 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 20 Mar 2007 11:58:00 +0200
Subject: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: <45FFA499.9060001@voltaire.com>
References: <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il> <20070320072539.GG5068@mellanox.co.il> <45FFA499.9060001@voltaire.com>
Message-ID: <20070320095800.GA2834@mellanox.co.il>

> Quoting Or Gerlitz :
> Subject: Re: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ?
>
> Michael S. Tsirkin wrote:
> >upon a completion event, if a completion event for the same
> >CQ is outstanding, we can discard the new one as it does not
> >add new information.
>
> I think you have replaced completion events with async events and event
> channel (queue) with completion queue. There is no issue with CQ getting
> bigger and bigger etc.

I'm sorry I could not parse this. We are worried about event channels
taking up too much memory.
What I am pointing out is that we never need
to have two events for the same CQ in the same channel.

--
MST

From ogerlitz at voltaire.com Tue Mar 20 03:01:59 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 20 Mar 2007 12:01:59 +0200
Subject: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ?
In-Reply-To: <20070320095800.GA2834@mellanox.co.il>
References: <1172854873.21241.19.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039617F9@G3W0634.americas.hpqcorp.net> <1172856154.21241.34.camel@stevo-desktop> <349DCDA352EACF42A0C49FA6DCEA84039979EC@G3W0634.americas.hpqcorp.net> <20070312083523.GB4928@mellanox.co.il> <20070320072539.GG5068@mellanox.co.il> <45FFA499.9060001@voltaire.com> <20070320095800.GA2834@mellanox.co.il>
Message-ID: <45FFB117.7040300@voltaire.com>

Michael S. Tsirkin wrote:
>> Quoting Or Gerlitz :
>> Subject: Re: [ofa-general] Re: Re: Re: Is ibv_get_async_event() a blocking call ?
>>
>> Michael S. Tsirkin wrote:
>>> upon a completion event, if a completion event for the same
>>> CQ is outstanding, we can discard the new one as it does not
>>> add new information.
>> I think you have replaced completion events with async events and event
>> channel (queue) with completion queue. There is no issue with CQ getting
>> bigger and bigger etc.
>
> I'm sorry I could not parse this. We are worried about event channels
> taking up to much memory. What I am pointing out is that we never need
> to have two events for the same CQ in the same channel.

From your original post I could not understand that this is what you wanted
to say.

Or.

From halr at voltaire.com Tue Mar 20 05:24:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Mar 2007 07:24:38 -0500
Subject: [ofa-general] OpenFabrics wiki and escaping semicolons (in URLs)
Message-ID: <1174393470.4684.456495.camel@hal.voltaire.com>

Hi,

Is the OpenFabrics wiki TikiWiki ? Any idea on the version ?

Is there a way to escape a semicolon ?
For example, if I create a URL with a semicolon as follows: [http://git.openfabrics.org/git/?p=~halr/management.git;a=blob_plain;f=README;hb=HEAD] it expands as: http://www.openfabrics.org//git/?p=~halr/management.git%3ba=blob_plain%3bf=README%3bhb=HEAD Thanks in advance for any light you can shed on this. -- Hal From rdreier at cisco.com Tue Mar 20 05:45:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 05:45:10 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix stale connection handling References: <20070320064129.GD5068@mellanox.co.il> Message-ID: Thanks, queued for 2.6.21. From jsquyres at cisco.com Tue Mar 20 05:53:55 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 20 Mar 2007 08:53:55 -0400 Subject: [ofa-general] OpenFabrics wiki and escaping semicolons (in URLs) In-Reply-To: <1174393470.4684.456495.camel@hal.voltaire.com> References: <1174393470.4684.456495.camel@hal.voltaire.com> Message-ID: <9B45A621-598B-44C6-B94E-950222812973@cisco.com> On Mar 20, 2007, at 8:24 AM, Hal Rosenstock wrote: > Is the OpenFabrics wiki TikiWiki ? Any idea on the version ? Yes. It looks like it's v1.9.7 (found that by poking around on the server). > Is there a way to escape a semicolon ? > > For example, if I create a URL with a semicolon as follows: > [http://git.openfabrics.org/git/?p=~halr/ > management.git;a=blob_plain;f=README;hb=HEAD] > > it expands as: > http://www.openfabrics.org//git/?p=~halr/management.git% > 3ba=blob_plain%3bf=README%3bhb=HEAD Actually, it's not Tiki that's hosing you here -- it's the Apache redirects. If you use http://www.openfabrics.org/git/...., you'll be ok (Tiki will render it right and you'll land on the Right page). If you use http://git.openfabrics.org/git/..., Apache will detect the vhost "git.openfabrics.org" and redirect you to "www.openfabrics.org", and apparently munge special characters as a result. 
This is because http://git.openfabrics.org/ is currently being
redirected to http://www.openfabrics.org/. Is this not the desired
behavior? If not, we can change it -- would you prefer if
http://git.ofa.org/git/ went straight to the git browser instead of
through www.ofa.org? (that sounds intuitive to me)

--
Jeff Squyres
Cisco Systems

From ramachandra.kuchimanchi at qlogic.com Tue Mar 20 05:57:00 2007
From: ramachandra.kuchimanchi at qlogic.com (Kuchimanchi, Ramachandra)
Date: Tue, 20 Mar 2007 07:57:00 -0500
Subject: [ofa-general] OFED-1.2-20070318-0600 build failure - qlvnictools
References: <200703182217.PAA08838@eskimo.com>
Message-ID:

Mostyn,

We have successfully installed OFED 1.2 Beta with qlvnictools on SLES10.
We have not yet tested the specific build you mentioned
(OFED-1.2-20070318-0600). We will try out the latest build and
investigate any problems that we see.

Regards,
Ram

-----Original Message-----
From: general-bounces at lists.openfabrics.org on behalf of mrl at eskimo.com
Sent: Mon 3/19/2007 3:47 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] OFED-1.2-20070318-0600 build failure - qlvnictools

Using SLES10 and OFED-1.2-20070318-0600, a build.sh/install.sh
fails in userland in qlvnictools with:

make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtools'
cd . && /bin/sh /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtools/config/missing --run automake-1.9 --foreign
configure.in:9: version mismatch. This is Automake 1.9.6,
configure.in:9: but the definition used by this AM_INIT_AUTOMAKE
configure.in:9: comes from Automake 1.9.2. You should recreate
configure.in:9: aclocal.m4 with aclocal and run automake again.
make[1]: *** [Makefile.in] Error 1 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/qlvnictools/ibvexdmtoo ls' make: *** [qlvnictools] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.9704 (%install) Leaving out qlvnictools and doing a rpmbuild of userspace by hand and all else works. /usr/bin/automake --version automake (GNU automake) 1.9.6 Is there an easy (or not so easy) way around this, folks? Mostyn _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Tue Mar 20 07:03:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Mar 2007 09:03:00 -0500 Subject: [ofa-general] {PATCH] Documentation/user_mad.txt: Clarify transaction ID usage Message-ID: <1174399318.4684.462485.camel@hal.voltaire.com> Documentation/user_mad.txt: Clarify transaction ID usage Signed-off-by: Hal Rosenstock diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index 750fe5e..1d2dbf1 100644 --- a/Documentation/infiniband/user_mad.txt +++ b/Documentation/infiniband/user_mad.txt @@ -91,6 +91,12 @@ Sending MADs if (ret != sizeof *mad + mad_length) perror("write"); +Transaction IDs + + Clients of the MAD layer can use the lower 32 bits of the + transaction ID field to track mad request/response pairs. The + upper 32 bits are reserved for use by the kernel ib_mad module. + Setting IsSM Capability Bit To set the IsSM capability bit for a port, simply open the From mst at dev.mellanox.co.il Tue Mar 20 06:14:27 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 20 Mar 2007 15:14:27 +0200 Subject: [ofa-general] Re: OpenFabrics wiki and escaping semicolons (in URLs) In-Reply-To: <9B45A621-598B-44C6-B94E-950222812973@cisco.com> References: <1174393470.4684.456495.camel@hal.voltaire.com> <9B45A621-598B-44C6-B94E-950222812973@cisco.com> Message-ID: <20070320131427.GA18162@mellanox.co.il> > This is because http://git.openfabrics.org/ is currently being > redirected to http://www.openfabrics.org/. Is this not the desired > behavior? If not, we can change it -- would you prefer if http:// > git.ofa.org/git/ went straight to the git browser instead of through > www.ofa.org? (that sounds intuitive to me) Yes, that sounds good to me too. Also, maybe we can get rid of /git/ somehow? It would be nice to take git URL: git://git.openfabrics.org/~mst/mstflint.git replace git:// http:// and get a working URL. -- MST From halr at voltaire.com Tue Mar 20 07:18:17 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Mar 2007 09:18:17 -0500 Subject: [ofa-general] RE: [RFC] host stack IB-to-IB router support In-Reply-To: <000201c76a9f$887e4770$8ffc070a@amr.corp.intel.com> References: <000201c76a9f$887e4770$8ffc070a@amr.corp.intel.com> Message-ID: <1174400288.4684.463519.camel@hal.voltaire.com> On Mon, 2007-03-19 at 22:26, Sean Hefty wrote: > >Hmm. If the goal is enable router development and experimentation then > >it would be best if the 'ib_remote_sa' server was in user space, delt > >with all 4 path records in one query and was centralized so it could > >be made to store routing topology and configuration to solve the > >multipath problems. Otherwise I think you are better to just talk > >directly to the SA. > > Unfortunately, at least opensm cannot respond to SA queries issued from a remote > subnet. I'm not sure how much work this would take to fix, or if other SAs have > this issue. Hal briefly looked at the problems, FWIW, I'll be looking some more at these again. > and I do plan on trying to fix > them. 
But that still leaves trying to find the remote SA, Yes, that is one primary obstacle to solve one way or the other that seems like a pretty basic need. > handling SA failover, This would be a bonus rather than an initial requirement (for experimentation in connecting more than one IB subnet) IMO. -- Hal > etc. This is why I'm bouncing queries through an intermediary. > > I see two separate pieces that are needed: an interface to query for the path > info, and a mechanism to provide it. At least the former is needed in the > kernel, and I can at least envision that the implementation of this piece could > evolve into some final solution. But at this point, the query response > mechanism seems like throw-away code. > > >Maybe the best thing here is to have a simple ib_remote_sa client > >module that just consults a list of servers and makes a normal SA > >query. People working on multipath router support could then extend > >that to specify a non-SA server and a new 4 path query type. > > > >A list something like: > >2001::/64 2001:1 SA > >2001::/64 2001:2 SA > >2002::/64 2000:1 not-SA <-- On the local subnet.. new 4 PR format > > > >Set via netlink or sysfs.. > > > >To start with no ib_remote_sa server would be needed, just a boot > >script to set the expected SA addresses. You could define the MAD > >format for a new 4 PR query but not implement a server to handle it. > > Hmm... let me give this more thought. > > >Do you have any idea what the PathForward program expects to do here? > > not really... 
> > - Sean From jsquyres at cisco.com Tue Mar 20 06:36:12 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 20 Mar 2007 09:36:12 -0400 Subject: [ofa-general] Re: OpenFabrics wiki and escaping semicolons (in URLs) In-Reply-To: <20070320131427.GA18162@mellanox.co.il> References: <1174393470.4684.456495.camel@hal.voltaire.com> <9B45A621-598B-44C6-B94E-950222812973@cisco.com> <20070320131427.GA18162@mellanox.co.il> Message-ID: <99DE1E48-1BE6-4C87-B38E-20D191F3DFF6@cisco.com> On Mar 20, 2007, at 9:14 AM, Michael S. Tsirkin wrote: >> This is because http://git.openfabrics.org/ is currently being >> redirected to http://www.openfabrics.org/. Is this not the desired >> behavior? If not, we can change it -- would you prefer if http:// >> git.ofa.org/git/ went straight to the git browser instead of through >> www.ofa.org? (that sounds intuitive to me) > > Yes, that sounds good to me too. Ok. Please keep Michael Lee CC'ed on these e-mails because he's the one who does the actual work. Michael Lee -- can you do what is described above? Thanks! > Also, maybe we can get rid of /git/ somehow? > It would be nice to take git URL: git://git.openfabrics.org/~mst/ > mstflint.git > replace git:// http:// and get a working URL. I'll have to defer to ML on this -- it *sounds* possible if http:// www.ofa... is going to be different from http://git.ofa... -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Tue Mar 20 06:39:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 15:39:32 +0200 Subject: [ofa-general] [PATCH for-2.6.21] mthca: QP reset race fixup Message-ID: <20070320133932.GC18162@mellanox.co.il> This fixes openfabrics bugzilla 394: - Use common EQ for command interface and async events - Clean CQ after moving QP to reset This also fixes a potential crash in ipoib cm: - sync with completion event ISR after QP is reset to prevent ULP from getting and using QP pointer and/or WRID after they are freed. Signed-off-by: Michael S. 
Tsirkin --- Roland, here's a patch that OFED has - it has been through testing here and seems to work fine. Can you queue this for 2.6.21 please? We can rip the lines that sync with MTHCA_EQ_COMP out if you think the issue needs to be dealt with in some other way - and I agree this is only good for ULPs that do all their polling inside the ISR, but at least this covers all in-kernel code. diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index efd79ef..e3c774b 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -279,15 +279,13 @@ static inline int is_recv_cqe(struct mthca_cqe *cqe) return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, - struct mthca_srq *srq) +void __mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq) { struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&cq->lock); - /* * First we need to find the current producer index, so we * know where to start cleaning from. 
It doesn't matter if HW @@ -325,7 +323,13 @@ void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, cq->cons_index += nfreed; update_cons_index(dev, cq, nfreed); } +} +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq) +{ + spin_lock_irq(&cq->lock); + __mthca_cq_clean(dev, cq, qpn, srq); spin_unlock_irq(&cq->lock); } diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index b7e42ef..78a092d 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -93,9 +93,8 @@ enum { }; enum { - MTHCA_EQ_CMD, - MTHCA_EQ_ASYNC, MTHCA_EQ_COMP, + MTHCA_EQ_ASYNC, MTHCA_NUM_EQ }; @@ -505,6 +504,8 @@ void mthca_free_cq(struct mthca_dev *dev, void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); +void __mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, + struct mthca_srq *srq); void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 8ec9fa1..f7a41b8 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -110,11 +110,11 @@ enum { (1ULL << MTHCA_EVENT_TYPE_WQ_ACCESS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_LOCAL_CATAS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_PORT_CHANGE) | \ - (1ULL << MTHCA_EVENT_TYPE_ECC_DETECT)) + (1ULL << MTHCA_EVENT_TYPE_ECC_DETECT)) | \ + (1ULL << MTHCA_EVENT_TYPE_CMD) #define MTHCA_SRQ_EVENT_MASK ((1ULL << MTHCA_EVENT_TYPE_SRQ_CATAS_ERROR) | \ (1ULL << MTHCA_EVENT_TYPE_SRQ_QP_LAST_WQE) | \ (1ULL << MTHCA_EVENT_TYPE_SRQ_LIMIT)) -#define MTHCA_CMD_EVENT_MASK (1ULL << MTHCA_EVENT_TYPE_CMD) #define MTHCA_EQ_DB_INC_CI (1 << 24) #define MTHCA_EQ_DB_REQ_NOT (2 << 24) @@ -863,23 +863,17 @@ int 
mthca_init_eq_table(struct mthca_dev *dev) if (err) goto err_out_unmap; - err = mthca_create_eq(dev, MTHCA_NUM_ASYNC_EQE + MTHCA_NUM_SPARE_EQE, + err = mthca_create_eq(dev, MTHCA_NUM_CMD_EQE + MTHCA_NUM_ASYNC_EQE + + MTHCA_NUM_SPARE_EQE, (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 129 : intr, &dev->eq_table.eq[MTHCA_EQ_ASYNC]); if (err) goto err_out_comp; - err = mthca_create_eq(dev, MTHCA_NUM_CMD_EQE + MTHCA_NUM_SPARE_EQE, - (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 130 : intr, - &dev->eq_table.eq[MTHCA_EQ_CMD]); - if (err) - goto err_out_async; - if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { static const char *eq_name[] = { [MTHCA_EQ_COMP] = DRV_NAME " (comp)", [MTHCA_EQ_ASYNC] = DRV_NAME " (async)", - [MTHCA_EQ_CMD] = DRV_NAME " (cmd)" }; for (i = 0; i < MTHCA_NUM_EQ; ++i) { @@ -889,7 +883,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_tavor_msi_x_interrupt, 0, eq_name[i], dev->eq_table.eq + i); if (err) - goto err_out_cmd; + goto err_out_async; dev->eq_table.eq[i].have_irq = 1; } } else { @@ -899,7 +893,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_tavor_interrupt, IRQF_SHARED, DRV_NAME, dev); if (err) - goto err_out_cmd; + goto err_out_async; dev->eq_table.have_irq = 1; } @@ -912,15 +906,6 @@ int mthca_init_eq_table(struct mthca_dev *dev) mthca_warn(dev, "MAP_EQ for async EQ %d returned status 0x%02x\n", dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn, status); - err = mthca_MAP_EQ(dev, MTHCA_CMD_EVENT_MASK, - 0, dev->eq_table.eq[MTHCA_EQ_CMD].eqn, &status); - if (err) - mthca_warn(dev, "MAP_EQ for cmd EQ %d failed (%d)\n", - dev->eq_table.eq[MTHCA_EQ_CMD].eqn, err); - if (status) - mthca_warn(dev, "MAP_EQ for cmd EQ %d returned status 0x%02x\n", - dev->eq_table.eq[MTHCA_EQ_CMD].eqn, status); - for (i = 0; i < MTHCA_NUM_EQ; ++i) if (mthca_is_memfree(dev)) arbel_eq_req_not(dev, dev->eq_table.eq[i].eqn_mask); @@ -929,11 +914,8 @@ int mthca_init_eq_table(struct mthca_dev *dev) return 0; -err_out_cmd: - mthca_free_irqs(dev); - mthca_free_eq(dev, 
&dev->eq_table.eq[MTHCA_EQ_CMD]); - err_out_async: + mthca_free_irqs(dev); mthca_free_eq(dev, &dev->eq_table.eq[MTHCA_EQ_ASYNC]); err_out_comp: @@ -956,8 +938,6 @@ void mthca_cleanup_eq_table(struct mthca_dev *dev) mthca_MAP_EQ(dev, async_mask(dev), 1, dev->eq_table.eq[MTHCA_EQ_ASYNC].eqn, &status); - mthca_MAP_EQ(dev, MTHCA_CMD_EVENT_MASK, - 1, dev->eq_table.eq[MTHCA_EQ_CMD].eqn, &status); for (i = 0; i < MTHCA_NUM_EQ; ++i) mthca_free_eq(dev, &dev->eq_table.eq[i]); diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index 0d9b7d0..5bfef62 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -835,7 +835,7 @@ static int mthca_setup_hca(struct mthca_dev *dev) if (err || status) { mthca_err(dev, "NOP command failed to generate interrupt (IRQ %d), aborting.\n", dev->mthca_flags & MTHCA_FLAG_MSI_X ? - dev->eq_table.eq[MTHCA_EQ_CMD].msi_x_vector : + dev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector : dev->pdev->irq); if (dev->mthca_flags & (MTHCA_FLAG_MSI | MTHCA_FLAG_MSI_X)) mthca_err(dev, "Try again with MSI/MSI-X disabled.\n"); @@ -976,12 +976,11 @@ static void mthca_release_regions(struct pci_dev *pdev, static int mthca_enable_msi_x(struct mthca_dev *mdev) { - struct msix_entry entries[3]; + struct msix_entry entries[2]; int err; entries[0].entry = 0; entries[1].entry = 1; - entries[2].entry = 2; err = pci_enable_msix(mdev->pdev, entries, ARRAY_SIZE(entries)); if (err) { @@ -993,7 +992,6 @@ static int mthca_enable_msi_x(struct mthca_dev *mdev) mdev->eq_table.eq[MTHCA_EQ_COMP ].msi_x_vector = entries[0].vector; mdev->eq_table.eq[MTHCA_EQ_ASYNC].msi_x_vector = entries[1].vector; - mdev->eq_table.eq[MTHCA_EQ_CMD ].msi_x_vector = entries[2].vector; return 0; } diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 71dc84b..3d6591b 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -37,6 
+37,7 @@ #include #include +#include #include @@ -864,6 +865,11 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, if (qp->ibqp.send_cq != qp->ibqp.recv_cq) mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, NULL); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + else + synchronize_irq(dev->pdev->irq); + mthca_wq_reset(&qp->sq); qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); @@ -1390,6 +1396,10 @@ void mthca_free_qp(struct mthca_dev *dev, struct mthca_cq *send_cq; struct mthca_cq *recv_cq; + if (qp->state != IB_QPS_RESET) + mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, + NULL, 0, &status); + send_cq = to_mcq(qp->ibqp.send_cq); recv_cq = to_mcq(qp->ibqp.recv_cq); @@ -1403,15 +1413,25 @@ void mthca_free_qp(struct mthca_dev *dev, mthca_array_clear(&dev->qp_table.qp, qp->qpn & (dev->limits.num_qps - 1)); --qp->refcount; + + if (!qp->ibqp.uobject) { + __mthca_cq_clean(dev, send_cq, qp->qpn, + qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); + if (send_cq != recv_cq) + __mthca_cq_clean(dev, recv_cq, qp->qpn, + qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); + } + spin_unlock(&dev->qp_table.lock); mthca_unlock_cqs(send_cq, recv_cq); - wait_event(qp->wait, !get_qp_refcount(dev, qp)); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + else + synchronize_irq(dev->pdev->irq); - if (qp->state != IB_QPS_RESET) - mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, - NULL, 0, &status); + wait_event(qp->wait, !get_qp_refcount(dev, qp)); /* * If this is a userspace QP, the buffers, MR, CQs and so on @@ -1419,12 +1439,6 @@ void mthca_free_qp(struct mthca_dev *dev, * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, - qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); - if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, - qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); - mthca_free_memfree(dev, qp); mthca_free_wqe_buf(dev, qp); } -- MST From dotanb at dev.mellanox.co.il Tue Mar 20 06:41:15 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 20 Mar 2007 15:41:15 +0200 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> Message-ID: <45FFE47B.8000408@dev.mellanox.co.il> Roland Dreier wrote: > Thanks. However, it seems racy to me to use the mutex as you do here. > I think if a consumer calls ibv_create_cq() and ibv_destroy_comp_channel() > simultaneously (for the same completion channel obviously), exactly > one of the two calls should succeed. As it stands now, you might have > a comp channel that starts with a refcnt of 0, have the test in > ibv_destroy_comp_channel() succeed, then have ibv_create_cq() use the > completion channel and bump the refcnt too late and also return success. > you are right - this is why i thought in the first place why this check need to be done in kernel level. is there is any difference between check this check and the check that being done when destroying a CQ? why do we check the CQ in kernel level and the completion channel in user level? i think that need to do this check in kernel level for a cleaner code. what do you think? 
thanks
Dotan

From rdreier at cisco.com Tue Mar 20 07:14:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Mar 2007 07:14:51 -0700
Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
In-Reply-To: <45FFE47B.8000408@dev.mellanox.co.il> (Dotan Barak's message of "Tue, 20 Mar 2007 15:41:15 +0200")
References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <45FFE47B.8000408@dev.mellanox.co.il>
Message-ID:

> you are right - this is why i thought in the first place why this
> check need to be done in kernel level.
>
> is there is any difference between check this check and the check that
> being done when destroying a CQ?
> why do we check the CQ in kernel level and the completion channel in
> user level?
>
> i think that need to do this check in kernel level for a cleaner code.

I just don't see how doing this in the kernel would help. By the time
the kernel knows that userspace is destroying a completion channel,
it's too late, since the uverbs code in the kernel has no way of
failing out a call to close(2) on the completion channel's fd.
However feel free to prove me wrong by posting a patch that works.

It seems that userspace is the only place that has a chance of making
this work. We just need to get the locking correct, and at first
glance it looks possible to me. If you don't see a way to do it then
I'll work on it in the next day or so.

- R.
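
As a rough illustration of the locking discussed above (not the actual
libibverbs code): if the refcount test in the destroy path and the
refcount bump in the create path run under the same mutex, then of two
racing calls exactly one succeeds. All names below (struct chan,
chan_get, chan_put, chan_destroy) are hypothetical stand-ins, and the
sketch assumes the structure outlives the destroy call, which real code
would have to arrange separately.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a completion channel; not the libibverbs struct. */
struct chan {
	pthread_mutex_t lock;   /* protects refcnt and destroyed */
	int refcnt;             /* number of CQs currently using the channel */
	int destroyed;          /* set once a destroy call has succeeded */
};

#define CHAN_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Create-CQ side: take a reference, or fail if destroy already won. */
static int chan_get(struct chan *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	if (c->destroyed)
		ret = EINVAL;
	else
		c->refcnt++;
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Destroy-CQ side: drop a reference taken by chan_get(). */
static void chan_put(struct chan *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->refcnt > 0);
	c->refcnt--;
	pthread_mutex_unlock(&c->lock);
}

/* Destroy-channel side: the test and the state change share one critical section. */
static int chan_destroy(struct chan *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	if (c->destroyed)
		ret = EINVAL;
	else if (c->refcnt > 0)
		ret = EBUSY;
	else
		c->destroyed = 1;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

Because chan_get() and chan_destroy() both decide while holding the
lock, a late reference bump cannot slip in after a destroy call has
already passed its refcount test.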
From dledford at redhat.com Tue Mar 20 07:44:57 2007 From: dledford at redhat.com (Doug Ledford) Date: Tue, 20 Mar 2007 10:44:57 -0400 Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary In-Reply-To: <20070319152056.GB24225@mellanox.co.il> References: <20070318115143.GH2862@mellanox.co.il> <1174232565.4673.111.camel@athlon-x2.xsintricity.com> <1174251033.4673.133.camel@athlon-x2.xsintricity.com> <20070318211118.GH11078@mellanox.co.il> <20070318215503.GA5740@obsidianresearch.com> <20070318221039.GL11078@mellanox.co.il> <1174256789.4673.163.camel@athlon-x2.xsintricity.com> <20070318223400.GN11078@mellanox.co.il> <1174312787.4673.201.camel@athlon-x2.xsintricity.com> <20070319152056.GB24225@mellanox.co.il> Message-ID: <1174401897.4673.245.camel@athlon-x2.xsintricity.com> On Mon, 2007-03-19 at 17:20 +0200, Michael S. Tsirkin wrote: > > The kernel code will > > always match the openib package. When I update one, I update the other. > > So, you always know the base by looking at the openib package version, > > and then you can see any additional patches I've applied to user space > > by looking at the openib.spec file, and you can see additional patches > > to the kernel by looking for the main OFED patch (in RHEL4, it's patch > > 2700, in RHEL5 it's patch 2600), and immediately after or before the > > main update patch will be the individual change patches that we've > > applied. > > You lost me here. Example? 
From the RHEL4 kernel spec file:

# OpenIB Infiniband patches
Patch2700: linux-2.6.9-OFED-1.1.patch
Patch2701: linux-2.6.9-spinlock-define.patch
Patch2702: linux-2.6.9-if_infiniband.patch
Patch2703: linux-2.6.9-gfp_t-typedef.patch
Patch2704: linux-2.6.9-empty-debugfs.patch
Patch2705: linux-2.6.9-pci_find_next_cap.patch
Patch2706: linux-2.6.9-wait_for_completion_timeout.patch
Patch2707: linux-2.6.9-OpenIB-build.patch
Patch2708: linux-2.6.9-OpenIB-read_mostly.patch
Patch2709: linux-2.6.9-OpenIB-flush_core_git.patch
Patch2710: linux-2.6.9-OpenIB-flush_users.patch
Patch2711: linux-2.6.9-OpenIB-mad_rmpp_requester_retry.patch
Patch2712: linux-2.6.9-OpenIB-srp_avoid_null_deref.patch
Patch2713: linux-2.6.9-OpenIB-4g-dma.patch
Patch2714: linux-2.6.9-scsi_scan_target-export.patch
Patch2715: linux-2.6.9-mutex-backport.patch
# Uncertain patches
Patch2720: linux-2.6.9-OpenIB-rdma_misc.patch
Patch2721: linux-2.6.9-OpenIB-sa_pack_unpack.patch

To see which of those patches are actually applied to the kernel, search
for %patch2700 and look from there. Currently, that looks like this:

# OpenIB Infiniband support
%patch2700 -p1
%patch2701 -p1
%patch2702 -p1
%patch2703 -p1
%patch2704 -p1
%patch2705 -p1
%patch2706 -p1
%patch2707 -p1
%patch2708 -p1
%patch2709 -p1
%patch2710 -p1
%patch2711 -p1
%patch2712 -p1
%patch2713 -p1
%patch2714 -p1
%patch2715 -p1
# Don't apply these two for now
#%patch2720 -p1
#%patch2721 -p1

So, you can see what the patches are, and which are or aren't applied.
The easiest way to see if something you are interested in is applied
would be to do a rpmbuild --bp kernel-2.6.spec from where ever you
installed the kernel src.rpm files into, then diff just the
drivers/infiniband tree of that kernel repo against the one that has all
the fixes you are interested in already applied and examine the
difference.
Of course, be aware that since we patch some things in the core kernel that you guys patch up in your code instead, there will be a non-trivial amount of noise in the diff because of that. From the RHEL5 kernel spec file: # Infiniband driver Patch2600: linux-2.6-openib-sdp.patch Patch2601: linux-2.6-openib-ehca.patch Patch2602: linux-2.6-openib-ofed-1_1-update.patch > The OFED support page > https://wiki.openfabrics.org/tiki-index.php?page=OFED+Support > mentions patches for two critical bugs in kernel code: > > IPoIB kernel oops, and mthca off-by-one. > > Are these two applied? How to find out? Neither are applied to RHEL5, but that's not surprising given that it's out the door already which means it froze long ago. For RHEL4.5, I *might* be able to get them in. We'll see. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From robert.j.woodruff at intel.com Tue Mar 20 08:54:33 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 20 Mar 2007 08:54:33 -0700 Subject: [ofa-general] OFA Sonoma hour-by-hour agenda Message-ID: Jeff wrote, >Bob- >Can you please distribute the attached to the OFA developer community? >If anyone would like to provide feedback or recommendations, they should >send the comments to me. >Thank you, >Jeff Hi all, Attached is a draft of the agenda for the OFA workshop in Sonoma. The organizers are looking for feedback so please review. The one thing I see is that we should probably have an OFED 1.3 planning session. Other ideas or comments ? woody -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Sonoma Agenda 3-19-07.xls Type: application/vnd.ms-excel Size: 49152 bytes Desc: Sonoma Agenda 3-19-07.xls URL: From mst at dev.mellanox.co.il Tue Mar 20 09:02:17 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 18:02:17 +0200 Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070319232043.GA23359@ms2.inr.ac.ru> References: <20070318155532.GG7958@mellanox.co.il> <20070318191238.GA20518@ms2.inr.ac.ru> <20070318195355.GB11078@mellanox.co.il> <20070318201826.GB27004@ms2.inr.ac.ru> <20070319093632.GB8386@mellanox.co.il> <20070319120534.GA28187@ms2.inr.ac.ru> <20070319121248.GD18497@mellanox.co.il> <20070319125919.GA4239@ms2.inr.ac.ru> <20070319151336.GA24225@mellanox.co.il> <20070319232043.GA23359@ms2.inr.ac.ru> Message-ID: <20070320160217.GC31495@mellanox.co.il> > Quoting Alexey Kuznetsov : > Subject: Re: dst_ifdown breaks infiniband? > > Hello! > > > This might work. Could you post a patch to better show what you mean to do? > > Here it is. > > ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), > which is called when neighbor entry goes to dead state. At this point > everything is still valid: neigh->dev, neigh->parms etc. > > The device should guarantee that dead neighbor entries (neigh->dead != 0) > do not get private part initialized, otherwise nobody will cleanup it. OK, I stress-tested this for about 9 hours - apparently this resolves the issues I was seeing both with hotplug device unregister and module removal. This is an old bug, but somehow it did not trigger on older kernels - some code restructuring in infiniband is probably the reason - so from that POV it's a regression in 2.6.21. So now several people are experiencing these crashes. David, Alexey, what do you think about this patch? Is it right? Could this patch be considered for 2.6.21? Acked-by: Michael S. Tsirkin > > I think this is enough for ipoib which is the only user of this thing. 
> Initialization private part of neighbor entries happens in ipib > start_xmit routine, which is not reached when device is down. > But it would be better to add explicit test for neigh->dead > in any case. Additionally, ip over infiniband actually tests a separate flag IPOIB_FLAG_ADMIN_UP before looking at an skb. This flag is cleared before the device goes down. Taken together this should be sufficient I think. -- MST From swise at opengridcomputing.com Tue Mar 20 09:08:17 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 20 Mar 2007 11:08:17 -0500 Subject: [ofa-general] OFA Sonoma hour-by-hour agenda In-Reply-To: References: Message-ID: <1174406897.20982.26.camel@stevo-desktop> Once again: I will not be attending. Please remove me as speaker. Thanks, Steve. On Tue, 2007-03-20 at 08:54 -0700, Woodruff, Robert J wrote: > Jeff wrote, > >Bob- > >Can you please distribute the attached to the OFA developer community? > > >If anyone would like to provide feedback or recommendations, they > should > >send the comments to me. > > >Thank you, > >Jeff > > Hi all, > > Attached is a draft of the agenda for the OFA workshop in > Sonoma. The organizers are looking for feedback so please > review. The one thing I see is that we should probably have > an OFED 1.3 planning session. Other ideas or comments ? 
> > woody > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Tue Mar 20 09:25:24 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Mar 2007 09:25:24 -0700 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <45F879B6.2040904@ichips.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F879B6.2040904@ichips.intel.com> Message-ID: <46000AF4.90801@ichips.intel.com> Sean Hefty wrote: >>> Vlad, please pull from: >>> >>> git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 >>> >>> This should add some necessary fixes to the OFED code: >>> >>> RDMA/ucma: avoid sending reject if backlog is full >>> RDMA/cma: Request reversible paths only >>> IB/cm: fix MRA timeout patch Vlad, can you please pull these patches from my git tree into OFED 1.2? - Sean From vlad at dev.mellanox.co.il Tue Mar 20 09:43:33 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 20 Mar 2007 18:43:33 +0200 Subject: [ofa-general] Re: [PATCH ofed_1_2] iw_cxgb3: Reserve the pages of dma coherent memory for older kernels. In-Reply-To: <1174331631.8747.55.camel@stevo-desktop> References: <1174331631.8747.55.camel@stevo-desktop> Message-ID: <46000F35.1060305@dev.mellanox.co.il> Steve Wise wrote: > Hey Vlad, > > This change, along with a libcxgb3 fix resolves bug 353. > > You can pull this ofed_1_2 change directly from: > > git://staging.openfabrics.org/~swise/ofed_1_2 ofed_1_2 > > Thanks, > > Steve. 
> > Done, Regards, Vladimir From eeb at bartonsoftware.com Tue Mar 20 10:59:45 2007 From: eeb at bartonsoftware.com (Eric Barton) Date: Tue, 20 Mar 2007 17:59:45 -0000 Subject: [ofa-general] Help with an MTHCA "catastrophe" Message-ID: <029101c76b19$8af42900$0281a8c0@ebpc> The following is console output immediately before a panic on a system running lustre with OFED 1.1. How can I find out what it means? 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: Catastrophic error detected: internal error 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[00]: 001d79f4 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[01]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[02]: 00198538 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[03]: 00136038 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[04]: 00207730 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[05]: 001d79cc 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[06]: 0023cf24 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[07]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[08]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[09]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0a]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0b]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0c]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0d]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0e]: 00000000 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0f]: 00000000 ...shortly before it happens, the lustre/lnet OFED driver receives a number of what I believe to be duplicate SEND completion events. It seems quite sporadic, and doesn't appear to track hardware. 
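When passing a dump like the one above to someone who can interpret it, it may help to extract the buffer words from the console log first. The following is only a sketch: the function name `parse_catas_buffer` and the `SAMPLE_LOG` constant are made up for illustration, the regular expression assumes the exact line layout shown in the console output above, and nothing here decodes what the words mean (that depends on the HCA firmware).

```python
import re

# Matches console lines of the form shown above, for example:
#   2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0f]: 00000000
# Group 1: PCI device, group 2: buffer index (hex), group 3: 32-bit word (hex).
BUF_LINE = re.compile(
    r"ib_mthca\s+(\S+):\s+buf\[([0-9a-fA-F]{2})\]:\s+([0-9a-fA-F]{8})"
)


def parse_catas_buffer(log_text):
    """Return {pci_device: {index: word}} for every buf[..] line found.

    Hypothetical helper: it only collects the raw words and does not
    interpret them.
    """
    devices = {}
    for device, index, word in BUF_LINE.findall(log_text):
        devices.setdefault(device, {})[int(index, 16)] = int(word, 16)
    return devices


# First lines of the dump quoted above, used as sample input.
SAMPLE_LOG = """\
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: Catastrophic error detected: internal error
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[00]: 001d79f4
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[01]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[02]: 00198538
"""

if __name__ == "__main__":
    for device, words in parse_catas_buffer(SAMPLE_LOG).items():
        for index in sorted(words):
            print(f"{device} buf[{index:02x}] = 0x{words[index]:08x}")
```

Keying the result by PCI device keeps dumps from several HCAs apart if more than one shows up in the same log.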
More info at https://bugzilla.lustre.org/show_bug.cgi?id=11381

Cheers,
Eric

From mike.heffner at evergrid.com Tue Mar 20 12:49:00 2007
From: mike.heffner at evergrid.com (Mike Heffner)
Date: Tue, 20 Mar 2007 14:49:00 -0500
Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel
Message-ID: <46003AAC.3090505@evergrid.com>

Hi,

I'm writing a program that allows two clients to communicate over an RC channel that is connected using the RDMA CM. To negotiate a clean shutdown of the channel both clients send IBV_WR_SEND's with the IBV_SEND_SIGNALED bit set. The connection is only rdma_disconnect()'d when a client receives the CQE from its signaled send and the CQE from the peer's incoming IBV_WR_SEND (i.e., when the peer receives the send). This ensures that both clients have conceptually called "close()" on both ends of the connection before the connection is torn down and the QP moved into the error state with rdma_disconnect().

The problem I'm seeing is that occasionally one peer will not receive both CQE's while the other peer has successfully received both and has called rdma_disconnect(). What's odd is that one client may not receive the local CQE for the "signaled" IBV_WR_SEND send even though the peer has received the client's send. Since one peer does not receive both CQE events, the connection remains in an open state and does not get cleaned up appropriately.

Can you call rdma_disconnect() immediately after posting sends on the QP? I don't see any CQE's come back with errors but they appear to "disappear" and never get signaled on one peer side. Are there any potential race issues to avoid here (it only happens about one out of every 100 connections)?

Any assistance would be greatly appreciated.
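To make the shutdown rule described above concrete, here is a rough model of the bookkeeping on one peer. It is a sketch only and makes no libibverbs or librdmacm calls: the class `ShutdownState` and the strings "send" and "recv" are invented for illustration. In a real program they would presumably be derived from the opcode of each polled work completion (IBV_WC_SEND or IBV_WC_RECV), and rdma_disconnect() would be called once the method returns True.

```python
from dataclasses import dataclass


@dataclass
class ShutdownState:
    """Tracks the two completions a peer waits for before disconnecting.

    Hypothetical model of the handshake described in the message above.
    """

    local_close_sent: bool = False     # CQE for our own signaled "close" send
    peer_close_received: bool = False  # CQE for the peer's incoming "close" send

    def on_completion(self, kind):
        """Record one completion; return True once disconnecting is allowed."""
        if kind == "send":
            self.local_close_sent = True
        elif kind == "recv":
            self.peer_close_received = True
        else:
            raise ValueError(f"unexpected completion kind: {kind!r}")
        return self.ready_to_disconnect()

    def ready_to_disconnect(self):
        # Both halves of the handshake must have completed.
        return self.local_close_sent and self.peer_close_received


if __name__ == "__main__":
    state = ShutdownState()
    print(state.on_completion("recv"))  # peer's close arrived first
    print(state.on_completion("send"))  # our own send completed
```

The model also shows why the reported symptom leaves the connection open: if the "send" completion never arrives, `ready_to_disconnect()` stays False no matter how many other events are seen.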
Thanks, Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 x603 From mike.heffner at evergrid.com Tue Mar 20 12:52:56 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Tue, 20 Mar 2007 14:52:56 -0500 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: <46003AAC.3090505@evergrid.com> References: <46003AAC.3090505@evergrid.com> Message-ID: <46003B98.4040500@evergrid.com> Forgot to mention that this is with OFED 1.1 on a SUSE 10 box: Linux amd13 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux with a "Mellanox Technologies MT23108 InfiniHost (rev a1)" PCI-X card with firmware version 3.5.0. Mike Heffner wrote: > Hi, > > I'm writing a program that allows two clients to communicate over an RC > channel that is connected using the RDMA CM. To negotiate a clean > shutdown of the channel both clients send IBV_WR_SEND's with the > IBV_SEND_SIGNALED bit set. The connection is only rdma_disconnect()'d > when a client receives the CQE from its signaled send and the CQE from > the peer's incoming IBV_WR_SEND (ie., when the peer receives the send). > This ensures that both clients have conceptually called "close()" on > both ends of the connection before the connection is torn down and the > QP moved into the error state with rdma_disconnect(). > > The problem I'm seeing is that occasionally one peer will not receive > both CQE's while the other peer has successfully received both and has > called rdma_disconnect(). What's odd is that one client may not receive > the local CQE for the "signaled" IBV_WR_SEND send even though the peer > has received the client's send. Since one peer does not receive both CQE > events, the connection remains in an open state and does not get cleaned > up appropriately. > > Can you call rdma_disconnect() immediately after posting sends on the > QP? 
I don't see any CQE's come back with errors but they appear to > "disappear" and never get signaled on one peer side. Are there any > potential race issues to avoid here (it only happens about one out of > every 100 connections)? > > Any assistance would be greatly appreciated. > > > Thanks, > > Mike > -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 x603 From vlad at dev.mellanox.co.il Tue Mar 20 11:57:05 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 20 Mar 2007 20:57:05 +0200 Subject: [ewg] Re: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <46000AF4.90801@ichips.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F879B6.2040904@ichips.intel.com> <46000AF4.90801@ichips.intel.com> Message-ID: <46002E81.1040509@dev.mellanox.co.il> Sean Hefty wrote: > Sean Hefty wrote: >>>> Vlad, please pull from: >>>> >>>> git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 >>>> >>>> This should add some necessary fixes to the OFED code: >>>> >>>> RDMA/ucma: avoid sending reject if backlog is full >>>> RDMA/cma: Request reversible paths only >>>> IB/cm: fix MRA timeout patch > > Vlad, can you please pull these patches from my git tree into OFED 1.2? > > - Sean > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > Done. Regards, Vladimir From mshefty at ichips.intel.com Tue Mar 20 12:02:06 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Mar 2007 12:02:06 -0700 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: <46003AAC.3090505@evergrid.com> References: <46003AAC.3090505@evergrid.com> Message-ID: <46002FAE.60405@ichips.intel.com> > Can you call rdma_disconnect() immediately after posting sends on the > QP? 
I don't see any CQE's come back with errors but they appear to > "disappear" and never get signaled on one peer side. Are there any > potential race issues to avoid here (it only happens about one out of > every 100 connections)? rdma_disconnect() will immediately transition the QP into the error state, which can affect queued send operations. I think the situation that you're describing could happen if the side that received the send transitioned the QP into the error state, but the ACK sent back to the sender was lost. Can anyone confirm this? You could try having one side initiate the rdma_disconnect, with the other side waiting for the disconnect event before calling rdma_disconnect itself. - Sean From rdreier at cisco.com Tue Mar 20 12:41:33 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 12:41:33 -0700 Subject: [ofa-general] Help with an MTHCA "catastrophe" In-Reply-To: <029101c76b19$8af42900$0281a8c0@ebpc> (Eric Barton's message of "Tue, 20 Mar 2007 17:59:45 -0000") References: <029101c76b19$8af42900$0281a8c0@ebpc> Message-ID: > 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: Catastrophic error detected: internal error Most likely this is a firmware bug, given that it's repeatable on different hardware. What is your HCA type and FW rev? Or does it happen on more than one HCA type? The catastrophic error buffer contents may be meaningful to someone at mellanox... - R. From rdreier at cisco.com Tue Mar 20 12:43:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 12:43:02 -0700 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: <46002FAE.60405@ichips.intel.com> (Sean Hefty's message of "Tue, 20 Mar 2007 12:02:06 -0700") References: <46003AAC.3090505@evergrid.com> <46002FAE.60405@ichips.intel.com> Message-ID: > rdma_disconnect() will immediately transition the QP into the error > state, which can affect queued send operations. 
Transitioning to error state might make a CQE come back as a flush error, but it should still ensure that all pending work requests complete. However, transitioning a QP to reset might make some completions disappear. - R. From Kapil.Dukle at med.ge.com Tue Mar 20 12:44:08 2007 From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare)) Date: Tue, 20 Mar 2007 15:44:08 -0400 Subject: [ofa-general] Tools for Infiniband Performance measurement In-Reply-To: Message-ID: Hi all, I was wondering if there are any tools specific for Infiniband-performance measurement - throughput, round trip time with adjustable packet sizes. ________________________________ From: Dukle, Kapil (GE Healthcare) Sent: Monday, March 19, 2007 5:37 PM To: general at lists.openfabrics.org; openib-general at openib.org Subject: ibping fails in loopback mode Hi all, I'm having trouble getting ibping to work in loopback node. The original config is to connect first port of Blade 1 with first port of Blade 2. For my loopback test, I have connected the Infiniband cable between the 2 ports on the same Linux blade. Both opensm and ibping (server mode) seem to be running on the blade. Ping succeeds to the first active port, but fails for the second one. See below... Why is the ibping to the second active (and linkup) port failing? Am I missing something? The version is OFED1.0 [root at XXXX]# ps -elf | grep opensm 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? 00:00:09 /usr/sbin/opensm 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 00:00:00 grep opensm [root at XXXX]# ibping -v -S & [1] 5445 [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... 
[root at XXXX]# ps -elf | grep ibping 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 00:00:00 ibping -v -S 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 00:00:00 grep ibping [root at XXXX]# sminfo sminfo: sm lid 0x3 sm guid 0x3ba00010027d9, activity count 203459 priority 1 state SMINFO_MASTER 3 [root at XXXX]# ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027d8 System image GUID: 0x0003ba00010027db Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 3 LMC: 0 SM lid: 3 Capability mask: 0x02510a6a Port GUID: 0x0003ba00010027d9 Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 1 LMC: 0 SM lid: 3 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027da [root at XXXX]# ibping -v 3 ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.111 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.087 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.069 ms ibwarn: [5452] report: out due signal 2 --- vre.(none) (Lid 0x3) ibping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2320 ms rtt min/avg/max = 0.069/0.089/0.111 ms [root at XXXX]# ibping -v 1 ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] report: out due signal 2 --- (Lid 0x1) ibping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 11358 ms rtt min/avg/max = 0.000/0.000/0.000 ms Thanks, Kapil -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sweitzen at cisco.com Tue Mar 20 12:46:55 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 20 Mar 2007 12:46:55 -0700 Subject: [ofa-general] Tools for Infiniband Performance measurement In-Reply-To: References: Message-ID: Look in the OFED perftest RPM (documentation is in /usr/local/ofed/docs/PERF_TEST_README.txt): # rpm -qli perftest Name : perftest Relocations: (not relocatable) Version : 1.2 Vendor: OpenFabrics Release : 0 Build Date: Wed 14 Mar 2007 09:21:36 AM PDT Install Date: Thu 15 Mar 2007 08:26:42 PM PDT Build Host: svbu-qa1850-1.cis co.com Group : System Environment/Libraries Source RPM: ofa_user-1.2-beta1.src. rpm Size : 262469 License: GPL/BSD Signature : (none) URL : http://www.openfabrics.org/ Summary : IB Performance tests Description : gen2 uverbs microbenchmarks /usr/local/ofed/bin/ib_clock_test /usr/local/ofed/bin/ib_rdma_bw /usr/local/ofed/bin/ib_rdma_lat /usr/local/ofed/bin/ib_read_bw /usr/local/ofed/bin/ib_read_lat /usr/local/ofed/bin/ib_send_bw /usr/local/ofed/bin/ib_send_lat /usr/local/ofed/bin/ib_write_bw /usr/local/ofed/bin/ib_write_bw_postlist /usr/local/ofed/bin/ib_write_lat /usr/local/ofed/bin/runme Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Dukle, Kapil (GE Healthcare) Sent: Tuesday, March 20, 2007 12:44 PM To: general at lists.openfabrics.org; openib-general at openib.org Subject: [ofa-general] Tools for Infiniband Performance measurement Hi all, I was wondering if there are any tools specific for Infiniband-performance measurement - throughput, round trip time with adjustable packet sizes. 
________________________________ From: Dukle, Kapil (GE Healthcare) Sent: Monday, March 19, 2007 5:37 PM To: general at lists.openfabrics.org; openib-general at openib.org Subject: ibping fails in loopback mode Hi all, I'm having trouble getting ibping to work in loopback node. The original config is to connect first port of Blade 1 with first port of Blade 2. For my loopback test, I have connected the Infiniband cable between the 2 ports on the same Linux blade. Both opensm and ibping (server mode) seem to be running on the blade. Ping succeeds to the first active port, but fails for the second one. See below... Why is the ibping to the second active (and linkup) port failing? Am I missing something? The version is OFED1.0 [root at XXXX]# ps -elf | grep opensm 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? 00:00:09 /usr/sbin/opensm 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 00:00:00 grep opensm [root at XXXX]# ibping -v -S & [1] 5445 [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... [root at XXXX]# ps -elf | grep ibping 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 00:00:00 ibping -v -S 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 00:00:00 grep ibping [root at XXXX]# sminfo sminfo: sm lid 0x3 sm guid 0x3ba00010027d9, activity count 203459 priority 1 state SMINFO_MASTER 3 [root at XXXX]# ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027d8 System image GUID: 0x0003ba00010027db Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 3 LMC: 0 SM lid: 3 Capability mask: 0x02510a6a Port GUID: 0x0003ba00010027d9 Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 1 LMC: 0 SM lid: 3 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027da [root at XXXX]# ibping -v 3 ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.111 ms ibwarn: [5452] ibping: Ping.. 
ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.087 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.069 ms ibwarn: [5452] report: out due signal 2 --- vre.(none) (Lid 0x3) ibping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2320 ms rtt min/avg/max = 0.069/0.089/0.111 ms [root at XXXX]# ibping -v 1 ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] report: out due signal 2 --- (Lid 0x1) ibping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 11358 ms rtt min/avg/max = 0.000/0.000/0.000 ms Thanks, Kapil
From mst at dev.mellanox.co.il Tue Mar 20 13:55:35 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Mar 2007 22:55:35 +0200 Subject: [ofa-general] Re: Tools for Infiniband Performance measurement In-Reply-To: References: Message-ID: <20070320205535.GB3409@mellanox.co.il> git://git.openfabrics.org/~mst/perftest.git Quoting Dukle, Kapil (GE Healthcare) : Subject: Tools for Infiniband Performance measurement Hi all, I was wondering if there are any tools specific for Infiniband-performance measurement - throughput, round trip time with adjustable packet sizes. ________________________________ From: Dukle, Kapil (GE Healthcare) Sent: Monday, March 19, 2007 5:37 PM To: general at lists.openfabrics.org; openib-general at openib.org Subject: ibping fails in loopback mode Hi all, I'm having trouble getting ibping to work in loopback node. The original config is to connect first port of Blade 1 with first port of Blade 2. For my loopback test, I have connected the Infiniband cable between the 2 ports on the same Linux blade. Both opensm and ibping (server mode) seem to be running on the blade. Ping succeeds to the first active port, but fails for the second one. See below... Why is the ibping to the second active (and linkup) port failing? Am I missing something? The version is OFED1.0 [root at XXXX]# ps -elf | grep opensm 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? 00:00:09 /usr/sbin/opensm 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 00:00:00 grep opensm [root at XXXX]# ibping -v -S & [1] 5445 [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve...
[root at XXXX]# ps -elf | grep ibping 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 00:00:00 ibping -v -S 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 00:00:00 grep ibping [root at XXXX]# sminfo sminfo: sm lid 0x3 sm guid 0x3ba00010027d9, activity count 203459 priority 1 state SMINFO_MASTER 3 [root at XXXX]# ibstat CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.400 Hardware version: a0 Node GUID: 0x0003ba00010027d8 System image GUID: 0x0003ba00010027db Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 3 LMC: 0 SM lid: 3 Capability mask: 0x02510a6a Port GUID: 0x0003ba00010027d9 Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 1 LMC: 0 SM lid: 3 Capability mask: 0x02510a68 Port GUID: 0x0003ba00010027da [root at XXXX]# ibping -v 3 ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.111 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.087 ms ibwarn: [5452] ibping: Ping.. ibwarn: [5445] ibping_serv: Pong: vre.(none) Pong from vre.(none) (Lid 0x3): time 0.069 ms ibwarn: [5452] report: out due signal 2 --- vre.(none) (Lid 0x3) ibping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2320 ms rtt min/avg/max = 0.069/0.089/0.111 ms [root at XXXX]# ibping -v 1 ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] ibping: Ping.. 
ibwarn: [5449] main: ibping to Lid 0x1 failed ibwarn: [5449] report: out due signal 2 --- (Lid 0x1) ibping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 11358 ms rtt min/avg/max = 0.000/0.000/0.000 ms Thanks, Kapil -- MST From kschoche at scl.ameslab.gov Tue Mar 20 14:43:03 2007 From: kschoche at scl.ameslab.gov (Kyle Schochenmaier) Date: Tue, 20 Mar 2007 16:43:03 -0500 Subject: [ofa-general] Tools for Infiniband Performance measurement In-Reply-To: References: Message-ID: <46005567.8030004@scl.ameslab.gov> Dukle, Kapil (GE Healthcare) wrote: > Hi all, > > I was wondering if there are any tools specific for > Infiniband-performance measurement - throughput, round trip time with > adjustable > packet sizes. > The latest NetPIPE benchmarking utility has Infiniband support: It will measure what you asked. http://source.scl.ameslab.gov/NetPIPE/NetPIPE_3.7.tar.gz Kyle > ------------------------------------------------------------------------ > *From:* Dukle, Kapil (GE Healthcare) > *Sent:* Monday, March 19, 2007 5:37 PM > *To:* general at lists.openfabrics.org; openib-general at openib.org > *Subject:* ibping fails in loopback mode > > Hi all, > > I'm having trouble getting ibping to work in loopback node. The > original config is to connect first port of Blade 1 with first port of > Blade 2. > For my loopback test, I have connected the Infiniband cable between > the 2 ports on the same Linux blade. > Both opensm and ibping (server mode) seem to be running on the blade. > Ping succeeds to the first active port, but fails > for the second one. See below... > > Why is the ibping to the second active (and linkup) port failing? Am > I missing something? The version is OFED1.0 > > > *[root at XXXX]# ps -elf | grep opensm* > 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? 
> 00:00:09 */usr/sbin/opensm* > 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 > 00:00:00 grep opensm > *[root at XXXX]# ibping -v -S & > *[1] 5445 > [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... > > *[root at XXXX]# ps -elf | grep ibping* > 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 > 00:00:00 *ibping -v -S* > 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 > 00:00:00 grep ibping > > *[root at XXXX]# sminfo* > *sminfo: sm lid 0x3* sm guid 0x3ba00010027d9, activity count 203459 > priority 1 state SMINFO_MASTER 3 > > *[root at XXXX]# ibstat* > CA 'mthca0' > CA type: MT25208 (MT23108 compat mode) > Number of ports: 2 > Firmware version: 4.7.400 > Hardware version: a0 > Node GUID: 0x0003ba00010027d8 > System image GUID: 0x0003ba00010027db > *Port 1: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 3* > LMC: 0 > SM lid: 3 > Capability mask: 0x02510a6a > Port GUID: 0x0003ba00010027d9 > *Port 2: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 1* > LMC: 0 > SM lid: 3 > Capability mask: 0x02510a68 > Port GUID: 0x0003ba00010027da > > *[root at XXXX]# ibping -v 3* > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > *Pong from vre.(none) (Lid 0x3): time 0.111 ms* > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > Pong from vre.(none) (Lid 0x3): time 0.087 ms > ibwarn: [5452] ibping: Ping.. > ibwarn: [5445] ibping_serv: Pong: vre.(none) > Pong from vre.(none) (Lid 0x3): time 0.069 ms > ibwarn: [5452] report: out due signal 2 > > --- vre.(none) (Lid 0x3) ibping statistics --- > *3 packets transmitted, 3 received, 0% packet loss, time 2320 ms* > rtt min/avg/max = 0.069/0.089/0.111 ms > *[root at XXXX]# ibping -v 1* > ibwarn: [5449] ibping: Ping.. > *ibwarn: [5449] main: ibping to Lid 0x1 failed* > ibwarn: [5449] ibping: Ping.. > ibwarn: [5449] main: ibping to Lid 0x1 failed > ibwarn: [5449] ibping: Ping.. 
> ibwarn: [5449] main: ibping to Lid 0x1 failed > ibwarn: [5449] report: out due signal 2 > > --- (Lid 0x1) ibping statistics --- > *3 packets transmitted, 0 received, 100% packet loss, time 11358 ms* > rtt min/avg/max = 0.000/0.000/0.000 ms > > Thanks, > Kapil > !DSPAM:46003995205571657414402! > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > !DSPAM:46003995205571657414402! > -- Kyle Schochenmaier kschoche at scl.ameslab.gov Research Assistant, Dr. Brett Bode AmesLab - US Dept.Energy Scalable Computing Laboratory
From jsquyres at cisco.com Tue Mar 20 14:44:38 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 20 Mar 2007 17:44:38 -0400 Subject: [ofa-general] Tools for Infiniband Performance measurement In-Reply-To: <46005567.8030004@scl.ameslab.gov> References: <46005567.8030004@scl.ameslab.gov> Message-ID: <7F3263C6-8EC8-4EFD-9CA0-B3177B0603AC@cisco.com> Note that NetPIPE 3.7 will only use the first port of the first HCA in your machine (which is a fairly common configuration). I have submitted a patch to the NetPIPE maintainers that allows you to specify on the command which port / HCA to use for testing.
On Mar 20, 2007, at 5:43 PM, Kyle Schochenmaier wrote: > Dukle, Kapil (GE Healthcare) wrote: >> Hi all, >> I was wondering if there are any tools specific for Infiniband- >> performance measurement - throughput, round trip time with adjustable >> packet sizes. >> > The latest NetPIPE benchmarking utility has Infiniband support: > It will measure what you asked. > > http://source.scl.ameslab.gov/NetPIPE/NetPIPE_3.7.tar.gz > > Kyle >> --------------------------------------------------------------------- >> --- >> *From:* Dukle, Kapil (GE Healthcare) >> *Sent:* Monday, March 19, 2007 5:37 PM >> *To:* general at lists.openfabrics.org; openib-general at openib.org >> *Subject:* ibping fails in loopback mode >> >> Hi all, >> I'm having trouble getting ibping to work in loopback node. The >> original config is to connect first port of Blade 1 with first >> port of Blade 2. >> For my loopback test, I have connected the Infiniband cable >> between the 2 ports on the same Linux blade. >> Both opensm and ibping (server mode) seem to be running on the >> blade. Ping succeeds to the first active port, but fails >> for the second one. See below... >> Why is the ibping to the second active (and linkup) port >> failing? Am I missing something? The version is OFED1.0 >> *[root at XXXX]# ps -elf | grep opensm* >> 4 S root 3078 1 0 76 0 - 19039 stext Mar16 ? >> 00:00:09 */usr/sbin/opensm* >> 4 S root 5444 5424 0 77 0 - 13981 pipe_w 17:22 pts/1 >> 00:00:00 grep opensm >> *[root at XXXX]# ibping -v -S & >> *[1] 5445 >> [root at vre sdc]# ibwarn: [5445] ibping_serv: starting to serve... 
>> *[root at XXXX]# ps -elf | grep ibping* >> 4 S root 5445 5424 0 77 0 - 1454 - 17:22 pts/1 >> 00:00:00 *ibping -v -S* >> 4 S root 5447 5424 0 77 0 - 13982 - 17:22 pts/1 >> 00:00:00 grep ibping >> *[root at XXXX]# sminfo* >> *sminfo: sm lid 0x3* sm guid 0x3ba00010027d9, activity count >> 203459 priority 1 state SMINFO_MASTER 3 >> *[root at XXXX]# ibstat* >> CA 'mthca0' >> CA type: MT25208 (MT23108 compat mode) >> Number of ports: 2 >> Firmware version: 4.7.400 >> Hardware version: a0 >> Node GUID: 0x0003ba00010027d8 >> System image GUID: 0x0003ba00010027db >> *Port 1: >> State: Active >> Physical state: LinkUp >> Rate: 10 >> Base lid: 3* >> LMC: 0 >> SM lid: 3 >> Capability mask: 0x02510a6a >> Port GUID: 0x0003ba00010027d9 >> *Port 2: >> State: Active >> Physical state: LinkUp >> Rate: 10 >> Base lid: 1* >> LMC: 0 >> SM lid: 3 >> Capability mask: 0x02510a68 >> Port GUID: 0x0003ba00010027da >> *[root at XXXX]# ibping -v 3* >> ibwarn: [5452] ibping: Ping.. >> ibwarn: [5445] ibping_serv: Pong: vre.(none) >> *Pong from vre.(none) (Lid 0x3): time 0.111 ms* >> ibwarn: [5452] ibping: Ping.. >> ibwarn: [5445] ibping_serv: Pong: vre.(none) >> Pong from vre.(none) (Lid 0x3): time 0.087 ms >> ibwarn: [5452] ibping: Ping.. >> ibwarn: [5445] ibping_serv: Pong: vre.(none) >> Pong from vre.(none) (Lid 0x3): time 0.069 ms >> ibwarn: [5452] report: out due signal 2 >> --- vre.(none) (Lid 0x3) ibping statistics --- >> *3 packets transmitted, 3 received, 0% packet loss, time 2320 ms* >> rtt min/avg/max = 0.069/0.089/0.111 ms >> *[root at XXXX]# ibping -v 1* >> ibwarn: [5449] ibping: Ping.. >> *ibwarn: [5449] main: ibping to Lid 0x1 failed* >> ibwarn: [5449] ibping: Ping.. >> ibwarn: [5449] main: ibping to Lid 0x1 failed >> ibwarn: [5449] ibping: Ping.. 
>> ibwarn: [5449] main: ibping to Lid 0x1 failed >> ibwarn: [5449] report: out due signal 2 >> --- (Lid 0x1) ibping statistics --- >> *3 packets transmitted, 0 received, 100% packet loss, time 11358 ms* >> rtt min/avg/max = 0.000/0.000/0.000 ms >> Thanks, >> Kapil >> !DSPAM:46003995205571657414402! >> --------------------------------------------------------------------- >> --- >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/ >> openib-general >> >> !DSPAM:46003995205571657414402! >> > > > -- > Kyle Schochenmaier > kschoche at scl.ameslab.gov > Research Assistant, Dr. Brett Bode > AmesLab - US Dept.Energy > Scalable Computing Laboratory > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Cisco Systems
From rdreier at cisco.com Tue Mar 20 14:50:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 14:50:16 -0700 Subject: [ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup In-Reply-To: <20070320133932.GC18162@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 20 Mar 2007 15:39:32 +0200") References: <20070320133932.GC18162@mellanox.co.il> Message-ID: > This fixes openfabrics bugzilla 394: > - Use common EQ for command interface and async events > - Clean CQ after moving QP to reset This is a little terse -- an ideal changelog entry would explain what the bug is, what is being changed to fix it, and why that fixes the issue. I'll try to fix it up...
> This also fixes a potential crash in ipoib cm: > - sync with completion event ISR after QP is reset > to prevent ULP from getting and using QP pointer and/or WRID > after they are freed. > We can rip the lines that sync with MTHCA_EQ_COMP out if you think > the issue needs to be dealt with in some other way - and I agree > this is only good for ULPs that do all their polling > inside the ISR, but at least this covers all in-kernel code. I don't really like this change, although maybe it's the right thing to do. But can you explain what IPoIB CM is doing that would cause it to run into trouble? I'd like to see if there's a better solution. It just seems strange to me to add the assumption that destroying a QP makes sure that all running CQ callbacks are done. If we change to NAPI (so that CQs are polled asynchronously) does that readd the same bug? > @@ -864,6 +865,11 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, > + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); > + else > + synchronize_irq(dev->pdev->irq); ...and it seems doubly strange to have the modify QP operation wait for CQ callbacks. - R. From mike.heffner at evergrid.com Tue Mar 20 15:52:39 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Tue, 20 Mar 2007 17:52:39 -0500 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: <46002FAE.60405@ichips.intel.com> References: <46003AAC.3090505@evergrid.com> <46002FAE.60405@ichips.intel.com> Message-ID: <460065B7.5080903@evergrid.com> Sean Hefty wrote: >> Can you call rdma_disconnect() immediately after posting sends on the >> QP? I don't see any CQE's come back with errors but they appear to >> "disappear" and never get signaled on one peer side. Are there any >> potential race issues to avoid here (it only happens about one out of >> every 100 connections)? 
> > rdma_disconnect() will immediately transition the QP into the error > state, which can affect queued send operations. > > I think the situation that you're describing could happen if the side > that received the send transitioned the QP into the error state, but the > ACK sent back to the sender was lost. That's what I was worried about. Is there any way to manually flush a QP before disconnecting it? > > You could try having one side initiate the rdma_disconnect, with the > other side waiting for the disconnect event before calling > rdma_disconnect itself. Unfortunately I still end up with a situation where whatever data was last sent on the connection may not get flushed correctly to the receiver if the sender shuts down the channel -- immediately calling rdma_disconnect() and transitioning the QP to error/reset state. The handshake with send messages was my attempt at avoiding that problem. ;-) > > - Sean > > Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 x603
From robert.j.woodruff at intel.com Tue Mar 20 14:56:32 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 20 Mar 2007 14:56:32 -0700 Subject: [ofa-general] IPoIB connected mode on RedHat EL5 Message-ID: I am running OFED 1.2 on RedHat EL5 and it appears that IPoIB connected mode is not enabled, since the MTU size is set to 2044 and cannot be set higher than 2044. On our RedHat EL4 systems, by default the MTU size is 65536 and it appears that connected mode is working. Is this expected behavior for the Beta release and will this be fixed before the release?
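The 2044-byte ceiling is what IPoIB datagram mode gives on a 2048-byte IB MTU, because the IPoIB encapsulation header takes 4 bytes, so it is a useful hint that the interface is not running in connected mode. Below is a minimal sketch in C for checking this directly. It assumes the loaded ib_ipoib driver exposes a `mode` attribute under `/sys/class/net/<interface>/`, which drivers built with connected-mode support do; a module without that support would simply have no such file. The function and enum names are illustrative only and not part of any OFED API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ipoib_mode { IPOIB_UNKNOWN, IPOIB_DATAGRAM, IPOIB_CONNECTED };

/* Map the text of a sysfs "mode" attribute to an enum.
 * A trailing newline, as sysfs normally emits, is tolerated. */
enum ipoib_mode ipoib_mode_from_string(const char *text)
{
    if (text == NULL)
        return IPOIB_UNKNOWN;
    size_t n = strcspn(text, "\r\n");  /* length without the line ending */
    if (n == strlen("connected") && strncmp(text, "connected", n) == 0)
        return IPOIB_CONNECTED;
    if (n == strlen("datagram") && strncmp(text, "datagram", n) == 0)
        return IPOIB_DATAGRAM;
    return IPOIB_UNKNOWN;
}

/* Read a mode file such as /sys/class/net/ib0/mode (path is an assumption).
 * Returns IPOIB_UNKNOWN when the file is missing or unreadable. */
enum ipoib_mode ipoib_mode_from_file(const char *path)
{
    char line[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return IPOIB_UNKNOWN;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? ipoib_mode_from_string(line) : IPOIB_UNKNOWN;
}

/* Largest IP MTU in datagram mode: the IB link MTU minus the
 * 4-byte IPoIB encapsulation header. */
int ipoib_datagram_max_mtu(int ib_link_mtu)
{
    assert(ib_link_mtu > 4);
    return ib_link_mtu - 4;
}
```

Calling `ipoib_mode_from_file("/sys/class/net/ib0/mode")` on the affected host would separate "interface is in datagram mode" from "attribute not present" (reported as `IPOIB_UNKNOWN`), and the latter would point at the wrong ib_ipoib module being loaded.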
woody
From rdreier at cisco.com Tue Mar 20 14:57:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 14:57:06 -0700 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: <460065B7.5080903@evergrid.com> (Mike Heffner's message of "Tue, 20 Mar 2007 17:52:39 -0500") References: <46003AAC.3090505@evergrid.com> <46002FAE.60405@ichips.intel.com> <460065B7.5080903@evergrid.com> Message-ID: > That's what I was worried about. Is there anyway to manually flush a > QP before disconnecting it? What do you mean by flushing the QP? There's no way to make the QP execute work requests any faster, but you can wait for all your requests to complete before you disconnect it. Or if you transition the QP to the error state, then all the outstanding work requests will be completed with a "flush error" status, and you can poll your CQ until you know there are no more outstanding work requests before you transition the QP to the reset state and/or destroy the QP. - R.
From sweitzen at cisco.com Tue Mar 20 15:00:19 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 20 Mar 2007 15:00:19 -0700 Subject: [ofa-general] IPoIB connected mode on RedHat EL5 In-Reply-To: References: Message-ID: It's working OK for me, are you sure you are using the OFED 1.2 ib_ipoib.ko and not the one supplied with RHEL5?
[root at svbu-qa1950-5 ~]# uname -a
Linux svbu-qa1950-5 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
[root at svbu-qa1950-5 ~]# ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.2.101  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::205:ad00:8:cbd9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:31702824 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1092608 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:49688139319 (46.2 GiB)  TX bytes:63909394 (60.9 MiB)
[root at svbu-qa1950-5 ~]# grep OFED /usr/local/ofed/BUILD_ID
OFED-1.2-20070314-0600

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Woodruff, Robert J
> Sent: Tuesday, March 20, 2007 2:57 PM
> To: Michael S. Tsirkin
> Cc: general at lists.openfabrics.org; openib-general at openib.org
> Subject: [ofa-general] IPoIB connected mode on RedHat EL5
>
> I am running the OFED 1.2 on RedHat EL5 and it appears that IPoIB
> connected mode is not enable, since the mpu size is set to 2044
> and cannot be set
> higher than 2044. On our RedHat EL4 systems, by default the
> mtu size is 65536 and it appears that connected mode is working.
> Is this expected behavior for the Beta release and will this be
> fixed before the release ?
> > woody > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >
From mike.heffner at evergrid.com Tue Mar 20 16:38:25 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Tue, 20 Mar 2007 18:38:25 -0500 Subject: [ofa-general] Problem with dropped CQE's on RDMA CM channel In-Reply-To: References: <46003AAC.3090505@evergrid.com> <46002FAE.60405@ichips.intel.com> <460065B7.5080903@evergrid.com> Message-ID: <46007071.2090607@evergrid.com> Roland Dreier wrote: > > That's what I was worried about. Is there anyway to manually flush a > > QP before disconnecting it? > > What do you mean by flushing the QP? There's no way to make the QP > execute work requests any faster, but you can wait for all your > requests to complete before you disconnect it. Yes, I would like to wait for the ACK's of messages sent to me to correctly arrive at the sender. > Or if you transition > the QP to the error state, then all the outstanding work requests will > be completed with a "flush error" status, and you can poll your CQ > until you know there are no more outstanding work requests before you > transition the QP to the reset state and/or destroy the QP. That would be fine. I'll try manually transitioning the QP to the error state before calling rdma_disconnect() in case the QP is going to "reset" and dropping ACK responses. > > - R.
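The sequence quoted above (move the QP to the error state, reap completions until nothing is outstanding, and only then reset or destroy the QP) can be tried out without hardware. What follows is a minimal model in C, not libibverbs code: it assumes the application keeps its own count of posted-but-uncompleted work requests. The names `drain_cq`, `poll_one_fn` and `fake_cq` are hypothetical; the callback stands in for a real `ibv_poll_cq()` call, and `WC_FLUSHED` stands in for the `IBV_WC_WR_FLUSH_ERR` completion status seen after the QP has been moved to `IBV_QPS_ERR`.

```c
#include <assert.h>
#include <stddef.h>

/* Completion status as the caller sees it; WC_FLUSHED models a "flush error". */
enum wc_status { WC_SUCCESS, WC_FLUSHED };

/* Hypothetical stand-in for polling a CQ once: returns 1 and fills *status
 * when a completion is available, 0 when the CQ is empty, <0 on error. */
typedef int (*poll_one_fn)(void *cq, enum wc_status *status);

struct drain_result {
    int succeeded;  /* work requests that completed normally */
    int flushed;    /* work requests completed with a flush error */
};

/* Poll until `outstanding` completions have been reaped or `max_polls`
 * attempts have been made. Returns 0 once fully drained, -1 otherwise.
 * Only after a 0 return is it safe (in this model) to reset/destroy the QP. */
int drain_cq(void *cq, poll_one_fn poll_one, int outstanding,
             long max_polls, struct drain_result *res)
{
    assert(poll_one != NULL && res != NULL && outstanding >= 0);
    res->succeeded = 0;
    res->flushed = 0;
    while (outstanding > 0 && max_polls-- > 0) {
        enum wc_status st;
        int n = poll_one(cq, &st);
        if (n < 0)
            return -1;      /* polling itself failed */
        if (n == 0)
            continue;       /* nothing yet; real code might block instead */
        if (st == WC_FLUSHED)
            res->flushed++;
        else
            res->succeeded++;
        outstanding--;
    }
    return outstanding == 0 ? 0 : -1;
}

/* A fake CQ backed by an array, so the loop can be exercised in isolation. */
struct fake_cq {
    const enum wc_status *entries;
    int count;
    int next;
};

int fake_poll_one(void *cq, enum wc_status *status)
{
    struct fake_cq *f = cq;
    if (f->next >= f->count)
        return 0;           /* CQ empty */
    *status = f->entries[f->next++];
    return 1;
}
```

The point of counting outstanding requests rather than polling until the CQ looks empty is that an empty CQ only means nothing has completed yet, which is exactly the window in which completions appeared to go missing in this thread.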
-- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 x603
From robert.j.woodruff at intel.com Tue Mar 20 15:39:20 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 20 Mar 2007 15:39:20 -0700 Subject: [ofa-general] IPoIB connected mode on RedHat EL5 In-Reply-To: Message-ID:
Never mind. Just an installation/configuration problem on my part.

woody

-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Tuesday, March 20, 2007 3:00 PM
To: Woodruff, Robert J; Michael S. Tsirkin
Cc: general at lists.openfabrics.org; openib-general at openib.org
Subject: RE: [ofa-general] IPoIB connected mode on RedHat EL5

It's working OK for me, are you sure you are using the OFED 1.2 ib_ipoib.ko and not the one supplied with RHEL5?

[root at svbu-qa1950-5 ~]# uname -a
Linux svbu-qa1950-5 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
[root at svbu-qa1950-5 ~]# ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.2.101  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::205:ad00:8:cbd9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:31702824 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1092608 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:49688139319 (46.2 GiB)  TX bytes:63909394 (60.9 MiB)
[root at svbu-qa1950-5 ~]# grep OFED /usr/local/ofed/BUILD_ID
OFED-1.2-20070314-0600

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Woodruff, Robert J
> Sent: Tuesday, March 20, 2007 2:57 PM
> To: Michael S.
Tsirkin > Cc: general at lists.openfabrics.org; openib-general at openib.org > Subject: [ofa-general] IPoIB connected mode on RedHat EL5 > > I am running the OFED 1.2 on RedHat EL5 and it appears that IPoIB > connected mode is not enable, since the mpu size is set to 2044 > and cannot be set > higher than 2044. On our RedHat EL4 systems, by default the > mtu size is 65536 and it appears that connected mode is working. > Is this expected behavior for the Beta release and will this be > fixed before the release ? > > woody > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >
From davem at davemloft.net Tue Mar 20 16:34:56 2007 From: davem at davemloft.net (David Miller) Date: Tue, 20 Mar 2007 16:34:56 -0700 (PDT) Subject: [ofa-general] Re: dst_ifdown breaks infiniband? In-Reply-To: <20070320160217.GC31495@mellanox.co.il> References: <20070319151336.GA24225@mellanox.co.il> <20070319232043.GA23359@ms2.inr.ac.ru> <20070320160217.GC31495@mellanox.co.il> Message-ID: <20070320.163456.41638776.davem@davemloft.net> From: "Michael S. Tsirkin" Date: Tue, 20 Mar 2007 18:02:17 +0200 > David, Alexey, what do you think about this patch? Is it right? > Could this patch be considered for 2.6.21? > > Acked-by: Michael S. Tsirkin I plan to apply it and merge. From mshefty at ichips.intel.com Tue Mar 20 17:10:33 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Mar 2007 17:10:33 -0700 Subject: [ofa-general] [PATCH] use LIDs from REQ LRH for inter-subnet connections In-Reply-To: <000301c76aab$0fba54d0$8ffc070a@amr.corp.intel.com> References: <000301c76aab$0fba54d0$8ffc070a@amr.corp.intel.com> Message-ID: <460077F9.4030203@ichips.intel.com> > When you get a chance, can you try out this patch? I tested that it worked > for a local subnet connection by commenting out the hop_limit check. So, > I'm interested to know if you run into any problems. If you do run into > issues, madeye may be able to help. I've reworked this patch, and added a couple more based on 2.6.21-rc4. The patches are available from git://git.openfabrics.org/~shefty/rdma-dev.git ib_router I started to put checks to validate the CM REQ data against the received LRH data, but dropped them. A couple of other notes: * If the LID in the CM REQ is permissive, then the LID/SL data from the LRH is used instead.
The user of the ib_cm is responsible for setting the LID to permissive to control this, which should allow this functionality to work alongside the more general ib_remote_sa type of solution. * The CM architecture allows for primary and alternate paths where one is subnet local, and the other is routed. I don't think anything breaks this at the moment, and I believe that the changes to the ib_cm to support routing are done. (Support for non-reversible paths is still missing.) I'm still developing the ib_remote_sa solution. All of my testing done so far is limited to a single subnet, with hacks thrown into the code to force execution of the new code paths. - Sean From rdreier at cisco.com Tue Mar 20 17:19:52 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 20 Mar 2007 17:19:52 -0700 Subject: [ofa-general] Re: IsSMdisabled and user_mad.c In-Reply-To: <1173806051.5995.98321.camel@hal.voltaire.com> (Hal Rosenstock's message of "13 Mar 2007 12:14:14 -0500") References: <1173806051.5995.98321.camel@hal.voltaire.com> Message-ID: [err, lists.openfabrics.org, not voltaire.com...] > Currently user_mad.c does not currently support the IsSMdisabled > capability mask bit in PortInfo attribute. I propose adding support for > a per port issmdisabled similar to issm in user_mad.c. I also think an > API change may not be necessary as applications can deal with the lack > of this file gracefully. If this sounds acceptable, I will work on a > patch for this. Thanks. I guess it's OK, although I would also like to know how you plan to handle the interaction between IsSM and IsSMDisabled -- eg what if a process opens issm0 and then another process tries to open issmdisabled0? Or conversely if issmdisabled0 is open, what happens when someone opens issm0? - R. 
From sashak at voltaire.com Tue Mar 20 19:00:22 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 21 Mar 2007 04:00:22 +0200 Subject: [ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them In-Reply-To: <45FFAEC4.1040300@dev.mellanox.co.il> References: <45FE7740.2080308@dev.mellanox.co.il> <1174313910.13051.25.camel@localhost> <45FEA350.7070605@dev.mellanox.co.il> <20070319185531.GN19999@sashak.voltaire.com> <45FFAEC4.1040300@dev.mellanox.co.il> Message-ID: <1174442422.3317.70.camel@localhost> On Tue, 2007-03-20 at 11:52 +0200, Yevgeny Kliteynik wrote: > > I had this problem on some copy of master that wasn't updated. > After updating it I can't see this problem happening again. > But the hop count in not cleared there too, so even if I can't > recreate this problem (or even if the new flow solves this particular > bug), I think we do agree that it would be better to clear hop count > anyway. Sure - this solves switch moving problem. Thanks for fixing it. Sasha From sashak at voltaire.com Tue Mar 20 18:23:55 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 21 Mar 2007 03:23:55 +0200 Subject: [ofa-general] [PATCH] opensm: complib_init() after block_signals() Message-ID: <20070321012355.GA20990@sashak.voltaire.com> Move complib_init() call where timer thread is created after block_signals(), so timer thread will not run signal handlers. 
Signed-off-by: Sasha Khapyorsky
---
 osm/opensm/main.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 511df46..f572744 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -316,7 +316,7 @@ show_usage(void)
   printf( "-?\n"
           " Display this usage info then exit.\n\n" );
   fflush( stdout );
-  osm_exit_flag = 1;
+  exit(2);
 }
 
 /**********************************************************************
@@ -593,8 +593,6 @@ main(
     { NULL, 0, NULL, 0 } /* Required at the end of the array */
   };
 
-  complib_init();
-
   /* Make sure that the opensm and complib were compiled using
      same modes (debug/free) */
   if ( osm_is_debug() != cl_is_debug() )
@@ -872,11 +870,6 @@ main(
   }
   while(next_option != -1);
 
-  if (osm_exit_flag) {
-    complib_exit();
-    return( 0 );
-  }
-
   if (opt.log_file != NULL )
     printf(" Log File: %s\n", opt.log_file );
   /* Done with options description */
@@ -889,6 +882,8 @@ main(
 
   block_signals();
 
+  complib_init();
+
   status = osm_opensm_init( &osm, &opt );
   if( status != IB_SUCCESS )
   {
-- 
1.5.0.3.401.g27ebd

From sashak at voltaire.com Tue Mar 20 18:27:02 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Mar 2007 03:27:02 +0200
Subject: [ofa-general] [PATCH TRIVIAL] opensm/osmtest: remove unused show_menu()
In-Reply-To: <20070321012355.GA20990@sashak.voltaire.com>
References: <20070321012355.GA20990@sashak.voltaire.com>
Message-ID: <20070321012702.GB20990@sashak.voltaire.com>

Remove unused show_menu() func from opensm and osmtest.
Signed-off-by: Sasha Khapyorsky
---
 osm/opensm/main.c  |   10 ----------
 osm/osmtest/main.c |   11 -----------
 2 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index f572744..5f26638 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -321,16 +321,6 @@ show_usage(void)
 
 /**********************************************************************
  **********************************************************************/
-
-void
-show_menu(void)
-{
-  printf("\n------- Interactive Menu -------\n");
-  printf("X - Exit.\n\n");
-}
-
-/**********************************************************************
- **********************************************************************/
 ib_net64_t
 get_port_guid(
   IN osm_opensm_t *p_osm, uint64_t port_guid )
diff --git a/osm/osmtest/main.c b/osm/osmtest/main.c
index 5f402b7..e0da267 100644
--- a/osm/osmtest/main.c
+++ b/osm/osmtest/main.c
@@ -280,17 +280,6 @@ get_port_guid(
 
 /**********************************************************************
  **********************************************************************/
-void show_menu(void);
-
-void
-show_menu( )
-{
-  printf( "\n------- Interactive Menu -------\n" );
-  printf( "X - Exit\n\n" );
-}
-
-/**********************************************************************
- **********************************************************************/
 int
 main( int argc,
       char *argv[] )
-- 
1.5.0.3.401.g27ebd

From sashak at voltaire.com Tue Mar 20 18:33:06 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Mar 2007 03:33:06 +0200
Subject: [ofa-general] [PATCH] opensm: daemon mode
In-Reply-To: <20070321012702.GB20990@sashak.voltaire.com>
References: <20070321012355.GA20990@sashak.voltaire.com> <20070321012702.GB20990@sashak.voltaire.com>
Message-ID: <20070321013306.GC20990@sashak.voltaire.com>

This adds daemon mode support for OpenSM. The process will be detached
from terminal and backgrounded. Use '-B' or '--daemon' options to
activate.
Signed-off-by: Sasha Khapyorsky
---
 osm/include/opensm/osm_log.h    |    1 +
 osm/include/opensm/osm_subnet.h |    4 ++
 osm/opensm/main.c               |   59 +++++++++++++++++++++++++++++++++++++--
 osm/opensm/osm_log.c            |    6 ++++
 osm/opensm/osm_opensm.c         |    3 ++
 osm/opensm/osm_subnet.c         |   13 ++++++++
 6 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/osm/include/opensm/osm_log.h b/osm/include/opensm/osm_log.h
index ec79ca7..a556ad2 100644
--- a/osm/include/opensm/osm_log.h
+++ b/osm/include/opensm/osm_log.h
@@ -128,6 +128,7 @@ typedef struct _osm_log
   boolean_t flush;
   FILE* out_port;
   boolean_t accum_log_file;
+  boolean_t daemon;
   char* log_file_name;
 } osm_log_t;
 /*********/
diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
index 5bfba44..3091f65 100644
--- a/osm/include/opensm/osm_subnet.h
+++ b/osm/include/opensm/osm_subnet.h
@@ -282,6 +282,7 @@ typedef struct _osm_subn_opt
   char * sa_db_file;
   boolean_t exit_on_fatal;
   boolean_t honor_guid2lid_file;
+  boolean_t daemon;
   osm_qos_options_t qos_options;
   osm_qos_options_t qos_ca_options;
   osm_qos_options_t qos_sw0_options;
@@ -460,6 +461,9 @@ typedef struct _osm_subn_opt
 *	means that the file will be honored when SM is coming out of
 *	STANDBY. By default this is FALSE.
 *
+* daemon
+*	OpenSM will run in daemon mode.
+*
 * qos_options
 *	Default set of QoS options
 *
diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 5f26638..3f465e9 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -50,6 +50,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -263,6 +266,9 @@ show_usage(void)
           " issues: if SM discovers duplicated guids or 12x link with\n"
           " lane reversal badly configured.\n"
           " By default, the SM will exit on these errors.\n\n");
+  printf( "-B\n"
+          "--daemon\n"
+          " Run in daemon mode - OpenSM will run in the background.\n\n");
   printf( "-v\n"
           "--verbose\n"
           " This option increases the log verbosity level.\n"
@@ -516,6 +522,45 @@ parse_ignore_guids_file(IN char *guids_file_name,
 
 /**********************************************************************
  **********************************************************************/
+
+static int daemonize(osm_opensm_t *osm)
+{
+	pid_t pid;
+	int fd;
+
+	fd = open("/dev/null", O_WRONLY);
+	if (fd < 0) {
+		perror("open");
+		return -1;
+	}
+
+	if ((pid = fork()) < 0) {
+		perror("fork");
+		exit(-1);
+	} else if (pid > 0)
+		exit(0);
+
+	setsid();
+
+	if ((pid = fork()) < 0) {
+		perror("fork");
+		exit(-1);
+	} else if (pid > 0)
+		exit(0);
+
+	close(0);
+	close(1);
+	close(2);
+
+	dup2(fd, 0);
+	dup2(fd, 1);
+	dup2(fd, 2);
+
+	return 0;
+}
+
+/**********************************************************************
+ **********************************************************************/
 int
 main(
   int argc,
@@ -536,7 +581,7 @@ main(
   boolean_t cache_options = FALSE;
   char *ignore_guids_file_name = NULL;
   uint32_t val;
-  const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NQvVhorcyx";
+  const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBQvVhorcyx";
 
   /*
     In the array below, the 2nd parameter specified the number
@@ -580,6 +625,7 @@ main(
 #ifdef ENABLE_OSM_CONSOLE_SOCKET
     { "console-port", 1, NULL, 'C'},
 #endif
+    { "daemon", 0, NULL, 'B'},
     { NULL, 0, NULL, 0 } /* Required at the end of the array */
   };
 
@@ -846,6 +892,11 @@ main(
       printf (" Honor guid2lid file, if possible\n");
       break;
 
+    case 'B':
+      opt.daemon = TRUE;
+      printf (" Daemon mode.\n");
+      break;
+
     case 'h':
     case '?':
     case ':':
@@ -872,6 +923,9 @@ main(
 
   block_signals();
 
+  if (opt.daemon)
+    daemonize(&osm);
+
   complib_init();
 
   status = osm_opensm_init( &osm, &opt );
@@ -996,8 +1050,7 @@ main(
            osm.mad_pool.mads_out);
 
  Exit:
-  osm_opensm_destroy( &osm );
-
+  osm_opensm_destroy(&osm);
   complib_exit();
 
   exit( 0 );
diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
index 5bb0b9e..a1b1777 100644
--- a/osm/opensm/osm_log.c
+++ b/osm/opensm/osm_log.c
@@ -286,6 +286,12 @@ open_out_port(IN osm_log_t *p_log)
 	syslog(LOG_NOTICE, "%s log file opened\n",
 	       p_log->log_file_name);
 
+	if (p_log->daemon) {
+		dup2(fileno(p_log->out_port), 0);
+		dup2(fileno(p_log->out_port), 1);
+		dup2(fileno(p_log->out_port), 2);
+	}
+
 	return (0);
 }
 
diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c
index 2344380..8430605 100644
--- a/osm/opensm/osm_opensm.c
+++ b/osm/opensm/osm_opensm.c
@@ -196,6 +196,9 @@ osm_opensm_init(
   /* Can't use log macros here, since we're initializing the log */
   osm_opensm_construct( p_osm );
 
+  if (p_opt->daemon)
+    p_osm->log.daemon = 1;
+
   status = osm_log_init_v2( &p_osm->log, p_opt->force_log_flush,
                             p_opt->log_flags, p_opt->log_file,
                             p_opt->log_max_size, p_opt->accum_log_file );
diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c
index cbb3549..5f1dae3 100644
--- a/osm/opensm/osm_subnet.c
+++ b/osm/opensm/osm_subnet.c
@@ -460,6 +460,7 @@ osm_subn_set_default_opt(
   p_opt->force_heavy_sweep = FALSE;
   p_opt->log_flags = 0;
   p_opt->honor_guid2lid_file = FALSE;
+  p_opt->daemon = FALSE;
 
   p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
   if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
@@ -1051,6 +1052,10 @@ osm_subn_parse_conf_file(
         "honor_guid2lid_file",
         p_key, p_val, &p_opts->honor_guid2lid_file);
 
+      __osm_subn_opts_unpack_boolean(
+        "daemon",
+        p_key, p_val, &p_opts->daemon);
+
       subn_parse_qos_options("qos",
         p_key, p_val, &p_opts->qos_options);
 
@@ -1281,6 +1286,14 @@ osm_subn_write_conf_file(
     p_opts->single_thread ? "TRUE" : "FALSE"
     );
 
+  fprintf(
+    opts_file,
+    "#\n# MISC OPTIONS\n#\n"
+    "# Daemon mode\n"
+    "daemon %s\n\n",
+    p_opts->daemon ? "TRUE" : "FALSE"
+    );
+
   fprintf(
     opts_file,
     "#\n# DEBUG FEATURES\n#\n"
-- 
1.5.0.3.401.g27ebd

From yaronh at voltaire.com Tue Mar 20 19:21:15 2007
From: yaronh at voltaire.com (Yaron Haviv)
Date: Wed, 21 Mar 2007 04:21:15 +0200
Subject: [ofa-general] [RFC] host stack IB-to-IB router support
In-Reply-To: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com>
References: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com>
Message-ID:

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Monday, March 19, 2007 5:29 PM
> To: general at lists.openfabrics.org
> Subject: [ofa-general] [RFC] host stack IB-to-IB router support
>
> Based on previous e-mail threads, this is my plan for implementing IB-to-IB
> router support in the host stack capable of supporting RC communication. Note
> that this work is part of the PathForward project aimed at supporting early
> IB-to-IB router development. It is not intended to define IB router
> architecture.
>
> 1. Extend struct ib_cm_req_param:
>
> struct ib_cm_req_param {
> 	struct ib_sa_path_rec *primary_path;
> 	struct ib_sa_path_rec *alternate_path;
> +	struct ib_sa_path_rec *remote_primary_path;
> +	struct ib_sa_path_rec *remote_alternate_path;
>
>    The remote path information would be valid only if the provided paths had
>    a hop_limit > 1, but could also be used to support paths where
>    reversible = 0.
>
> 2. Add an ib_remote_sa module.
>
>    This module would be responsible for obtaining remote path information.
>    Because the architecture does not define how this information is obtained,
>    my intent is to encapsulate this functionality into a single module to
>    simplify out of tree maintenance. Its basic operation is:
>
>    a. Local ib_remote_sa sends query request to remote ib_remote_sa.
>    b. Remote ib_remote_sa queries its local SA.
>    c. Remote ib_remote_sa sends query response to local ib_remote_sa.
>
>    I expect the ib_remote_sa implementation to be a temporary solution only.
>    It will layer above either the ib_mad or ib_cm services, whichever ends
>    up being easier.
>
> 3. Extend the rdma_cm route resolution to include remote route lookup.
>

Sean,

I believe a much simpler and more generalized solution would be to
imitate the behavior of IP routing rather than using the "remote sa"
concept.

I understand that you tried to work within the IBA spec boundaries
(bugs), but those can change if needed.

An example would be:

1. The client wants to open a connection to DGID xx:yy (xx = GID prefix).
2. The CM/SA layers check whether xx is the local subnet; if not, they use
   a local routing table to map the DGID to a router DGID(xx) (this can
   even leverage the Linux IPv6 tables, I assume).
3. A path query is issued to the router DGID.
4. The message is sent to the destination.
5. The destination router either has the DGID->DLID/path cached or, if
   not, handles an exception and looks up the path(DGID) (just like any
   IP router).
6. The router sends to the destination.
7. The destination conducts a reverse lookup ...
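Steps 1 to 3 of the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct gid`, `struct route_entry` and `resolve_query_target()` are hypothetical names, not part of ib_cm, ib_sa or any existing kernel interface, and GIDs are kept as two host-order 64-bit halves for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* A 128-bit GID split into its 64-bit subnet prefix and 64-bit GUID. */
struct gid {
	uint64_t prefix;
	uint64_t guid;
};

/* One entry of a hypothetical per-host routing table:
 * "to reach subnet `prefix`, go through the router port `router`". */
struct route_entry {
	uint64_t prefix;
	struct gid router;
};

/*
 * Pick the GID that the local path query should target.
 * Returns 0 and fills *target on success, -1 when the destination is
 * on a remote subnet and the table holds no route for its prefix.
 */
static int resolve_query_target(const struct gid *dgid, uint64_t local_prefix,
				const struct route_entry *table, size_t n,
				struct gid *target)
{
	size_t i;

	/* Step 2: same subnet, so the path query is for the DGID itself. */
	if (dgid->prefix == local_prefix) {
		*target = *dgid;
		return 0;
	}

	/* Step 2, remote case: map the DGID prefix to a router GID. */
	for (i = 0; i < n; i++) {
		if (table[i].prefix == dgid->prefix) {
			/* Step 3: the path query goes to the router. */
			*target = table[i].router;
			return 0;
		}
	}
	return -1;
}
```

The caller would then issue its normal local SA path query for `target`, so no L2 attribute of the remote subnet is needed on the client side.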
You can notice the above is just like the routing process in IP/Eth A path is a local attribute of a subnet/L2, much like a MAC address, a remote node shouldn't have visibility into L2 attributes of the remote subnet, such a model is not robust, not going to sustain failures very well, and can't scale We need to learn from the IP world and generations of experience rather than invent our own mechanisms Yaron > - Sean > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From abhinav.vishnu at gmail.com Tue Mar 20 19:23:24 2007 From: abhinav.vishnu at gmail.com (Abhinav Vishnu) Date: Tue, 20 Mar 2007 22:23:24 -0400 Subject: [ofa-general] [0] Abort: Not enough port is in active state In-Reply-To: <1c16cdf90703200052g729add3x2d767378697f94f2@mail.gmail.com> References: <1c16cdf90703200052g729add3x2d767378697f94f2@mail.gmail.com> Message-ID: <87aa148d0703201923l2128872dk680446ffd2ad1937@mail.gmail.com> Hi Chev, Thanks for using MVAPICH2 and reporting the problem to us. We were wondering the MPI benchmark you are using and the configuration of the nodes which you are using for running the MPI benchmark. Once we have these details, we should be able to help you further. Thanks, :- Abhinav On 3/20/07, Chevchenkovic Chevchenkovic wrote: > > Hi, > I am getting the error: > [0] Abort: Not enough port is in active state > while using mvapich2-0.9.8. > I do not get similar error on using mvapich-0.9.7-mlx2.1.0 > Any specific reasons for this behaviour? > How do i get rid of this error? 
> Awaiting your reply, > -Chev > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -- Abhinav Vishnu Graduate Student Computer Science and Engineering The Ohio State University From jgunthorpe at obsidianresearch.com Tue Mar 20 21:31:52 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 20 Mar 2007 22:31:52 -0600 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: References: <000301c76a6d$ab61ee90$c9d8180a@amr.corp.intel.com> Message-ID: <20070321043152.GA20505@obsidianresearch.com> On Wed, Mar 21, 2007 at 04:21:15AM +0200, Yaron Haviv wrote: > I believe a much simpler and more generalized solution would be to > imitate the behavior of IP routing rather than using the "remote sa" > concept We talked about this at some length a while ago on this list. In short, we started with the position you outlined until it was discovered that the L2 address checking described by 9.6.1.5.1 C9-57 makes it unworkable. This existing IB behavior is sufficiently un-IP-like that existing IP solutions do not work. (The parallel to ethernet would be if each TCP connection checked that the SMAC in incoming frames matched some pre-determined value.) Sean is working on one of the simpler solutions that considers the effects of C9-57, which is to allow the active side to control all 4 path records that are involved. Notice this is similar to how IB CM works within a subnet, where the 2 required paths are selected by the active side and the passive side does no queries. This already is different than IP which would have the passive side doing ARP. Again this behavior in the spec is fundamentally required by the restriction in C9-57.
I agree with you that this is not a good place to be, but with current hardware I think we are stuck with it.. Regards, Jason From mst at dev.mellanox.co.il Tue Mar 20 22:18:03 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Mar 2007 07:18:03 +0200 Subject: [ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup In-Reply-To: References: <20070320133932.GC18162@mellanox.co.il> Message-ID: <20070321051803.GE3409@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > This fixes openfabrics bugzilla 394: > > - Use common EQ for command interface and async events > > - Clean CQ after moving QP to reset > > This is a little terse -- an ideal changelog entry would explain what > the bug is, what is being changed to fix it, and why that fixes the issue. > I'll try to fix it up... > > > This also fixes a potential crash in ipoib cm: > > - sync with completion event ISR after QP is reset > > to prevent ULP from getting and using QP pointer and/or WRID > > after they are freed. > > > We can rip the lines that sync with MTHCA_EQ_COMP out if you think > > the issue needs to be dealt with in some other way - and I agree > > this is only good for ULPs that do all their polling > > inside the ISR, but at least this covers all in-kernel code. > > I don't really like this change, although maybe it's the right thing > to do. But can you explain what IPoIB CM is doing that would cause it > to run into trouble? I'd like to see if there's a better solution. > It just seems strange to me to add the assumption that destroying a QP > makes sure that all running CQ callbacks are done. Look at ipoib_cm_stale_task: + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); and then + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { + p = wc->qp->qp_context; This wc->qp->qp_context might use QP after free. > > If we change to NAPI (so that CQs are polled asynchronously) does that > readd the same bug? Hmm. Yes. 
In hindsight, it was probably better to put qp_context directly in ib_wc instead of the qp pointer. Then ipoib could set some flag in the structure pointed to by qp_context. My guess this would be too big a change for 2.6.21. What do you think? -- MST From halr at voltaire.com Tue Mar 20 23:24:07 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 01:24:07 -0500 Subject: [ofa-general] Re: IsSMdisabled and user_mad.c Message-ID: <1174458246.6493.49979.camel@hal.voltaire.com> Sorry if you get this twice; somehow the original cc address was munged... On Tue, 2007-03-20 at 19:19, Roland Dreier wrote: > > Currently user_mad.c does not currently support the IsSMdisabled > > capability mask bit in PortInfo attribute. I propose adding support for > > a per port issmdisabled similar to issm in user_mad.c. I also think an > > API change may not be necessary as applications can deal with the lack > > of this file gracefully. If this sounds acceptable, I will work on a > > patch for this. Thanks. > > I guess it's OK, although I would also like to know how you plan to > handle the interaction between IsSM and IsSMDisabled -- eg what if a > process opens issm0 and then another process tries to open > issmdisabled0? Or conversely if issmdisabled0 is open, what happens > when someone opens issm0? I would think those are error cases. Does that make sense ? If so, what error makes most sense ? EINVAL or something else ? -- Hal > - R. From mst at dev.mellanox.co.il Tue Mar 20 22:26:43 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Mar 2007 07:26:43 +0200 Subject: [ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup In-Reply-To: <20070321051803.GE3409@mellanox.co.il> References: <20070320133932.GC18162@mellanox.co.il> <20070321051803.GE3409@mellanox.co.il> Message-ID: <20070321052643.GF3409@mellanox.co.il> > Quoting Michael S. 
Tsirkin : > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > Quoting Roland Dreier : > > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > > > This fixes openfabrics bugzilla 394: > > > - Use common EQ for command interface and async events > > > - Clean CQ after moving QP to reset > > > > This is a little terse -- an ideal changelog entry would explain what > > the bug is, what is being changed to fix it, and why that fixes the issue. > > I'll try to fix it up... > > > > > This also fixes a potential crash in ipoib cm: > > > - sync with completion event ISR after QP is reset > > > to prevent ULP from getting and using QP pointer and/or WRID > > > after they are freed. > > > > > We can rip the lines that sync with MTHCA_EQ_COMP out if you think > > > the issue needs to be dealt with in some other way - and I agree > > > this is only good for ULPs that do all their polling > > > inside the ISR, but at least this covers all in-kernel code. > > > > I don't really like this change, although maybe it's the right thing > > to do. But can you explain what IPoIB CM is doing that would cause it > > to run into trouble? I'd like to see if there's a better solution. > > It just seems strange to me to add the assumption that destroying a QP > > makes sure that all running CQ callbacks are done. > > Look at ipoib_cm_stale_task: > + ib_destroy_cm_id(p->id); > + ib_destroy_qp(p->qp); > > and then > + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { > + p = wc->qp->qp_context; > > This wc->qp->qp_context might use QP after free. > > > > > If we change to NAPI (so that CQs are polled asynchronously) does that > > readd the same bug? > > Hmm. Yes. > In hindsight, it was probably better to put qp_context directly in ib_wc > instead of the qp pointer. > > Then ipoib could set some flag in the structure pointed to by qp_context. > > My guess this would be too big a change for 2.6.21. What do you think? 
To clarify: syncing with the completion IRQ will be needed anyway
for non-NAPI mode.

For NAPI we will be able to sync with NAPI after we set the flag
in qp_context.

-- 
MST

From sean.hefty at intel.com Tue Mar 20 23:52:13 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 20 Mar 2007 23:52:13 -0700
Subject: [ofa-general] [PATCH] IB/umad: fix GRH handling
In-Reply-To: <1174400288.4684.463519.camel@hal.voltaire.com>
Message-ID: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com>

>> Unfortunately, at least opensm cannot respond to SA queries issued from a
>> remote subnet. I'm not sure how much work this would take to fix, or if
>> other SAs have this issue. Hal briefly looked at the problems,
>
>FWIW, I'll be looking some more at these again.

I think the following patch corrects the GRH handling issues in ib_umad.
(Tested loading of ib_umad module only, and not against openSM.) If this
looks right, I'll add it to my rdma-dev.git ib_router branch

Signed-off-by: Sean Hefty
---
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index c069ebe..7774cf5 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -231,12 +231,17 @@ static void recv_handler(struct ib_mad_agent *agent,
 	packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits;
 	packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH);
 	if (packet->mad.hdr.grh_present) {
-		/* XXX parse GRH */
-		packet->mad.hdr.gid_index = 0;
-		packet->mad.hdr.hop_limit = 0;
-		packet->mad.hdr.traffic_class = 0;
-		memset(packet->mad.hdr.gid, 0, 16);
-		packet->mad.hdr.flow_label = 0;
+		struct ib_ah_attr ah_attr;
+
+		ib_init_ah_from_wc(agent->device, agent->port_num,
+				   mad_recv_wc->wc, mad_recv_wc->recv_buf.grh,
+				   &ah_attr);
+
+		packet->mad.hdr.gid_index = ah_attr.grh.sgid_index;
+		packet->mad.hdr.hop_limit = ah_attr.grh.hop_limit;
+		packet->mad.hdr.traffic_class = ah_attr.grh.traffic_class;
+		memcpy(packet->mad.hdr.gid, &ah_attr.grh.dgid, 16);
+		packet->mad.hdr.flow_label = cpu_to_be32(ah_attr.grh.flow_label);
 	}
 
 	if (queue_packet(file, agent, packet))
@@ -473,6 +478,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	if (packet->mad.hdr.grh_present) {
 		ah_attr.ah_flags = IB_AH_GRH;
 		memcpy(ah_attr.grh.dgid.raw, packet->mad.hdr.gid, 16);
+		ah_attr.grh.sgid_index = packet->mad.hdr.gid_index;
 		ah_attr.grh.flow_label = be32_to_cpu(packet->mad.hdr.flow_label);
 		ah_attr.grh.hop_limit = packet->mad.hdr.hop_limit;
 		ah_attr.grh.traffic_class = packet->mad.hdr.traffic_class;

From dotanb at dev.mellanox.co.il Wed Mar 21 00:35:48 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 21 Mar 2007 09:35:48 +0200
Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
In-Reply-To:
References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <45FFE47B.8000408@dev.mellanox.co.il>
Message-ID: <4600E054.90504@dev.mellanox.co.il>

Roland Dreier wrote:
> It seems that userspace is the only place that has a chance of making
> this work. We just need to get the locking correct, and at first
> glance it looks possible to me. If you don't see a way to do it then
> I'll work on it in the next day or so.
>

I believe that we can avoid problems only if we have a lock in the
ibv_context that handles the completion channels. Otherwise, we will
always have a problem in the following scenario:

create_channel

thread 1                          thread 2
---------                         -----------
in create_cq:
  if (channel)
                                  destroy channel
    get channel lock (seg fault)

If you can think of a different solution, I will be happy to hear about it.
If you think that adding a lock to the ibv_context is good enough, I will
send you a patch that implements this solution.

thanks
Dotan

From mst at dev.mellanox.co.il Wed Mar 21 01:12:33 2007
From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 21 Mar 2007 10:12:33 +0200 Subject: [ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup In-Reply-To: <20070321052643.GF3409@mellanox.co.il> References: <20070320133932.GC18162@mellanox.co.il> <20070321051803.GE3409@mellanox.co.il> <20070321052643.GF3409@mellanox.co.il> Message-ID: <20070321081233.GC20583@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > Quoting Michael S. Tsirkin : > > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > > > Quoting Roland Dreier : > > > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > > > > > > This fixes openfabrics bugzilla 394: > > > > - Use common EQ for command interface and async events > > > > - Clean CQ after moving QP to reset > > > > > > This is a little terse -- an ideal changelog entry would explain what > > > the bug is, what is being changed to fix it, and why that fixes the issue. > > > I'll try to fix it up... > > > > > > > This also fixes a potential crash in ipoib cm: > > > > - sync with completion event ISR after QP is reset > > > > to prevent ULP from getting and using QP pointer and/or WRID > > > > after they are freed. > > > > > > > We can rip the lines that sync with MTHCA_EQ_COMP out if you think > > > > the issue needs to be dealt with in some other way - and I agree > > > > this is only good for ULPs that do all their polling > > > > inside the ISR, but at least this covers all in-kernel code. > > > > > > I don't really like this change, although maybe it's the right thing > > > to do. But can you explain what IPoIB CM is doing that would cause it > > > to run into trouble? I'd like to see if there's a better solution. > > > It just seems strange to me to add the assumption that destroying a QP > > > makes sure that all running CQ callbacks are done. 
> > > > Look at ipoib_cm_stale_task: > > + ib_destroy_cm_id(p->id); > > + ib_destroy_qp(p->qp); > > > > and then > > + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { > > + p = wc->qp->qp_context; > > > > This wc->qp->qp_context might use QP after free. > > > > > > > > If we change to NAPI (so that CQs are polled asynchronously) does that > > > readd the same bug? > > > > Hmm. Yes. > > In hindsight, it was probably better to put qp_context directly in ib_wc > > instead of the qp pointer. > > > > Then ipoib could set some flag in the structure pointed to by qp_context. > > > > My guess this would be too big a change for 2.6.21. What do you think? > > To clarify: syncing with the completion IRQ will be needed anyway in > for non-NAPI mode. > > For NAPI we will be able to sync with NAPI after we set the flag > in qp_context. To sync with NAPI, we can take the poll_lock. I imagine something like this before destroying QP spin_lock(&dev->npinfo->poll_lock); p->dead = 1; spin_unlock(&dev->npinfo->poll_lock); and after poll cq: if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { p = wc->qp_context; if (!p->dead && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { } } -- MST From vlad at lists.openfabrics.org Wed Mar 21 02:34:54 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Wed, 21 Mar 2007 02:34:54 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070321-0200 daily build status Message-ID: <20070321093454.B4BDEE607F1@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.17 Passed on i686 with 
linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From mst at dev.mellanox.co.il Wed Mar 21 02:51:01 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 21 Mar 2007 11:51:01 +0200 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <45F879B6.2040904@ichips.intel.com> References: <000001c765c0$7d3bdd70$8698070a@amr.corp.intel.com> <20070314051158.GA7997@mellanox.co.il> <45F879B6.2040904@ichips.intel.com> Message-ID: <20070321095101.GH20583@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes > > >>Vlad, please pull from: > >> > >> git://git.openfabrics.org/~shefty/ofed_1_2.git ofed_1_2 > >> > >>This should add some necessary fixes to the OFED code: > >> > >> RDMA/ucma: avoid sending reject if backlog is full > >> RDMA/cma: Request reversible paths only > >> IB/cm: remove broken MRA timeout patch > > > > > >Sean, before applying this, please discuss the MRA timeout patch > >on the general list with Ishai. > > > >Can you fix the patch instead of removing it? > >It helps him work-around bugs in his SRP target. > > I've updated the patches in my ofed tree. The version of the tree that I > was originally working on did not have Ishai's changes applied to them, and > I didn't realize that they were merged into OFED. (The broken MRA timeout > patch I was referring to was the one before Ishai's.) So, I ended up > creating a different replacement patch that: increases the default timeout, > exports the timeout as a module parameter, and fixes an issue setting the > SIDR REQ timeout. (This is the version of the patch that I will request > for 2.6.22.) > > - Sean Could you post the updated patch on the general list please, and Cc Ishai? I guess it would not hurt for him to review/test it. 
-- MST From halr at voltaire.com Wed Mar 21 05:03:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 07:03:48 -0500 Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm/osmtest: remove unused show_menu() In-Reply-To: <20070321012702.GB20990@sashak.voltaire.com> References: <20070321012355.GA20990@sashak.voltaire.com> <20070321012702.GB20990@sashak.voltaire.com> Message-ID: <1174478627.6493.71907.camel@hal.voltaire.com> On Tue, 2007-03-20 at 20:27, Sasha Khapyorsky wrote: > Remove unused show_menu() func from opensm and osmtest. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to both master and ofed_1_2). -- Hal From mst at dev.mellanox.co.il Wed Mar 21 05:34:34 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Mar 2007 14:34:34 +0200 Subject: [ofa-general] Re: [PATCH for-2.6.21] mthca: QP reset race fixup In-Reply-To: <20070321051803.GE3409@mellanox.co.il> References: <20070320133932.GC18162@mellanox.co.il> <20070321051803.GE3409@mellanox.co.il> Message-ID: <20070321123434.GJ20583@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: [PATCH for-2.6.21] mthca: QP reset race fixup > > In hindsight, it was probably better to put qp_context directly in ib_wc > instead of the qp pointer. > > Then ipoib could set some flag in the structure pointed to by qp_context. > > My guess this would be too big a change for 2.6.21. What do you think? Here's how a patch to make the use after free fixable for ULPs that do polling out of interrupt context (e.g. IPoIB with NAPI) would look like. What do you think? Warning: untested. Signed-off-by: Michael S. 
Tsirkin --- diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 765589f..4dc769a 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -420,7 +420,8 @@ struct ib_wc { enum ib_wc_opcode opcode; u32 vendor_err; u32 byte_len; - struct ib_qp *qp; + void *qp_context; + u32 qp_num; __be32 imm_data; u32 src_qp; int wc_flags; diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 13efd41..7bad7ec 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -653,7 +653,8 @@ static void build_smp_wc(struct ib_qp *qp, wc->pkey_index = pkey_index; wc->byte_len = sizeof(struct ib_mad) + sizeof(struct ib_grh); wc->src_qp = IB_QP0; - wc->qp = qp; + wc->qp_num = qp->qp_num; + wc->qp_context = qp->qp_context; wc->slid = slid; wc->sl = 0; wc->dlid_path_bits = 0; diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 4fd75af..b90e4a6 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -935,7 +935,7 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file, resp->wc[i].vendor_err = wc[i].vendor_err; resp->wc[i].byte_len = wc[i].byte_len; resp->wc[i].imm_data = (__u32 __force) wc[i].imm_data; - resp->wc[i].qp_num = wc[i].qp->qp_num; + resp->wc[i].qp_num = wc[i].qp_num; resp->wc[i].src_qp = wc[i].src_qp; resp->wc[i].wc_flags = wc[i].wc_flags; resp->wc[i].pkey_index = wc[i].pkey_index; diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 5175c99..835f7ed 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -153,7 +153,8 @@ static inline int c2_poll_one(struct c2_dev *c2dev, entry->status = c2_cqe_status_to_openib(c2_wr_get_result(ce)); entry->wr_id = ce->hdr.context; - entry->qp = &qp->ibqp; + entry->qp_num = qp->ibqp.qp_num; + entry->qp_context = qp->ibqp.qp_context; entry->wc_flags = 0; entry->slid = 0; entry->sl = 0; diff --git 
a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index 08d3f89..9c4e579 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -579,7 +579,8 @@ poll_cq_one_read_cqe: } else wc->status = IB_WC_SUCCESS; - wc->qp = NULL; + wc->qp_num = cqe->local_qp_number; + wc->qp_context = NULL; wc->byte_len = cqe->nr_bytes_transferred; wc->pkey_index = cqe->pkey_index; wc->slid = cqe->rlid; diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index 64f07b1..a00230f 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -379,7 +379,8 @@ void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err) wc.vendor_err = 0; wc.byte_len = 0; wc.imm_data = 0; - wc.qp = &qp->ibqp; + wc.qp_num = qp->ibqp.qp_num; + wc.qp_context = qp->ibqp.qp_context; wc.src_qp = 0; wc.wc_flags = 0; wc.pkey_index = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 5ff20cb..74db045 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -702,7 +702,8 @@ void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc) wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; wc->vendor_err = 0; wc->byte_len = 0; - wc->qp = &qp->ibqp; + wc->qp_num = qp->ibqp.qp_num; + wc->qp_context = qp->ibqp.qp_context; wc->src_qp = qp->remote_qpn; wc->pkey_index = 0; wc->slid = qp->remote_ah_attr.dlid; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index e86cb17..ffe70ed 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -137,7 +137,8 @@ bad_lkey: wc.vendor_err = 0; wc.byte_len = 0; wc.imm_data = 0; - wc.qp = &qp->ibqp; + wc.qp_num = qp->ibqp.qp_num; + wc.qp_context = qp->ibqp.qp_context; wc.src_qp = 0; wc.wc_flags = 0; wc.pkey_index = 0; diff --git 
a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 325d663..a03f538 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -49,7 +49,8 @@ static void complete_last_send(struct ipath_qp *qp, struct ipath_swqe *wqe, wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; wc->vendor_err = 0; wc->byte_len = wqe->length; - wc->qp = &qp->ibqp; + wc->qp_num = qp->ibqp.qp_num; + wc->qp_context = qp->ibqp.qp_context; wc->src_qp = qp->remote_qpn; wc->pkey_index = 0; wc->slid = qp->remote_ah_attr.dlid; diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index 9a3e546..59fec70 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -66,7 +66,8 @@ bad_lkey: wc.vendor_err = 0; wc.byte_len = 0; wc.imm_data = 0; - wc.qp = &qp->ibqp; + wc.qp_num = qp->ibqp.qp_num; + wc.qp_context = qp->ibqp.qp_context; wc.src_qp = 0; wc.wc_flags = 0; wc.pkey_index = 0; diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 7131446..4374c67 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -1858,7 +1858,7 @@ int mthca_MAD_IFC(struct mthca_dev *dev, int ignore_mkey, int ignore_bkey, memset(inbox + 256, 0, 256); - MTHCA_PUT(inbox, in_wc->qp->qp_num, MAD_IFC_MY_QPN_OFFSET); + MTHCA_PUT(inbox, in_wc->qp_num, MAD_IFC_MY_QPN_OFFSET); MTHCA_PUT(inbox, in_wc->src_qp, MAD_IFC_RQPN_OFFSET); val = in_wc->sl << 4; diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index e3c774b..808850a 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -538,7 +538,8 @@ static inline int mthca_poll_one(struct mthca_dev *dev, } } - entry->qp = &(*cur_qp)->ibqp; + entry->qp_num = (*cur_qp)->ibqp.qp_num; + entry->qp_context = (*cur_qp)->ibqp.qp_context; if (is_send) { wq = &(*cur_qp)->sq; 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c index d7624c1..5a77dcc 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cq.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c @@ -79,7 +79,8 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp, ret = 1; wc->wr_id = cookie; - wc->qp = &qhp->ibqp; + wc->qp_num = qhp->ibqp.qp_num; + wc->qp_context = qhp->ibqp.qp_context; wc->vendor_err = CQE_STATUS(cqe); PDBG("%s qpid 0x%x type %d opcode %d status 0x%x wrid hi 0x%x " -- MST From Diego at Mellanox.com Wed Mar 21 06:00:19 2007 From: Diego at Mellanox.com (Diego Crupnicoff) Date: Wed, 21 Mar 2007 06:00:19 -0700 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: <20070321043152.GA20505@obsidianresearch.com> Message-ID: > -----Original Message----- > From: general-bounces at lists.openfabrics.org [mailto:general- > bounces at lists.openfabrics.org] On Behalf Of Jason Gunthorpe > Sent: Wednesday, March 21, 2007 1:32 AM > To: Yaron Haviv > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] [RFC] host stack IB-to-IB router support > > On Wed, Mar 21, 2007 at 04:21:15AM +0200, Yaron Haviv wrote: > > > I believe a much simpler and more generalized solution would be to > > imitate the behavior of IP routing rather than using the "remote sa" > > concept > I completely agree with Yaron. > We talked about this at some length a while ago on this list. In > short, we started with the position you outlined until it was > discovered that the L2 address checking described by 9.6.1.5.1 C9-57 > makes it unworkable. This existing IB behavior is sufficiently > un-IP-like that existing IP solutions do not work. (The parallel to > ethernet would be if each TCP connection checked that the SMAC in > incoming frames matched some pre-determined value.) If C9-57 is a problem then that can be addressed by the IBTA. 
BTW, the IBTA is going to present on IB routers spec during the upcoming OFA workshop in Sonoma. That should be a good opportunity for non-IBTA members to raise their concerns and potentially find ways to become involved in the router architecture specification process. > > Sean is working on one of the simpler solutions that considers the > effects of C9-57, which is to allow the active side to control all 4 > path records that are involved. > > Notice this is similar to how IB CM works within a subnet, where the 2 > required paths are selected by the active side and the passive side > does no queries. This already is different than IP which would have > the passive side doing ARP. Again this behavior in the spec is > fundamentally required by the restriction in C9-57. For the IB spec, the SM and SA are subnet local entities. The remote SA concept is at odds with the spirit of the IB spec. I would say that changing that is a much more significant departure from the spec than dealing with C9-57 if necessary. > > I agree with you that this is not a good place to be, but with current > hardware I think we are stuck with it.. I do not think so (with my ib hw asic vendor hat on). 
> > Regards, > Jason > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From halr at voltaire.com Wed Mar 21 07:12:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 09:12:51 -0500 Subject: [ofa-general] Re: [PATCH] opensm: complib_init() after block_signals() In-Reply-To: <20070321012355.GA20990@sashak.voltaire.com> References: <20070321012355.GA20990@sashak.voltaire.com> Message-ID: <1174486370.6493.80071.camel@hal.voltaire.com> On Tue, 2007-03-20 at 20:23, Sasha Khapyorsky wrote: > Move complib_init() call where timer thread is created after > block_signals(), so timer thread will not run signal handlers. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to master only). Let me know if this should go on ofed_1_2 as well. -- Hal From halr at voltaire.com Wed Mar 21 07:22:30 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 09:22:30 -0500 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: References: Message-ID: <1174486948.6493.80653.camel@hal.voltaire.com> On Wed, 2007-03-21 at 08:00, Diego Crupnicoff wrote: > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org [mailto:general- > > bounces at lists.openfabrics.org] On Behalf Of Jason Gunthorpe > > Sent: Wednesday, March 21, 2007 1:32 AM > > To: Yaron Haviv > > Cc: general at lists.openfabrics.org > > Subject: Re: [ofa-general] [RFC] host stack IB-to-IB router support > > > > On Wed, Mar 21, 2007 at 04:21:15AM +0200, Yaron Haviv wrote: > > > > > I believe a much simpler and more generalized solution would be to > > > imitate the behavior of IP routing rather than using the "remote sa" > > > concept > > > > I completely agree with Yaron. > > > We talked about this at some length a while ago on this list. 
In > > short, we started with the position you outlined until it was > > discovered that the L2 address checking described by 9.6.1.5.1 C9-57 > > makes it unworkable. This existing IB behavior is sufficiently > > un-IP-like that existing IP solutions do not work. (The parallel to > > ethernet would be if each TCP connection checked that the SMAC in > > incoming frames matched some pre-determined value.) > > If C9-57 is a problem then that can be addressed by the IBTA. > BTW, the IBTA is going to present on IB routers spec during the upcoming > OFA workshop in Sonoma. That should be a good opportunity for non-IBTA > members to raise their concerns and potentially find ways to become > involved in the router architecture specification process. > > > > > Sean is working on one of the simpler solutions that considers the > > effects of C9-57, which is to allow the active side to control all 4 > > path records that are involved. > > > > Notice this is similar to how IB CM works within a subnet, where the 2 > > required paths are selected by the active side and the passive side > > does no queries. This already is different than IP which would have > > the passive side doing ARP. Again this behavior in the spec is > > fundamentally required by the restriction in C9-57. > > For the IB spec, the SM and SA are subnet local entities. The remote SA > concept is at odds with the spirit of the IB spec. I would say that > changing that is a much more significant departure from the spec than > dealing with C9-57 if necessary. The SM is subnet local but it is unclear about the SA. Currently, if one knows the GID of the SA, it can be contacted remotely. To support routing, there are a number of things that SA supports that I think will need to be exchanged across SAs so in this sense it may not be that far off in that it separates this aspect which can evolve to wherever it ends up in the architecture. 
It is a temporary measure to get by with the lack of specificity in the
routing space. As you can see, there is an immediate need within
OpenFabrics to resolve these issues or at least take an experimental stab
at them so more IB router prototyping can be done.

-- Hal

> > I agree with you that this is not a good place to be, but with current
> > hardware I think we are stuck with it..
>
> I do not think so (with my ib hw asic vendor hat on).
>
> >
> > Regards,
> > Jason

From mst at dev.mellanox.co.il Wed Mar 21 06:45:05 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Mar 2007 15:45:05 +0200
Subject: [ofa-general] [PATCH] IB/ipoib: fix thinko in packet length checks
In-Reply-To: <20070321132015.3FE95E6080F@openfabrics.org>
References: <20070321132015.3FE95E6080F@openfabrics.org>
Message-ID: <20070321134505.GA23221@mellanox.co.il>

Packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header), not 20 bytes (hardware address length) to each
packet. As a result, multicast is broken if message size is 2048 bytes.

This patch fixes bug 418 in openfabrics bugzilla, submitted by
Scott Weitzenkamp.

Signed-off-by: Michael S. Tsirkin
---
Roland, pls consider for 2.6.21.
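
[Illustration, not part of the patch] The arithmetic behind the length
check can be shown with a small standalone C sketch. The two constants
carry the sizes named in the message above (4 and 20 bytes); the helper
name too_long() and the MTU values used with it are hypothetical and only
serve to show how the two limits differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	IPOIB_ENCAP_LEN = 4,	/* IPoIB encapsulation header, in bytes */
	INFINIBAND_ALEN = 20,	/* IPoIB hardware address length, in bytes */
};

/*
 * Hypothetical helper: returns true when a packet of skb_len bytes
 * exceeds the limit mcast_mtu + overhead.  "overhead" selects which
 * constant is added, so both variants of the check can be compared.
 */
static bool too_long(size_t skb_len, size_t mcast_mtu, size_t overhead)
{
	assert(overhead == IPOIB_ENCAP_LEN || overhead == INFINIBAND_ALEN);
	return skb_len > mcast_mtu + overhead;
}
```

With a limit built from INFINIBAND_ALEN the check lets through packets
that are up to 16 bytes longer than a limit built from IPOIB_ENCAP_LEN
allows, which is the gap the patch closes.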
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 3484e8b..ed3eadc 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -452,7 +452,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ skb->len, tx->mtu); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); + ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); return; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..ba0ee5c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -328,9 +328,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_tx_buf *tx_req; u64 addr; - if (unlikely(skb->len > priv->mcast_mtu + INFINIBAND_ALEN)) { + if (unlikely(skb->len > priv->mcast_mtu + IPOIB_ENCAP_LEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", - skb->len, priv->mcast_mtu + INFINIBAND_ALEN); + skb->len, priv->mcast_mtu + IPOIB_ENCAP_LEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu); -- MST From weikuan.yu at gmail.com Wed Mar 21 08:00:06 2007 From: weikuan.yu at gmail.com (Weikuan Yu) Date: Wed, 21 Mar 2007 11:00:06 -0400 Subject: [ofa-general] HotI 2007 Call for Papers -- 4th call. Deadline March 31st is approaching Message-ID: <46014876.3030308@gmail.com> Deadline March 31st is approaching. -------------------------------------------------------------------- Apologies if you received multiple copies of this posting. Please feel free to distribute it to those who might be interested. 
-------------------------------------------------------------------- Hot Interconnects 15 IEEE Symposium on High-Performance Interconnects August 22-24, 2007 Stanford University Palo Alto, California, USA Hot Interconnects is the premier international forum for researchers and developers of state-of-the-art hardware and software architectures and implementations for interconnection networks of all scales, ranging from on-chip processor-memory interconnects to wide-area networks. This yearly conference is very well attended by leaders in industry and academia. The atmosphere provides for a wealth of opportunities to interact with individuals at the forefront of this field. Themes include cross-cutting issues spanning computer systems, networking technologies, and communication protocols. This conference is directed particularly at new and exciting technology and product innovations in these areas. Contributions should focus on real experimental systems, prototypes, or leading-edge products and their performance evaluation. In addition to those subscribing to the main theme of the conference, contributions are also solicited in the topics listed below. 
* Novel and innovative interconnect architectures * Multi-core processor interconnects * System-on-Chip Interconnects * Advanced chip-to-chip communication technologies * Optical interconnects * Protocol and interfaces for interprocessor communication * Survivability and fault-tolerance of interconnects * High-speed packet processing engines and network processors * System and storage area network architectures and protocols * High-performance host-network interface architectures * High-bandwidth and low-latency I/O * Tb/s switching and routing technologies * Innovative architectures for supporting collective communication * Novel communication architectures to support grid computing Submission Guideline o Submission deadline: March 31, 2007 o Notification of acceptance: May 15, 2007 o Papers need sufficient technical detail to judge quality and suitability for presentation. o Submit title, author, abstract, and full paper (six pages, double-column, IEEE format). o Papers should be submitted electronically at the specified link location found on http://www.hoti.org o For further information please see http://www.hoti.org/hoti15/cfp.html About the Conference - Conference held at the William Hewlett Teaching Center at Stanford University. - Papers selected will be published in proceedings by the IEEE Computer Society. - Presentations are 30-minute talks in a single-track format. - Online information at http://www.hoti.org GENERAL CO-CHAIRS * John W. Lockwood, Washington University in St. Louis * Fabrizio Petrini, Pacific Northwest National Laboratory TECHNICAL CO-CHAIRS * Ron Brightwell, Sandia National Laboratories * Dhabaleswar (DK) Panda, The Ohio State University LOCAL ARRANGEMENTS CHAIR * Songkrant Muneenaem, Washington University in St. 
Louis PANEL CHAIR * Daniel Pitt, Santa Clara University PUBLICITY CO-CHAIRS * Weikuan Yu, Oak Ridge National Laboratory PUBLICATION CHAIR * Luca Valcarenghi, Scuola Superiore Sant'Anna FINANCE CHAIR * Herzel Ashkenazi, Xilinx TUTORIAL CO-CHAIRS - TBA REGISTRATION CHAIR * Songkrant Muneenaem, Washington University in St. Louis Webmaster * Liz Rogers, LRD Group Steering Committee o Allen Baum, Intel o Lily Jow, Hewlett Packard o Mark Laubach, Broadband Physics o John Lockwood, Stanford University o Daniel Pitt, Santa Clara University Technical Program Committee * Dennis Abts Cray, Inc. * Adnan Aziz University of Texas, Austin * Alan Benner IBM * Keren Bergman Columbia University * Andrea Bianco Politecnico di Torino * Piero Castoldi Scuola Superiore Sant'Anna * Sarang Dharmapurikar Nuova Systems * Hans Eberle Sun Microsystems Laboratories * Wu-chun Feng Virginia Tech * Juan Fernandez University of Murcia * Ada Gavrilovska Georgia Institute of Technology * Paolo Giaccone Politecnico di Torino * Mitchell Gusat IBM Zurich Research Laboratory * Ron Ho Sun Microsystems Laboratories * Doan Hoang University of Technology, Sydney * D. N. (Jay) Jayasimha Intel * Isaac Keslassy Technion * Venkata Krishnan Dolphin Interconnect Solutions * Tal Lavian Nortel Networks Labs, UC Berkeley * Bill Lin University of California, San Diego * Olav Lysne Simula Research Laboratory * Pankaj Mehra HP Labs * Rami Melhem University of Pittsburgh * Cyriel Minkenberg IBM Zurich Research Laboratory * Gregory Pfister IBM * Craig Stunkel IBM T.J. Watson Research Center * Anujan Varma University of California at Santa Cruz * Zuoguo (Joe) Wu Intel From swise at opengridcomputing.com Wed Mar 21 08:30:56 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 21 Mar 2007 10:30:56 -0500 Subject: [ofa-general] ofed 1.2 beta1 bug 468 - possible librdmacm bug? Message-ID: <1174491057.4734.17.camel@stevo-laptop> Hey Sean, I assigned this bug to you. 
After initial analysis, it appears to be a librdmacm issue. I think the
rdma-cm passed up an event with a garbage or already-freed cm_id. It's
readily reproducible.

I've added some debug info to the bug for you. Let me know if I can help.

Steve.

From krause at cup.hp.com Wed Mar 21 08:41:48 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 21 Mar 2007 08:41:48 -0700
Subject: [ofa-general] [RFC] host stack IB-to-IB router support
In-Reply-To: <1174486948.6493.80653.camel@hal.voltaire.com>
References: <1174486948.6493.80653.camel@hal.voltaire.com>
Message-ID: <6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com>

At 07:22 AM 3/21/2007, Hal Rosenstock wrote:
>The SM is subnet local but it is unclear about the SA.

The SA was intended to be subnet local. What is above the SM / SA was
left to implementation discretion but these two were intended to be
strictly subnet local.

>Currently, if one knows the GID of the SA, it can be contacted remotely.

One can communicate with anything that has a GID. While it is true, that
does not make it the right choice.

> To support routing, there are a number of things that SA supports that I
> think will
>need to be exchanged across SAs so in this sense it may not be that far
>off in that it separates this aspect which can evolve to wherever it
>ends up in the architecture.

The only thing one needs to comprehend are the of the destination.
Ideally, one would create the equivalent of a DNS service agent that would
operate in each subnet and provide a local lookup for any endnode
requesting a remote's information. The subnet local router would provide
information into the SM / SA to enable endnodes to comprehend which router
port to target their LRH structure. The router would strip the LRH and add
a new one to the next hop - whether another router or the destination
port. This gets you going without a lot of complexity.
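
[Illustration, not from the original message] The forwarding step
described above (keep the global header, replace the local one for the
next hop) can be sketched in C. Everything here is an assumption made for
the example: the structures are reduced to a handful of fields, and the
names struct route and router_forward() as well as the flat prefix table
are hypothetical. It does not model any real router or the IBTA router
specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced headers: only the fields the sketch uses. */
struct lrh { uint16_t slid; uint16_t dlid; };		/* local (per-subnet) header */
struct grh { uint8_t sgid[16]; uint8_t dgid[16]; };	/* global header */
struct packet { struct lrh lrh; struct grh grh; };

/* Hypothetical routing entry: 64-bit GID subnet prefix -> next-hop LID. */
struct route { uint8_t prefix[8]; uint16_t next_hop_lid; };

/*
 * Forward a packet out of a router port whose LID is port_lid.
 * The GRH is left untouched; the LRH is rewritten for the next hop,
 * which may be another router or the destination port.
 * Returns 0 on success, -1 if no route matches (LRH left as it was).
 */
static int router_forward(struct packet *p, const struct route *tbl,
			  size_t n, uint16_t port_lid)
{
	size_t i;

	assert(p != NULL && tbl != NULL);
	for (i = 0; i < n; i++) {
		/* Match on the subnet prefix (first 8 bytes) of the DGID. */
		if (memcmp(p->grh.dgid, tbl[i].prefix, sizeof(tbl[i].prefix)) == 0) {
			p->lrh.slid = port_lid;			/* new source: router port */
			p->lrh.dlid = tbl[i].next_hop_lid;	/* new destination: next hop */
			return 0;
		}
	}
	return -1;
}
```

In this sketch the source endnode only has to know which router port LID
to put in its own LRH; every later hop is decided by the routers.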
Now, if you want to add in QoS attributes to the path discovery, then there are two areas to explore: - Subnet local QoS to the subnet local router port. - Router protocol provided QoS information between routers. The first just uses the existing infrastructure. The second would be new and part of the router protocol to populate the SM/SA. If one associated a given GID prefix with one or more QoS attributes, then the source can make intelligent decisions without requiring any remote SA involvement. >It is a temporary measure to get by with the lack of specificity in the >routing space. As you can see, there is an immediate need within >OpenFabrics to resolve these issues or at least take an experimental stab >at them so more IB router protoyping can be done. Caution is recommended. If the IBTA can get its act together on the basics in a reasonable amount of time then the legacy problem can be avoided entirely. If not, well, customers have to make choice points, test matrix get larger, etc. Waiting until Sonoma isn't going to kill anyone and may provide better insight into what should be specified in the interim between now and the IBTA getting its act together. Mike >-- Hal > > > > I agree with you that this is not a good place to be, but with current > > > hardware I think we are stuck with it.. > > > > I do not think so (with my ib hw asic vendor hat on). 
> > > > > > > > Regards, > > > Jason > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Wed Mar 21 10:27:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 12:27:36 -0500 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: <6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> References: <1174486948.6493.80653.camel@hal.voltaire.com> <6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> Message-ID: <1174498055.17678.2769.camel@hal.voltaire.com> On Wed, 2007-03-21 at 10:41, Michael Krause wrote: > At 07:22 AM 3/21/2007, Hal Rosenstock wrote: > >The SM is subnet local but it is unclear about the SA. > > The SA was intended to be subnet local. What about ServiceRecords ? ServiceIDs can have non local subnet scope per Annex A1. That's one issue. > What is above the SM / SA were > left to implementation discretion but these two were intended to be > strictly subnet local. > > >Currently, if one knows the GID of the SA, it can be contacted remotely. > > One can communicate with anything that has a GID. While it is true that > does not make it the right choice.
> > > To support routing, there are a number of things that SA supports that I > > think will > >need to be exchanged across SAs so in this sense it may not be that far > >off in that it separates this aspect which can evolve to wherever it > >ends up in the architecture. > > The only thing one needs to comprehend are the of the > destination. Ideally, one would create the equivalent of a DNS service > agent that would operate in each subnet and provide a local look up for any > endnode requesting a remote's information. The subnet local router would > provide information into the SM / SA to enable endnode's to comprehend > which router port to target their LRH structure. The router would strip > the LRH and add a new one to the next hop - whether another router or the > destination port. This gets you going without a lot of complexity. > Now, if you want to add in QoS attributes to the path discovery, then there > are two areas to explore: > > - Subnet local QoS to the subnet local router port. > - Router protocol provided QoS information between routers. > > The first just uses the existing infrastructure. The second would be new > and part of the router protocol to populate the SM/SA. If one associated > a given GID prefix with one or more QoS attributes, then the source can > make intelligent decisions without requiring any remote SA involvement. > > >It is a temporary measure to get by with the lack of specificity in the > >routing space. As you can see, there is an immediate need within > >OpenFabrics to resolve these issues or at least take an experimental stab > >at them so more IB router protoyping can be done. > > Caution is recommended. If the IBTA can get its act together on the > basics in a reasonable amount of time then the legacy problem can be > avoided entirely. If not, well, customers have to make choice points, test > matrix get larger, etc. 
Waiting until Sonoma isn't going to kill anyone > and may provide better insight into what should be specified in the interim > between now and the IBTA getting its act together. That would be nice (and I wish it were otherwise) but I do not expect that solutions/directions sufficient for an implementation will be coming out of the Sonoma discussions. -- Hal > Mike > > > >-- Hal > > > > > > I agree with you that this is not a good place to be, but with current > > > > hardware I think we are stuck with it.. > > > > > > I do not think so (with my ib hw asic vendor hat on). > > > > > > > > > > > Regards, > > > > Jason From jsquyres at cisco.com Wed Mar 21 09:35:54 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 21 Mar 2007 12:35:54 -0400 Subject: [ofa-general] Need new sysadmin for server Message-ID: <76A47ED7-283E-43CD-AD2F-FE0A55C4B9E4@cisco.com> Michael Lee will no longer be available for sysadmin issues on the OpenFabrics server as of June 1, 2007.
Michael has done a fantastic job of slowly and carefully moving all the services from the Sandia server to the OFA server; this was a far more complex job than most people realized. He has been responsive, creative, and very helpful. Thank you, Michael! This means that we need someone else to step up and do the sysadmin work on the new server (Apache, etc.). Who can do this? -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Wed Mar 21 10:28:34 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Mar 2007 19:28:34 +0200 Subject: [ofa-general] [PATCH -stable] prevent memory corruption in device unregister Message-ID: <20070321172824.GA5233@mellanox.co.il> dst_ifdown breaks infiniband by doing dst->neighbour->dev = &loopback_dev when the device is being unregistered. As the result, ipoib_neigh_destructor gets called for the loopback device, resulting in memory corruption. Luckily we know we've already freed all resources before unregistering the device, so to avoid a crash, it's enough to test the device type and exit. Unfortunately module unloading remains racy - it should get fixed in 2.6.21 by a bigger change in net/core/neighbour.c Signed-off-by: Michael S. Tsirkin --- We missed this previously, but sticking WARN_ON(n->dev->type != ARPHRD_INFINIBAND) inside ipoib_neigh_destructor shows that this memory corruption is easy to trigger in 2.6.19/2.6.20. So I suggest sending this patch to -stable for inclusion in these kernels. Roland, can you Ack this? 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index f9dbc6f..f801917 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -821,6 +821,9 @@ static void ipoib_neigh_destructor(struct neighbour *n) unsigned long flags; struct ipoib_ah *ah = NULL; + if (n->dev->type != ARPHRD_INFINIBAND) + return; + ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n", IPOIB_QPN(n->ha), -- MST From halr at voltaire.com Wed Mar 21 11:27:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 13:27:58 -0500 Subject: [ofa-general] Re: [PATCH] opensm: daemon mode In-Reply-To: <20070321013306.GC20990@sashak.voltaire.com> References: <20070321012355.GA20990@sashak.voltaire.com> <20070321012702.GB20990@sashak.voltaire.com> <20070321013306.GC20990@sashak.voltaire.com> Message-ID: <1174501676.17678.6690.camel@hal.voltaire.com> On Tue, 2007-03-20 at 20:33, Sasha Khapyorsky wrote: > This adds daemon mode support for OpenSM. The process will be detached > from terminal and backgrounded. Use '-B' or '--daemon' options to > activate. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied (to master only). Let me know if this should go on ofed_1_2 as well. 
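For readers unfamiliar with what "detached from terminal and backgrounded" usually involves, here is a minimal sketch of the conventional Unix approach. It is not the code from Sasha's patch: the helper names wants_daemon() and daemonize() are made up for illustration, and only the '-B' and '--daemon' option names come from the patch description above.

```c
#include <fcntl.h>
#include <getopt.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if -B or --daemon appears in argv, else 0. */
int wants_daemon(int argc, char *const argv[])
{
	static const struct option long_opts[] = {
		{ "daemon", no_argument, NULL, 'B' },
		{ NULL, 0, NULL, 0 }
	};
	int c, daemon_mode = 0;

	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* stay quiet about options this sketch does not know */
	while ((c = getopt_long(argc, argv, "B", long_opts, NULL)) != -1)
		if (c == 'B')
			daemon_mode = 1;
	return daemon_mode;
}

/*
 * Hypothetical helper: the classic detach sequence. The parent exits so the
 * shell gets its prompt back; the child starts a new session (dropping the
 * controlling terminal) and points stdin/stdout/stderr at /dev/null.
 * Returns 0 in the detached child, -1 on error.
 */
int daemonize(void)
{
	pid_t pid = fork();
	int fd;

	if (pid < 0)
		return -1;
	if (pid > 0)
		_exit(0);		/* parent: hand control back to the shell */
	if (setsid() < 0)		/* child: new session, no terminal */
		return -1;
	if (chdir("/") < 0)		/* do not pin the startup directory */
		return -1;
	fd = open("/dev/null", O_RDWR);
	if (fd < 0)
		return -1;
	dup2(fd, STDIN_FILENO);
	dup2(fd, STDOUT_FILENO);
	dup2(fd, STDERR_FILENO);
	if (fd > STDERR_FILENO)
		close(fd);
	return 0;
}
```

A program's main() would typically call wants_daemon(argc, argv) during option parsing and, if it returns 1, call daemonize() before entering its main loop. Logging then has to go to a file or syslog, since the standard streams no longer reach the terminal.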
-- Hal From yaronh at voltaire.com Wed Mar 21 10:33:17 2007 From: yaronh at voltaire.com (Yaron Haviv) Date: Wed, 21 Mar 2007 19:33:17 +0200 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: <1174498055.17678.2769.camel@hal.voltaire.com> References: <1174486948.6493.80653.camel@hal.voltaire.com><6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> <1174498055.17678.2769.camel@hal.voltaire.com> Message-ID: > -----Original Message----- > From: general-bounces at lists.openfabrics.org [mailto:general- > bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > Sent: Wednesday, March 21, 2007 1:28 PM > To: Michael Krause > Cc: general at lists.openfabrics.org > Subject: RE: [ofa-general] [RFC] host stack IB-to-IB router support > > On Wed, 2007-03-21 at 10:41, Michael Krause wrote: > > At 07:22 AM 3/21/2007, Hal Rosenstock wrote: > > >The SM is subnet local but it is unclear about the SA. > > > > The SA was intended to be subnet local. > > What about ServiceRecords ? ServiceIDs can have non local subnet scope > per Annex A1. That's one issue. > Hal, the fact that we have bits saying something is global vs. local Doesn't mean the control plain for that needs to be in the SM/SA Not to mention that with OpenFabric fabric independent model ServiceIDs are now 16bits and map to TCP like ports We want to have the SM/SA be the monitoring/configuration tool for a specific subnet and not grant it with more authorities than it should Some IB mechanisms are too centralized already, we don't want to carry that legacy into an inter-subnet framework unless we have to. If there are holes in the spec that inhibit us from doing it in the right way (like in IP routing), we should identify them and drive them quickly via IBTA, after all IB usage model may have changed a bit since the draft was written few years ago. 
Ok, lets assume Sean would finish his experiments with remote_sa, how would that find its way into the commercial sm/sa versions that are mostly used, how would we guarantee interoperability between all implementations, .. ? How would that address future routing, security, QoS, .. enhancements ? can it ? Yaron From halr at voltaire.com Wed Mar 21 11:40:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 13:40:20 -0500 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: References: <1174486948.6493.80653.camel@hal.voltaire.com> <6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> <1174498055.17678.2769.camel@hal.voltaire.com> Message-ID: <1174502419.17678.7433.camel@hal.voltaire.com> On Wed, 2007-03-21 at 12:33, Yaron Haviv wrote: > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org [mailto:general- > > bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock > > Sent: Wednesday, March 21, 2007 1:28 PM > > To: Michael Krause > > Cc: general at lists.openfabrics.org > > Subject: RE: [ofa-general] [RFC] host stack IB-to-IB router support > > > > On Wed, 2007-03-21 at 10:41, Michael Krause wrote: > > > At 07:22 AM 3/21/2007, Hal Rosenstock wrote: > > > >The SM is subnet local but it is unclear about the SA. > > > > > > The SA was intended to be subnet local. > > > > What about ServiceRecords ? ServiceIDs can have non local subnet scope > > per Annex A1. That's one issue. > > > > Hal, the fact that we have bits saying something is global vs. 
local > Doesn't mean the control plain for that needs to be in the SM/SA > Not to mention that with OpenFabric fabric independent model ServiceIDs > are now 16bits and map to TCP like ports > > We want to have the SM/SA be the monitoring/configuration tool for a > specific subnet and not grant it with more authorities than it should > Some IB mechanisms are too centralized already, we don't want to carry > that legacy into an inter-subnet framework unless we have to. > > If there are holes in the spec that inhibit us from doing it in the > right way (like in IP routing), we should identify them and drive them > quickly via IBTA, after all IB usage model may have changed a bit since > the draft was written few years ago. > > Ok, lets assume Sean would finish his experiments with remote_sa, how > would that find its way into the commercial sm/sa versions that are > mostly used, how would we guarantee interoperability between all > implementations, .. ? It wouldn't necessarily. It is a vehicle for experimentation. I believe there is separation of what is spec compliant and non compliant and it allows for more flexible configurations than what can currently be done by the current spec. > How would that address future routing, security, QoS, .. enhancements ? > can it ? It can evolve but the controversial part of this was meant for more flexible configurations for experimentation. -- Hal > Yaron From pw at osc.edu Wed Mar 21 10:50:05 2007 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 21 Mar 2007 13:50:05 -0400 Subject: [ofa-general] ib_umem_get always wants write access Message-ID: <20070321175005.GA8123@osc.edu> I've wondered about this for a while. In ib_umem_get, there is a call to get_user_pages that does the work of virtual to physical translation and increasing the ref counts. It is always invoked with write == 1, even if cmd.access_flags == 0 (read only registration). This is fine for anonymous private memory, or writeable shared memory. 
But consider pinning a read-only section of memory, such as shared read-only data or text segment, or a file mapping of a file that was opened O_RDONLY. Having write == 1 there forces a full copy of all these pages. The force argument is explicitly set to 1 only when access_flags does not specify write access, giving gup permission to do the copy-on-write, essentially. That seems correct, but always setting write to 1 has me confused. Is there some IB semantic reason for forcing the registered pages to be writable? -- Pete From mshefty at ichips.intel.com Wed Mar 21 10:51:07 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Mar 2007 10:51:07 -0700 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: References: <1174486948.6493.80653.camel@hal.voltaire.com><6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> <1174498055.17678.2769.camel@hal.voltaire.com> Message-ID: <4601708B.6090602@ichips.intel.com> > Ok, lets assume Sean would finish his experiments with remote_sa, how > would that find its way into the commercial sm/sa versions that are > mostly used, how would we guarantee interoperability between all > implementations, .. ? > How would that address future routing, security, QoS, .. enhancements ? > can it ? The 'remote sa' as simply a proprietary UD protocol. Whatever data two 'remote sa' services exchange shouldn't matter, nor should the fact that each issues local SA path records. There's nothing magical about this. If I have an app that can query its local SA, there's nothing that prevents that app from sending that data to whatever peer it can connect to. It can even send the data over TCP if it wants. Keeping the SA subnet local doesn't add any real security. Coming up with a solution that doesn't work with any existing hardware, targets, and SAs isn't very useful. 
- Sean From sean.hefty at intel.com Wed Mar 21 11:04:45 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Mar 2007 11:04:45 -0700 Subject: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070321095101.GH20583@mellanox.co.il> Message-ID: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com> Sure - this is the patch against 2.6.21-rc4. You can view the diff against OFED 1.2 here: http://www.openfabrics.org/git/?p=~shefty/ofed_1_2.git;a=summary (The 2.6.21-rc4 diff is simply dropped into the existing ofed patch file.) --- diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 842cd0b..706fdbf 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -54,6 +54,17 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("InfiniBand CM"); MODULE_LICENSE("Dual BSD/GPL"); +#define PFX "ib_cm: " + +/* + * Limit CM message timeouts to something reasonable: + * 32 seconds per message, with up to 15 retries + */ +static int max_timeout = 23; +module_param(max_timeout, int, 0644); +MODULE_PARM_DESC(max_timeout, "Maximum IB CM per message timeout " + "(default=23, or ~32 seconds)"); + static void cm_add_one(struct ib_device *device); static void cm_remove_one(struct ib_device *device); @@ -888,11 +899,23 @@ static void cm_format_req(struct cm_req_msg *req_msg, cm_req_set_init_depth(req_msg, param->initiator_depth); cm_req_set_remote_resp_timeout(req_msg, param->remote_cm_response_timeout); + if (param->remote_cm_response_timeout > (u8) max_timeout) { + printk(KERN_WARNING PFX "req remote_cm_response_timeout %d > " + "%d, decreasing\n", param->remote_cm_response_timeout, + max_timeout); + cm_req_set_remote_resp_timeout(req_msg, (u8) max_timeout); + } cm_req_set_qp_type(req_msg, param->qp_type); cm_req_set_flow_ctrl(req_msg, param->flow_control); cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); cm_req_set_local_resp_timeout(req_msg, param->local_cm_response_timeout); + if 
(param->local_cm_response_timeout > (u8) max_timeout) { + printk(KERN_WARNING PFX "req local_cm_response_timeout %d > " + "%d, decreasing\n", param->local_cm_response_timeout, + max_timeout); + cm_req_set_local_resp_timeout(req_msg, (u8) max_timeout); + } cm_req_set_retry_count(req_msg, param->retry_count); req_msg->pkey = param->primary_path->pkey; cm_req_set_path_mtu(req_msg, param->primary_path->mtu); @@ -1002,6 +1025,11 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, param->primary_path->packet_life_time) * 2 + cm_convert_to_ms( param->remote_cm_response_timeout); + if (cm_id_priv->timeout_ms > cm_convert_to_ms(max_timeout)) { + printk(KERN_WARNING PFX "req timeout_ms %d > %d, decreasing\n", + cm_id_priv->timeout_ms, cm_convert_to_ms(max_timeout)); + cm_id_priv->timeout_ms = cm_convert_to_ms(max_timeout); + } cm_id_priv->max_cm_retries = param->max_cm_retries; cm_id_priv->initiator_depth = param->initiator_depth; cm_id_priv->responder_resources = param->responder_resources; @@ -1401,6 +1429,13 @@ static int cm_req_handler(struct cm_work *work) cm_id_priv->tid = req_msg->hdr.tid; cm_id_priv->timeout_ms = cm_convert_to_ms( cm_req_get_local_resp_timeout(req_msg)); + if (cm_req_get_local_resp_timeout(req_msg) > (u8) max_timeout) { + printk(KERN_WARNING PFX "rcvd cm_local_resp_timeout %d > %d, " + "decreasing used timeout_ms\n", + cm_req_get_local_resp_timeout(req_msg), max_timeout); + cm_id_priv->timeout_ms = cm_convert_to_ms(max_timeout); + } + cm_id_priv->max_cm_retries = cm_req_get_max_cm_retries(req_msg); cm_id_priv->remote_qpn = cm_req_get_local_qpn(req_msg); cm_id_priv->initiator_depth = cm_req_get_resp_res(req_msg); @@ -2304,6 +2339,12 @@ static int cm_mra_handler(struct cm_work *work) cm_mra_get_service_timeout(mra_msg); timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + cm_convert_to_ms(cm_id_priv->av.packet_life_time); + if (timeout > cm_convert_to_ms(max_timeout)) { + printk(KERN_WARNING PFX "calculated mra timeout %d > %d, " + "decreasing 
used timeout_ms\n", timeout, + cm_convert_to_ms(max_timeout)); + timeout = cm_convert_to_ms(max_timeout); + } spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { @@ -2707,6 +2748,12 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id, cm_id->service_id = param->service_id; cm_id->service_mask = __constant_cpu_to_be64(~0ULL); cm_id_priv->timeout_ms = param->timeout_ms; + if (cm_id_priv->timeout_ms > cm_convert_to_ms(max_timeout)) { + printk(KERN_WARNING PFX "sidr req timeout_ms %d > %d, " + "decreasing used timeout_ms\n", param->timeout_ms, + cm_convert_to_ms(max_timeout)); + cm_id_priv->timeout_ms = cm_convert_to_ms(max_timeout); + } cm_id_priv->max_cm_retries = param->max_cm_retries; ret = cm_alloc_msg(cm_id_priv, &msg); if (ret) From rdreier at cisco.com Wed Mar 21 11:23:35 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 21 Mar 2007 11:23:35 -0700 Subject: [ofa-general] ib_umem_get always wants write access In-Reply-To: <20070321175005.GA8123@osc.edu> (Pete Wyckoff's message of "Wed, 21 Mar 2007 13:50:05 -0400") References: <20070321175005.GA8123@osc.edu> Message-ID: > I've wondered about this for a while. In ib_umem_get, there is a > call to get_user_pages that does the work of virtual to physical > translation and increasing the ref counts. It is always invoked > with write == 1, even if cmd.access_flags == 0 (read only > registration). > > This is fine for anonymous private memory, or writeable shared > memory. But consider pinning a read-only section of memory, such as > shared read-only data or text segment, or a file mapping of a file > that was opened O_RDONLY. Having write == 1 there forces a full > copy of all these pages. > > The force argument is explicitly set to 1 only when access_flags > does not specify write access, giving gup permission to do the > copy-on-write, essentially. That seems correct, but always setting > write to 1 has me confused. 
> > Is there some IB semantic reason for forcing the registered pages to > be writable? I'm having a hard time remembering the exact reasoning, but the basic idea is that we need to allow read-only memory to be registered but we also need to force allocated but not touched memory to be faulted in. - R. From Douglas.Fuller at asu.edu Wed Mar 21 11:29:07 2007 From: Douglas.Fuller at asu.edu (Douglas Fuller) Date: Wed, 21 Mar 2007 11:29:07 -0700 Subject: [ofa-general] osm error messages Message-ID: I'm seeing some sporadic error activity from OpenSM (OFED 1.1; osm.log below) that may correlate with some job failures -- I'm trying to get to the bottom of this. Before seeing this, I isolated (and disabled with ibportstate) what appeared to be a bad internal port in one of our core switches. That leads me to suspect I have a switch misbehaving somewhere. Without any other intervention, things seem to check out (with ibdiagnet/ibchecknet). Any thoughts? Need any more information? Thanks, --Doug Fuller Mar 19 18:8:50 000354 [AB000160] -> OpenSM Rev:openib-2.0.5 OpnIBsvn Exported revision Mar 19 18:28:50 000466 [AB00160] -> OpenSM Rev:openib-2.0.5 OpenIB svn Exported revision Mar 19 18:28:50 007666 [AB000160] -> om_vendor_bind: Binding to port 0x5ad0000024bb Mar 19 18:28:50 011279 [AB000160] ->osm_vendor_bind: Binding to port 0x5ad0000024bbb Mar 19 18:2850 438326 [44606960] -> Entering MASTER stte Mar 19 18:28:50 438628 [4606960] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe8000000000000,0x0005ad000024bbb Mar 19 1828:50 438661 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:6 from LID:0x0000 GID:0xf8000000000000,0x0005ad0000024bbb Mar 1 18:28:50 504176 [41401960] -> osm_cast_mgr_process: Min Hop Tabes onfigured on all switches Mar 19 18:28:50 639453 [44606960] -> SUNET UP Mar 19 18:28:50 853613 [1E02960 -> __osm_trap_rcv_process_reqest: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0092 TID:0x000000000000018
Mar 19 18:28:5 853813 [41E0960] -> osm_report_notice: Reporting Generic Notice typ:4 num:144 from LID:0x0092 GID:0xfe8000000000000,0x0005ad0000024bbb Mar 19 18:28:51 273470 [4460960] -> osm_ucast_mgr_process: Min HopTables configured on all switches Mar 19 18:28:51 33730 [43C05960] -> SUBNET UP Mar 19 18:3033 565682 [4320490] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x1 num:128 Poducer:2 from LID:0x0001 TID:0x0000000000000019 Mar 19 18:30:33 565958 [43204960] -> sm_report_notice: Reporting Generic Noicetype:1 num:128 from ID:0x0001 GID:0xfe80000000000000,0x005ad0000027c6 Mar 19 18:30:33 963901 [41401960] > osm_report_notice: Reporting Generic Notice typ:3 num:64 fro LID:x0092 GID:0xfe80000000000000,0x005ad0000024bbb Mar 19 18:30:33 963914 [41401960] -> Discovered nw port with GUID:0x0005ad00000297b LI range [0x3,0x37] of node:saguro-14-9 HCA-1 Mar 19 18:30:33 994698 [4401960] > om_ucast_mgr_process: Min Hp Tables configured n all switches Mar 19 18:30:34 054763 [45A08960]-> UBNET UP Mar 19 18:30:34 351397 [43C0560] -> __osm_trap_rcv_process_request: Received Generi Ntice type:0x04 num:144 Producer:1 fomLID:0x0037 TID:0x000000000000000 Mar 19 18:30:34 351615 [43C05960] -> osm_report_notice Reportig Generic otice type:4 num:144 from LID:0x0037 GID:0xfe80000000000000,0x0005ad00002497b ar 19 18:30:34 777488 [45A08960] -> osm_ucast_mgr_process:Min Hop Tables configured onall switces Mar 19 18:30:34 832664 [4A08960] -> SUBNET UP Mar 19 18:32:27 476136 [45A08960] -> __osm_trap_cv_process_request: Received Generic Notice typ:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000002b Mar 19 18:32:27 476340 [43204960] ->__osm_trap_rcv_process_request: Reeivd Gneric Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000037 Mar 19 18:32:2 476389 [45A08960] -> osm_report_notice: Reporting Generic Notice type: num:128 from LID:0x0148 GID:0xfe8000000000000,0x0005ad00000281b3 Mar 19 18:2:27 47485 [43204960] -> osm_report_ntice: Reporting 
Generic Notice tye:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad0000081a7Mar 19 18:32:27 817617 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 nm:65 frm LID:0x0092 GID:0xfe80000000000000,0x005ad000024bbb Mar 19 18:32:27 817637 [4280396] -> Remove port with GUID:0x0005ad0000024e0b LID range [0xB3,xB3] of node:saguaro-23-4 HCA-1 Mar19 18:32:27 817655 [42803960] -> sm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,00005ad0000024bbb Mar 9 18:32:27 81766 [42803960] -> Remove port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 1 18:32:2 817694 [42803960] -> osm_report_notice: Reporting Generic Ntice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,00005ad0000024bbb Mar 19 18:3:7 81769 [42803960] -> Removed port wth GUID0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:3227 817716 42803960] -> osm_report_ntice: ReportingGeneric Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:32:27 81771 [42803960] -> Remved port with GUID0x0005ad0000024b27 LID range [0xAF,0xAF] of node:sguaro-23-0 HCA-1 Mar 19 18:32:27 817738 [4280390] -> osm_report_notice: Reporting Generic Notic type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:32:27 817743 [4203960] -> Removed ort with GUID:0x000ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:32:27 817758 [42803960] - osm_report_notice: Reporing Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe8000000000000,0x0005ad0000024bbb Mar 19 18:32:27 817763 [42803960 -> Removed port with GUID:0x0005ad000024d7 LID range [0xB6,0xB6] of node:saguar-23-7 HCA-1 Mar 19 18:32:27 817780 [42803960] -> osm_rport_notice: Reporting Generic Notice type:3 nu:65 fromLID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:32:27 17785 [42803960] -> Remved port with GUID:0x0005ad0000024d6bLID range [0xB8,0xB8] of node:saguao-23-9 
HCA-1 Mar 19 18:32:27 817803 [480396] -> osm_report_notce: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000024bbb Mar 19 18:32:27 817808 [4283960] -> Removed por with GUID:0x0005ad0000024977 LID rane [0xA9,0xA9] of node:saguaro-22-4 HCA1 Mar 19 18:32:27 817932 [42803960] -> osm_report_notice: Rporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000024bbb Mar 19 18:32:27 817938 [4280390] -> Removed port with GID:0x0005ad0000027c84 LID range [0x15,0x152] of node:Topspin Switch TS20 Mar 19 18:32:27 817970 [42803960] -> osm_report_notice: Reporing Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000024bbb Mar 19 18:32:27 817977 [4280360] -> Removed port with GUID0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1Mar 19 1:32:27 817992 [42803960] -> osm_report_notice: Reporting Generic Notice tye:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000004bbb Mar 19 18:32:27 817997 [42803960 -> Removed port with GUID:0x0005ad00000249f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Ma 19 18:32:27 81811 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xe80000000000000,0x0005ad0000024bb Mar 9 18:32:27 818016 [42803960] - Removed port with UID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-2-2 HCA-1 Mar 1 18:32:27 818032 [42803960] -> osm_report_notice: Reporing Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb a 19 18:32:27 818037 [42803960] -> Rmoved port with GUID:0x0005ad000004da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 1:32:27 818054 [4280360] -> osm_repot_notice: Reporting Generic Notice typ:3 num:65 from LID:0x0092 GID:0xfe80000000000000,x0005ad0000024bbb Mar19 18:32:27 818115 [42803960] -> Remoed port wth GUID:0x0005ad0000024cbb LID range [xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 1 18:3:27 81812 [42803960] -> osm_report_notice: Reorting Generic 
Notie type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000002bbb Mar1918:32:27 818137 [42803960] -> Removedport with GUID:0x0005ad00000249d3 LID range [0xB1,0B1] of node:saguaro-23-2 HCA-1 Mar 19 8:32:2 818153 [4280360] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe800000000000,0x0005ad0000024bbb Mar 19 1832:7 818158 [42803960] -> Removedpot with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-2-5 HCA-1 Mar 19 18:32:7818173 [42803960] -> osm_report_notic: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad000024bbb Mar 19 18:322 818178 [42803960 -> Removed port wih GUID:0x0005ad0000024afb LID rnge [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 183227 851249[42803960] -> osm_ucast_mgr_procss: Min Hop Tables configured on all sitches Mar 19 18:32:27 898524 [43204960] -> SUBNET UP Mar 19 8:32:2828664 [45007960] -> osm_ucast_mgr_proessMin Hop Tables confiured on all switches Mar 19 18:32:28 341659 [4466960] -> SUBNET UP Mar 19 1:33:21814615 [41E0296] -> __osm_trap_rcv_process_request: Rceived Gneric Notice type:0x01 num:128 Produce:2 from LID:0x0148 TID:0x00000000000002c Mar 19 8:33:21 81484 [4E02960]-> osm_report_otice: Reporting Generi Notice type:1 num:128 fom LID:0x0148 GID:0xfe8000000000000,0x0005ad0000281b3 Mar19 18:3321 820835 [41E02960] -> __osm_tap_rcv_process_request: Received Genei Notce type:0x01 num:128 Producer:2 fro LI:0x001B TID:0x000000000000008 Ma 19 18:33:21 82090 [41E02960] -> smreport_notice: Repoting Generic Notice tye:1 num:128 fromLID:0x001B GD:0xfe80000000000000,0x0005ad0000281a7 ar 19 18:33:21 82038 [41E02960] -> __osm_trap_rcv_processrequest: Received Generic Notce tpe:0x num:128 Producer:2 from LID:00148 TID0x00000000000002d Mar 19 18:33:21 820992 [1E060] -> osm_report_notice:Reporting Gnric Notice type:1 num:128from LID:0x0148 GID:0xfe8000000000000,0x005ad00000281b3 Mar 19 18:3:21 826779 [4102960] -> __osm_trap_rcv_process_reqest: Receivd 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000039 Mar 19 18:33:21 82742 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:22 164580 [45007960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad0000027c84 port 18. Adding to light sweep sampling list Mar 19 18:33:22 164654 [45007960] -> Directed Path Dump of 5 hop path: Path = [0][1][11][1][5][17] Mar 19 18:33:22 164712 [45007960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad00000281b3 port2. Adding to light sweep sampling list Mar 19 18:33:22 164724 [45007960] -> Directed Path Dump of 4 hop path: Path = [0][1][11][1][5] Mar 19 18:33:22 173634 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 17365 [43C05960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 19 18:33:22 17365 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173662 [43C05960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:33:22 17366 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173671 [43C05960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:33:22 173675 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173680 [43C05960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:33:22 173684 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173689 [43C05960] -> Discovered new port
with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:33:22 173693 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173697 [43C05960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:33:22 173701 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173706 [43C05960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:33:22 173710 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173715 [43C05960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:33:22 173719 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173723 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:33:22 173727 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 17373 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:33:22 173736 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173741 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:33:22 173744 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173749 [43C05960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:33:22
173753 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173758 [43C05960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:33:22 173762 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 17376 [43C05960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 19 18:33:22 173770 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173830 [43C05960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 19 18:33:22 173834 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173839 [43C05960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 19 18:33:22 173843 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:22 173848 [43C05960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 19 18:33:22 204620 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:22 278567 [45A08960] -> SUBNET UP Mar 19 18:33:22 664286 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:22 734270 [45007960] -> SUBNET UP Mar 19 18:33:37 650358 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 19 18:33:37 65058 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:37 927263 [45A08960] ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 19 18:33:37 927420 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:37 95572 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 19 18:33:37 955657 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:37 97718 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 19 18:33:37 97740 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:37 999319 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 19 18:33:37 999447 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 045171 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 19 18:33:38 045271 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 06305 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 19 18:33:38 063102 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 182624 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 19 18:33:38 18720 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from
LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 19435 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 19 18:33:38 194209 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 379421 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 19 18:33:38 37959 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 407685 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 19 18:33:38 47758 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:33:38 429658 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 19 18:33:38 429700 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 19 18:33:38 544177 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 19 18:33:38 544221 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 19 18:33:38 545235 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:38 545247 [42803960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:33:38 545278 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:38 545286 [42803960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19
18:33:38 545312 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:38 545318 [42803960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:33:38 580005 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:38 66849 [43C05960] -> SUBNET UP Mar 19 18:33:38 648520 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 19 18:33:38 648616 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 19 18:33:38 676891 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 19 18:33:38 67670 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 19 18:33:38 698797 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 19 18:33:38 69860 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 19 18:33:38 720538 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 19 18:33:38 720612 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 19 18:33:38 921253 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 19 18:33:38 92130 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 19 18:33:38 97418 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 19 18:33:38 967479 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 19 18:33:38 98519 [42803960]
-> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 19 18:33:38 98955 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 19 18:33:38 998342 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 19 18:33:38 998380 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 19 18:33:39 039293 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 19 18:33:39 039334 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 19 18:33:39 061060 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 19 18:33:39 06108 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 19 18:33:39 079032 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 19 18:33:39 07972 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 19 18:33:39 146006 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 146018 [41E02960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:33:39 14604 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 146050 [41E02960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:33:39 146082 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 146089
[41E02960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:33:39 150720 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 150732 [41401960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:33:39 150736 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 150742 [41401960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:33:39 150745 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:39 150750 [41401960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:33:39 181553 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:39 218130 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 19 18:33:39 218197 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 19 18:33:39 375407 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 19 18:33:39 375456 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 19 18:33:39 375588 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000002e Mar 19 18:33:39 375630 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:39 637844 [41401960] -> SUBNET UP Mar 19 18:33:39 664805 [45A08960] -> __osm_trap_rcv_process_request:
Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000002f Mar 19 18:33:39 66490 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:39 666276 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003a Mar 19 18:33:39 666364 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:39 710546 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000030 Mar 19 18:33:39 710642 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:39 732425 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000031 Mar 19 18:33:39 732514 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:39 784151 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000032 Mar 19 18:33:39 784269 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:39 824170 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003b Mar 19 18:33:39 824443 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 015052 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:40 015070 [44606960] -> Discovered new port with
GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:33:40 015074 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:40 015080 [44606960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:33:40 015083 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:33:40 015088 [44606960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:33:40 046164 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:40 106627 [42803960] -> SUBNET UP Mar 19 18:33:40 145952 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000033 Mar 19 18:33:40 146076 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:40 146486 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003c Mar 19 18:33:40 146611 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 306176 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003d Mar 19 18:33:40 306270 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 420009 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 19 18:33:40 420071 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 19
18:33:40 433566 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001a Mar 19 18:33:40 433596 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 27 times consecutively Mar 19 18:33:40 434996 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003e Mar 19 18:33:40 435041 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 485454 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:40 528816 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000003f Mar 19 18:33:40 528960 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 546019 [42803960] -> SUBNET UP Mar 19 18:33:40 551048 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000034 Mar 19 18:33:40 55119 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:33:40 594994 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000040 Mar 19 18:33:40 595074 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 83973 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000041 Mar 19 18:33:40 840064 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 861973 [43204960] ->
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000042 Mar 19 18:33:40 862075 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 83777 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000043 Mar 19 18:33:40 907658 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:40 947974 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:33:40 965203 [45007960] -> SUBNET UP Mar 19 18:33:41 350582 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:33:41 417662 [43204960] -> SUBNET UP Mar 19 18:33:41 571156 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001b Mar 19 18:33:41 571256 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 28 times consecutively Mar 19 18:35:50 971684 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000035 Mar 19 18:35:50 971926 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:35:50 972301 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000044 Mar 19 18:35:50 972378 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:35:51 342826 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342845 [43204960] -> Removed port with
GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:35:51 342866 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342873 [43204960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:35:51 342895 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342901 [43204960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:35:51 342923 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342930 [43204960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:35:51 342968 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342972 [43204960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:35:51 342989 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 342994 [43204960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:35:51 343011 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343016 [43204960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:35:51 343033 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343038 [43204960] -> Removed port with GUID:0x0005ad0000024977 LID 
range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 19 18:35:51 343189 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343194 [43204960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 19 18:35:51 343234 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343239 [43204960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:35:51 343253 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343258 [43204960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 19 18:35:51 343273 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343278 [43204960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 19 18:35:51 343293 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343298 [43204960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:35:51 343314 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343319 [43204960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:35:51 343334 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343393 [43204960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of 
node:saguaro-23-2 HCA-1 Mar 19 18:35:51 343410 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343415 [43204960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 19 18:35:51 343430 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:35:51 343435 [43204960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:35:51 376525 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:35:51 433087 [43204960] -> SUBNET UP Mar 19 18:35:51 849193 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:35:51 901399 [42803960] -> SUBNET UP Mar 19 18:36:44 359407 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000036 Mar 19 18:36:44 359652 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:36:44 365352 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000037 Mar 19 18:36:44 365427 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:36:44 365432 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000045 Mar 19 18:36:44 365567 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:36:44 371481 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000046 Mar 19 
18:36:44 371591 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:36:44 711649 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad0000027c84 port 19. Adding to light sweep sampling list Mar 19 18:36:44 711691 [43204960] -> Directed Path Dump of 5 hop path: Path = [0][1][11][1][6][18] Mar 19 18:36:44 711738 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad00000281b3 port 24. Adding to light sweep sampling list Mar 19 18:36:44 711748 [43204960] -> Directed Path Dump of 4 hop path: Path = [0][1][11][1][6] Mar 19 18:36:44 721719 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721730 [43204960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 19 18:36:44 721736 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721744 [43204960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:36:44 721749 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721756 [43204960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:36:44 721761 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721767 [43204960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:36:44 721772 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721779 
[43204960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:36:44 721784 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721790 [43204960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:36:44 721795 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721802 [43204960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:36:44 721826 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721831 [43204960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:36:44 721845 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721850 [43204960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:36:44 721854 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721859 [43204960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:36:44 721862 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721867 [43204960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:36:44 721871 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 
18:36:44 721876 [43204960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:36:44 721880 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721884 [43204960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:36:44 721888 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721893 [43204960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 19 18:36:44 721897 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721923 [43204960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 19 18:36:44 721927 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721932 [43204960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 19 18:36:44 721936 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:36:44 721941 [43204960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 19 18:36:44 752683 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:36:44 820881 [43C05960] -> SUBNET UP Mar 19 18:36:45 198990 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:36:45 258878 [44606960] -> SUBNET UP Mar 19 18:37:00 446068 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 TID:0x0000000000000000 Mar 19 18:37:00 446346 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:00 564122 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 19 18:37:00 564810 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:00 589920 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 19 18:37:00 590067 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:00 611770 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 19 18:37:00 611916 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:00 800652 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 19 18:37:00 817995 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:00 861575 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:00 983908 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 19 18:37:00 984004 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 012195 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 TID:0x0000000000000006 Mar 19 18:37:01 012283 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 034177 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 19 18:37:01 034272 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 056001 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 19 18:37:01 056068 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 074341 [43204960] -> SUBNET UP Mar 19 18:37:01 252871 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 19 18:37:01 253037 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 303407 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 19 18:37:01 303490 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:37:01 325057 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 19 18:37:01 325160 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 19 18:37:01 334059 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 19 18:37:01 334118 [43204960] -> 
__osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 19 18:37:01 474293 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 474317 [45007960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:37:01 474341 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 474348 [45007960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:37:01 474371 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 474378 [45007960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:37:01 509205 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:01 557110 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 19 18:37:01 557172 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 19 18:37:01 565676 [43C05960] -> SUBNET UP Mar 19 18:37:01 576199 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 19 18:37:01 576270 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 19 18:37:01 599713 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 19 18:37:01 599779 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 19 18:37:01 707096 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 19 18:37:01 707150 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 19 18:37:01 921406 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 921423 [45A08960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:37:01 921448 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 921455 [45A08960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:37:01 921495 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 921502 [45A08960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:37:01 925845 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 925855 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:37:01 925859 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 925864 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:37:01 925868 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:01 925873 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:37:01 956691 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 
18:37:01 999372 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 19 18:37:01 999470 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 19 18:37:02 012194 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 19 18:37:02 012256 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 19 18:37:02 014327 [41401960] -> SUBNET UP Mar 19 18:37:02 034202 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 19 18:37:02 034250 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 19 18:37:02 056015 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 19 18:37:02 056060 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 19 18:37:02 270731 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 19 18:37:02 270777 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 19 18:37:02 271169 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000038 Mar 19 18:37:02 271347 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:37:02 462374 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000039 Mar 19 18:37:02 462511 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 
from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:37:02 496247 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003a
Mar 19 18:37:02 496310 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:37:02 624890 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:37:02 624902 [45A08960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
Mar 19 18:37:02 624908 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:37:02 624914 [45A08960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
Mar 19 18:37:02 624919 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:37:02 624926 [45A08960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
Mar 19 18:37:02 655848 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 19 18:37:02 709115 [42803960] -> SUBNET UP
Mar 19 18:37:03 082995 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003b
Mar 19 18:37:03 106373 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 19 18:37:03 136757 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:37:03 178027 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000047
Mar 19 18:37:03 178064 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003c
Mar 19 18:37:03 178139 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 19 18:37:03 178160 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:37:03 315226 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015
Mar 19 18:37:03 315289 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively
Mar 19 18:37:03 341474 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016
Mar 19 18:37:03 341549 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively
Mar 19 18:37:03 341616 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003d
Mar 19 18:37:03 342446 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:37:03 343169 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00
Mar 19 18:37:03 343262 [4780B960] -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x81 (SubnGetResp)
D bit...................0x1
status..................0x1C00
hop_ptr.................0x0
hop_count...............0x5
trans_id................0x14d08
attr_id.................0x15 (PortInfo)
resv....................0x0
attr_mod................0x11
m_key...................0x0000000000000000
dr_slid.................0xFFFF
dr_dlid.................0xFFFF
Initial path: [0][1][11][1][6][16]
Return path: [0][9][18][D][3][11]
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
Mar 19 18:37:03 343371 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00
Mar 19 18:37:03 343364 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition
Mar 19 18:37:03 343415 [4780B960] -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x81 (SubnGetResp)
D bit...................0x1
status..................0x1C00
hop_ptr.................0x0
hop_count...............0x5
trans_id................0x14d09
attr_id.................0x15 (PortInfo)
resv....................0x0
attr_mod................0x12
m_key...................0x0000000000000000
dr_slid.................0xFFFF
dr_dlid.................0xFFFF
Initial path: [0][1][11][1][6][16]
Return path: [0][9][18][D][3][11]
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
Mar 19 18:37:03 343409 [45007960] -> PortInfo dump:
port number.............0x11
node_guid...............0x0005ad0000027c84
port_guid...............0x0005ad0000027c84
m_key...................0x0000000000000000
subnet_prefix...........0x0000000000000000
base_lid................0x0
master_sm_base_lid......0x0
capability_mask.........0x0
diag_code...............0x0
m_key_lease_period......0x0
local_port_num..........0x11
link_width_enabled......0x2
link_width_supported....0x3
link_width_active.......0x2
link_speed_supported....0x1
port_state..............INIT
state_info2.............0x52
m_key_protect_bits......0x0
lmc.....................0x0
link_speed..............0x11
mtu_smsl................0x40
vl_cap_init_type........0x40
vl_high_limit...........0x0
vl_arb_high_cap.........0x8
vl_arb_low_cap..........0x8
init_rep_mtu_cap........0x4
vl_stall_life...........0xF2
vl_enforce..............0x40
m_key_violations........0x0
p_key_violations........0x0
q_key_violations........0x0
guid_cap................0x0
client_reregister.......0x0
subnet_timeout..........0x0
resp_time_value.........0x0
error_threshold.........0x88
Mar 19 18:37:03 343481 [45007960] -> Capabilities Mask:
Mar 19 18:37:03 343532 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition
Mar 19 18:37:03 343537 [45007960] -> PortInfo dump:
port number.............0x12
node_guid...............0x0005ad0000027c84
port_guid...............0x0005ad0000027c84
m_key...................0x0000000000000000
subnet_prefix...........0x0000000000000000
base_lid................0x0
master_sm_base_lid......0x0
capability_mask.........0x0
diag_code...............0x0
m_key_lease_period......0x0
local_port_num..........0x11
link_width_enabled......0x2
link_width_supported....0x3
link_width_active.......0x2
link_speed_supported....0x1
port_state..............INIT
state_info2.............0x52
m_key_protect_bits......0x0
lmc.....................0x0
link_speed..............0x11
mtu_smsl................0x40
vl_cap_init_type........0x40
vl_high_limit...........0x0
vl_arb_high_cap.........0x8
vl_arb_low_cap..........0x8
init_rep_mtu_cap........0x4
vl_stall_life...........0xF2
vl_enforce..............0x40
m_key_violations........0x0
p_key_violations........0x0
q_key_violations........0x0
guid_cap................0x0
client_reregister.......0x0
subnet_timeout..........0x0
resp_time_value.........0x0
error_threshold.........0x88
Mar 19 18:37:03 343555 [45007960] -> Capabilities Mask:
Mar 19 18:37:03 348684 [45007960] -> SUBNET UP
Mar 19 18:37:03 461748 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B
TID:0x0000000000000048 Mar 19 18:37:03 461958 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:03 484827 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003e Mar 19 18:37:03 486448 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:37:03 528040 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000049 Mar 19 18:37:03 528154 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:03 580196 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004a Mar 19 18:37:03 580534 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:03 599784 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004b Mar 19 18:37:03 599879 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:03 621883 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004c Mar 19 18:37:03 621940 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:03 707894 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:03 764678 [43204960] -> SUBNET UP Mar 19 18:37:03 783783 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004d Mar 19 18:37:03 783844 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:04 000228 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004e Mar 19 18:37:04 000628 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:04 022198 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000004f Mar 19 18:37:04 022299 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:04 043985 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000050 Mar 19 18:37:04 044052 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:04 155809 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:04 210448 [41401960] -> SUBNET UP Mar 19 18:37:04 504490 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 19 18:37:04 504569 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 19 18:37:04 570084 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:04 626298 [43C05960] -> SUBNET UP Mar 19 18:37:54 424084 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000051 Mar 19 18:37:54 424430 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 
from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:37:54 424457 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000003f Mar 19 18:37:54 424522 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:37:54 722515 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722536 [44606960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:37:54 722558 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722565 [44606960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:37:54 722587 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722594 [44606960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:37:54 722636 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722641 [44606960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:37:54 722658 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722663 [44606960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:37:54 722679 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722684 [44606960] -> Removed port with 
GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:37:54 722701 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722706 [44606960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:37:54 722723 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722728 [44606960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 19 18:37:54 722875 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722880 [44606960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 19 18:37:54 722909 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722915 [44606960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:37:54 722929 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722934 [44606960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 19 18:37:54 722949 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722955 [44606960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 19 18:37:54 722970 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722975 [44606960] -> Removed port with GUID:0x0005ad0000024da7 LID 
range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:37:54 722992 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 722997 [44606960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:37:54 723012 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 723073 [44606960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:37:54 723090 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 723095 [44606960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 19 18:37:54 723111 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:37:54 723116 [44606960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:37:54 756302 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:54 806787 [45A08960] -> SUBNET UP Mar 19 18:37:55 149566 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:37:55 198855 [41401960] -> SUBNET UP Mar 19 18:38:48 131054 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000040 Mar 19 18:38:48 131349 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:38:48 137230 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000052 Mar 19 18:38:48 137268 [45007960] -> 
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000041 Mar 19 18:38:48 137395 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:38:48 137432 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:38:48 143370 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000053 Mar 19 18:38:48 144327 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:38:48 529052 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529065 [41E02960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 19 18:38:48 529071 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529078 [41E02960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 19 18:38:48 529083 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529090 [41E02960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:38:48 529095 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529101 [41E02960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 19 18:38:48 529106 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 
from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529113 [41E02960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:38:48 529118 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529124 [41E02960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:38:48 529129 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529136 [41E02960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:38:48 529141 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529147 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 19 18:38:48 529152 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529159 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:38:48 529164 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529170 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 19 18:38:48 529175 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529182 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 19 18:38:48 529186 [41E02960] -> osm_report_notice: Reporting Generic 
Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529193 [41E02960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 19 18:38:48 529198 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529204 [41E02960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 19 18:38:48 529209 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529216 [41E02960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 19 18:38:48 529271 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529277 [41E02960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 19 18:38:48 529281 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529286 [41E02960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 19 18:38:48 529290 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:38:48 529294 [41E02960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 19 18:38:48 560082 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:38:48 630464 [43204960] -> SUBNET UP Mar 19 18:38:49 018498 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:38:49 073355 [45007960] -> SUBNET UP Mar 19 18:39:04 189829 [45007960] -> 
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 19 18:39:04 190072 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 307827 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 19 18:39:04 307940 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 330104 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 19 18:39:04 330210 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 468676 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 19 18:39:04 468758 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 680305 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 19 18:39:04 680400 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 702144 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 19 18:39:04 702286 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 704346 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 
GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:04 704354 [43204960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:39:04 739059 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:04 739896 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 19 18:39:04 783807 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 797411 [44606960] -> SUBNET UP Mar 19 18:39:04 849970 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 19 18:39:04 850195 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 853735 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 19 18:39:04 853809 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 897727 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 19 18:39:04 897860 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 901577 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 19 18:39:04 901719 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 19 18:39:04 923271 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 19 18:39:04 923377 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 19 18:39:05 106246 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 19 18:39:05 106314 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 19 18:39:05 178215 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 19 18:39:05 178258 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 19 18:39:05 272913 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 19 18:39:05 272983 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 19 18:39:05 339633 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 19 18:39:05 339679 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 19 18:39:05 469093 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 19 18:39:05 469145 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 19 18:39:05 484587 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 19 18:39:05 484633 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 19 18:39:05 574251 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 
Mar 19 18:39:05 574301 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 19 18:39:05 602665 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 19 18:39:05 602700 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 19 18:39:05 646331 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 19 18:39:05 646369 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 19 18:39:05 834613 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 19 18:39:05 834685 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 19 18:39:05 851128 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 19 18:39:05 851166 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 19 18:39:05 875540 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 19 18:39:05 875592 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 19 18:39:05 897378 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 19 18:39:05 897424 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 19 18:39:05 907232 [4780B960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x15 trans_id=0x124ef0001c2fe) -- dropping Mar 19 18:39:05 907249 [4780B960] -> umad_receiver: ERR 5411: DR SMP Mar 
19 18:39:05 907259 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
Mar 19 18:39:05 907295 [4780B960] -> SMP dump:
    base_ver................0x1
    mgmt_class..............0x81
    class_ver...............0x1
    method..................0x1 (SubnGet)
    D bit...................0x0
    status..................0x0
    hop_ptr.................0x0
    hop_count...............0x6
    trans_id................0x1c2fe
    attr_id.................0x15 (PortInfo)
    resv....................0x0
    attr_mod................0x1
    m_key...................0x0000000000000000
    dr_slid.................0xFFFF
    dr_dlid.................0xFFFF
    Initial path: [0][1][11][1][6][16][8]
    Return path: [0][0][0][0][0][0][0]
    Reserved: [0][0][0][0][0][0][0]
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 19 18:39:05 907372 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:39:05 907384 [41401960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
Mar 19 18:39:05 907407 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:39:05 907414 [41401960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 19 18:39:05 907480 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:39:05 907485 [41401960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 19 18:39:05 907577 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 19 18:39:05 907582 [41401960] -> Removed port with GUID:0x0005ad0000024d47
LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:39:05 907618 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad0000027c84 port 8. Adding to light sweep sampling list Mar 19 18:39:05 907657 [41401960] -> Directed Path Dump of 5 hop path: Path = [0][1][11][1][6][16] Mar 19 18:39:05 911559 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:05 911572 [43204960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 19 18:39:05 927229 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 19 18:39:05 927285 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 19 18:39:05 942538 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:06 000027 [41E02960] -> SUBNET UP Mar 19 18:39:06 130255 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001a Mar 19 18:39:06 130308 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 19 18:39:06 131922 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000042 Mar 19 18:39:06 132063 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:39:06 154579 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001b Mar 19 18:39:06 154681 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 27 times consecutively Mar 19 18:39:06 176248 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 TID:0x000000000000001c Mar 19 18:39:06 176304 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 28 times consecutively Mar 19 18:39:06 198132 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001d Mar 19 18:39:06 198195 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 29 times consecutively Mar 19 18:39:06 230022 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001e Mar 19 18:39:06 230108 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 30 times consecutively Mar 19 18:39:06 230229 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000043 Mar 19 18:39:06 230311 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:39:06 399543 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:06 399556 [43C05960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 19 18:39:06 399562 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:06 399569 [43C05960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 19 18:39:06 399574 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:06 399580 [43C05960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 19 18:39:06 399585 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 
num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 19 18:39:06 399592 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 19 18:39:06 430598 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:06 494689 [44606960] -> SUBNET UP Mar 19 18:39:06 837303 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001f Mar 19 18:39:06 837446 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 31 times consecutively Mar 19 18:39:06 838528 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000044 Mar 19 18:39:06 838636 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:39:06 876308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:07 028376 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000020 Mar 19 18:39:07 028459 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 32 times consecutively Mar 19 18:39:07 028545 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000045 Mar 19 18:39:07 028652 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 19 18:39:07 030190 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000054 Mar 19 18:39:07 030277 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:39:07 096812 [41401960] -> 
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000046
Mar 19 18:39:07 096959 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:39:07 111719 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00
Mar 19 18:39:07 111759 [4780B960] -> SMP dump:
    base_ver................0x1
    mgmt_class..............0x81
    class_ver...............0x1
    method..................0x81 (SubnGetResp)
    D bit...................0x1
    status..................0x1C00
    hop_ptr.................0x0
    hop_count...............0x5
    trans_id................0x1dfac
    attr_id.................0x15 (PortInfo)
    resv....................0x0
    attr_mod................0x11
    m_key...................0x0000000000000000
    dr_slid.................0xFFFF
    dr_dlid.................0xFFFF
    Initial path: [0][1][11][1][4][16]
    Return path: [0][9][18][D][1][11]
    Reserved: [0][0][0][0][0][0][0]
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02
    12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
    00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
Mar 19 18:39:07 111810 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition
Mar 19 18:39:07 111814 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00
Mar 19 18:39:07 111831 [41E02960] -> PortInfo dump:
    port number.............0x11
    node_guid...............0x0005ad0000027c84
    port_guid...............0x0005ad0000027c84
    m_key...................0x0000000000000000
    subnet_prefix...........0x0000000000000000
    base_lid................0x0
    master_sm_base_lid......0x0
    capability_mask.........0x0
    diag_code...............0x0
    m_key_lease_period......0x0
    local_port_num..........0x11
    link_width_enabled......0x2
    link_width_supported....0x3
    link_width_active.......0x2
    link_speed_supported....0x1
    port_state..............INIT
    state_info2.............0x52
    m_key_protect_bits......0x0
    lmc.....................0x0
    link_speed..............0x11
    mtu_smsl................0x40
    vl_cap_init_type........0x40
    vl_high_limit...........0x0
    vl_arb_high_cap.........0x8
    vl_arb_low_cap..........0x8
    init_rep_mtu_cap........0x4
    vl_stall_life...........0xF2
    vl_enforce..............0x40
    m_key_violations........0x0
    p_key_violations........0x0
    q_key_violations........0x0
    guid_cap................0x0
    client_reregister.......0x0
    subnet_timeout..........0x0
    resp_time_value.........0x0
    error_threshold.........0x88
Mar 19 18:39:07 111868 [41E02960] -> Capabilities Mask:
Mar 19 18:39:07 111844 [4780B960] -> SMP dump:
    base_ver................0x1
    mgmt_class..............0x81
    class_ver...............0x1
    method..................0x81 (SubnGetResp)
    D bit...................0x1
    status..................0x1C00
    hop_ptr.................0x0
    hop_count...............0x5
    trans_id................0x1dfad
    attr_id.................0x15 (PortInfo)
    resv....................0x0
    attr_mod................0x12
    m_key...................0x0000000000000000
    dr_slid.................0xFFFF
    dr_dlid.................0xFFFF
    Initial path: [0][1][11][1][4][16]
    Return path: [0][9][18][D][1][11]
    Reserved: [0][0][0][0][0][0][0]
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02
    12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
    00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
Mar 19 18:39:07 112011 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition
Mar 19 18:39:07 112018 [41401960] -> PortInfo dump:
    port number.............0x12
    node_guid...............0x0005ad0000027c84
    port_guid...............0x0005ad0000027c84
    m_key...................0x0000000000000000
    subnet_prefix...........0x0000000000000000
    base_lid................0x0
    master_sm_base_lid......0x0
    capability_mask.........0x0
    diag_code...............0x0
    m_key_lease_period......0x0
    local_port_num..........0x11
    link_width_enabled......0x2
    link_width_supported....0x3
    link_width_active.......0x2
    link_speed_supported....0x1
    port_state..............INIT
    state_info2.............0x52
    m_key_protect_bits......0x0
    lmc.....................0x0
    link_speed..............0x11
    mtu_smsl................0x40
    vl_cap_init_type........0x40
    vl_high_limit...........0x0
    vl_arb_high_cap.........0x8
    vl_arb_low_cap..........0x8
    init_rep_mtu_cap........0x4
    vl_stall_life...........0xF2
    vl_enforce..............0x40
    m_key_violations........0x0
    p_key_violations........0x0
    q_key_violations........0x0
    guid_cap................0x0
    client_reregister.......0x0
    subnet_timeout..........0x0
    resp_time_value.........0x0
    error_threshold.........0x88
Mar 19 18:39:07 112034 [41401960] -> Capabilities Mask:
Mar 19 18:39:07 117211 [45A08960] -> SUBNET UP
Mar 19 18:39:07 354540 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000047
Mar 19 18:39:07 354686 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 19 18:39:07 383453 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000055
Mar 19 18:39:07 383530 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 19 18:39:07 497601 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 19 18:39:07 548184 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000056
Mar 19 18:39:07 548217 [43C05960] -> SUBNET UP
Mar 19 18:39:07 548427 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 19 18:39:07 878403 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice
type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000057 Mar 19 18:39:07 887312 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000058 Mar 19 18:39:07 888156 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:07 929819 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:39:07 929834 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:39:07 931166 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000059 Mar 19 18:39:07 931288 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 19 18:39:07 946406 [42803960] -> SUBNET UP Mar 19 18:39:08 073735 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000020 Mar 19 18:39:08 073811 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 33 times consecutively Mar 19 18:39:08 400790 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 18:39:08 467925 [45A08960] -> SUBNET UP Mar 19 20:24:07 009911 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 TID:0x0000000000000020 Mar 19 20:24:07 010153 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0020 GID:0xfe80000000000000,0x0005ad00000281ad Mar 19 20:24:07 010966 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 TID:0x000000000000001a Mar 19 20:24:07 011064 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 
from LID:0x0001 GID:0xfe80000000000000,0x0005ad0000027c6a Mar 19 20:24:07 390927 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 20:24:07 453747 [43204960] -> SUBNET UP Mar 19 20:24:07 839927 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 20:24:07 895694 [45A08960] -> SUBNET UP Mar 19 20:24:08 049066 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 TID:0x000000000000001a Mar 19 20:24:08 049322 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0001 GID:0xfe80000000000000,0x0005ad0000027c6a Mar 19 20:24:08 433979 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 20:24:08 487950 [43204960] -> SUBNET UP Mar 19 20:26:28 608381 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 TID:0x0000000000000021 Mar 19 20:26:28 608406 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 TID:0x000000000000001b Mar 19 20:26:28 608685 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0020 GID:0xfe80000000000000,0x0005ad00000281ad Mar 19 20:26:28 608693 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0001 GID:0xfe80000000000000,0x0005ad0000027c6a Mar 19 20:26:28 972140 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 20:26:29 028682 [43C05960] -> SUBNET UP Mar 19 20:26:29 399649 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 20:26:29 465737 [45007960] -> SUBNET UP Mar 19 21:30:38 775260 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0146 TID:0x000000000000002f Mar 19 21:30:38 775533 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 
num:128 from LID:0x0146 GID:0xfe80000000000000,0x0005ad00000281b6 Mar 19 21:30:38 777083 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0143 TID:0x0000000000000037 Mar 19 21:30:38 777242 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0143 GID:0xfe80000000000000,0x0005ad00000281b9 Mar 19 21:30:39 144779 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 21:30:39 200635 [43204960] -> SUBNET UP Mar 19 21:30:39 536003 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 19 21:30:39 591216 [42803960] -> SUBNET UP Mar 20 14:06:48 971082 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000021 Mar 20 14:06:48 971376 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:06:49 346734 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:06:49 346761 [42803960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 20 14:06:49 381394 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:06:49 440803 [43204960] -> SUBNET UP Mar 20 14:07:09 098449 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000048 Mar 20 14:07:09 098708 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:07:09 098733 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005a Mar 20 14:07:09 098777 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 
from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:07:09 417844 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417862 [42803960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:07:09 417879 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417885 [42803960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:07:09 417902 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417907 [42803960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:07:09 417924 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417929 [42803960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:07:09 417945 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417951 [42803960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:07:09 417967 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417973 [42803960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:07:09 417989 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 417994 [42803960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] 
of node:saguaro-22-4 HCA-1 Mar 20 14:07:09 418131 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418137 [42803960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:07:09 418168 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418173 [42803960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:07:09 418188 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418193 [42803960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:07:09 418207 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418212 [42803960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:07:09 418227 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418232 [42803960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:07:09 418248 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418253 [42803960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:07:09 418285 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418290 [42803960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 
Mar 20 14:07:09 418306 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418362 [42803960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:07:09 418378 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:07:09 418383 [42803960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:07:09 451317 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:07:09 502755 [41401960] -> SUBNET UP Mar 20 14:07:09 902534 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:07:09 955229 [45A08960] -> SUBNET UP Mar 20 14:08:03 850926 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000049 Mar 20 14:08:03 851134 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:08:03 856880 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000004a Mar 20 14:08:03 856955 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:08:03 866819 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005b Mar 20 14:08:03 866977 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:03 963024 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005c Mar 20 14:08:03 963130 
[45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:04 106856 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005d Mar 20 14:08:04 106995 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:04 193747 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193766 [44606960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:08:04 193771 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193777 [44606960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:08:04 193781 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193786 [44606960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:08:04 193790 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193795 [44606960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:08:04 193799 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193804 [44606960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:08:04 193808 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from 
LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193813 [44606960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:08:04 193817 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193822 [44606960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:08:04 193826 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193830 [44606960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:08:04 193834 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193839 [44606960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:08:04 193843 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193848 [44606960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:08:04 193852 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193857 [44606960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:08:04 193861 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193866 [44606960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:08:04 193870 [44606960] -> osm_report_notice: Reporting Generic Notice 
type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193874 [44606960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:08:04 193878 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193883 [44606960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:08:04 193938 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193944 [44606960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:08:04 193948 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:04 193953 [44606960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:08:04 224695 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:04 281046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281106 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61eec attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][2][4] Return path: [0][9][14][E][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 
11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281154 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281159 [41401960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281172 [41401960] -> Capabilities Mask: Mar 20 14:08:04 281187 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281213 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61eed attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF 
dr_dlid.................0xFFFF Initial path: [0][1][17][2][4] Return path: [0][9][14][E][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281279 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281316 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61eee attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][2][4] Return path: [0][9][14][E][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281392 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281416 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61eef attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281515 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281522 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281542 [44606960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281553 [44606960] -> Capabilities Mask: Mar 20 14:08:04 281561 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61ef0 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 
m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281572 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281590 [44606960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281600 [44606960] -> Capabilities Mask: Mar 20 14:08:04 281623 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281626 [44606960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 
m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281635 [44606960] -> Capabilities Mask: Mar 20 14:08:04 281637 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:04 281652 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281663 [44606960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 
vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281673 [44606960] -> Capabilities Mask: Mar 20 14:08:04 281675 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x61ef1 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:04 281721 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:04 281726 [41E02960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 
m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:04 281736 [41E02960] -> Capabilities Mask: Mar 20 14:08:04 287136 [44606960] -> SUBNET UP Mar 20 14:08:04 711595 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:04 766488 [45A08960] -> SUBNET UP Mar 20 14:08:19 947200 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:08:19 947479 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 086909 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 20 14:08:20 087084 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 108865 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 20 14:08:20 109210 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 109996 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 20 14:08:20 110407 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from 
LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 222523 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 20 14:08:20 222613 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 404596 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 20 14:08:20 404698 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 476804 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 20 14:08:20 476897 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 572434 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 20 14:08:20 572520 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 621715 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:20 621726 [42803960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:08:20 656232 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:20 698700 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 20 14:08:20 698794 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 
GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 708598 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 20 14:08:20 708698 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 713653 [45007960] -> SUBNET UP Mar 20 14:08:20 730554 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 20 14:08:20 730697 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:08:20 754139 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 20 14:08:20 754251 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 20 14:08:20 947339 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 20 14:08:20 947426 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 20 14:08:20 975965 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 20 14:08:20 976024 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 20 14:08:20 997569 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:08:20 997648 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:08:21 019465 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 20 
14:08:21 019512 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:08:21 064967 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 20 14:08:21 065009 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 20 14:08:21 082838 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 20 14:08:21 082877 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 20 14:08:21 100567 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 20 14:08:21 100619 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 20 14:08:21 188128 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:21 188144 [43C05960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:08:21 188166 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:21 188172 [43C05960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:08:21 188194 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:21 188199 [43C05960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:08:21 192421 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:08:21 192436 [41E02960] -> Discovered new port with 
GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:08:21 208455 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 20 14:08:21 208499 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 20 14:08:21 223240 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:21 394585 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 20 14:08:21 394665 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 20 14:08:21 419333 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 20 14:08:21 419393 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 20 14:08:21 441228 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 20 14:08:21 441276 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 20 14:08:21 462915 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 20 14:08:21 462968 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 20 14:08:21 475440 [45007960] -> SUBNET UP Mar 20 14:08:21 674045 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:08:21 674084 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000004b Mar 20 14:08:21 674137 [44606960] -> __osm_trap_rcv_process_request: 
ERR 3804: Received trap 24 times consecutively
Mar 20 14:08:21 674294 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:08:21 965885 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000004c
Mar 20 14:08:21 965992 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:08:22 092378 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 092395 [41401960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
Mar 20 14:08:22 092415 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 092420 [41401960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
Mar 20 14:08:22 092444 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 092449 [41401960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1
Mar 20 14:08:22 092625 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad00000281b3 port 22. Adding to light sweep sampling list
Mar 20 14:08:22 092655 [41401960] -> Directed Path Dump of 4 hop path: Path = [0][1][11][1][4]
Mar 20 14:08:22 092663 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad00000281b3 port 23. Adding to light sweep sampling list
Mar 20 14:08:22 092672 [41401960] -> Directed Path Dump of 4 hop path: Path = [0][1][11][1][4]
Mar 20 14:08:22 096789 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 096801 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
Mar 20 14:08:22 096805 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 096810 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
Mar 20 14:08:22 096814 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 096819 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
Mar 20 14:08:22 127266 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:08:22 184734 [45007960] -> SUBNET UP
Mar 20 14:08:22 541974 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 541985 [41401960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
Mar 20 14:08:22 541989 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 541995 [41401960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
Mar 20 14:08:22 541998 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:08:22 542003 [41401960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of
node:saguaro-22-5 HCA-1 Mar 20 14:08:22 572711 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:22 611570 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000004d Mar 20 14:08:22 611751 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:08:22 611770 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005e Mar 20 14:08:22 612060 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:22 623766 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:22 623814 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x66134 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][5] Return path: [0][9][18][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:22 623876 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:22 623888 [45007960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 
m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:22 623907 [45007960] -> Capabilities Mask: Mar 20 14:08:22 623945 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:22 623973 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x66135 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][5] Return path: [0][9][18][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:22 624051 [44606960] -> osm_pi_rcv_process_set: Received 
error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:22 624056 [44606960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:22 624069 [44606960] -> Capabilities Mask: Mar 20 14:08:22 629289 [45A08960] -> SUBNET UP Mar 20 14:08:22 712180 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:08:22 712238 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 20 14:08:22 869303 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000005f Mar 20 14:08:22 869527 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:22 892522 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0148 TID:0x000000000000004e
Mar 20 14:08:22 892707 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:08:22 957086 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000060
Mar 20 14:08:22 957189 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:08:23 080551 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000061
Mar 20 14:08:23 080621 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:08:23 102292 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000062
Mar 20 14:08:23 102372 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:08:23 124176 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000063
Mar 20 14:08:23 124278 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:08:23 285320 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000064
Mar 20 14:08:23 285393 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:08:23 403309 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000065
Mar 20 14:08:23 403388 [41401960] -> osm_report_notice:
Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:23 425052 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000066 Mar 20 14:08:23 425117 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:23 447189 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000067 Mar 20 14:08:23 447266 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:08:23 535175 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:23 595127 [41401960] -> SUBNET UP Mar 20 14:08:23 750323 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:08:23 750432 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 20 14:08:23 960490 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:08:24 014256 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:08:24 014323 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x67b9d attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:08:24 014398 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:08:24 014408 [41401960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:08:24 014422 [41401960] -> Capabilities Mask: Mar 20 14:08:24 019479 [41401960] -> SUBNET UP Mar 20 14:11:00 201308 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F TID:0x0000000000000018 Mar 20 14:11:00 201580 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001F GID:0xfe80000000000000,0x0005ad0000027c56 Mar 20 14:11:00 554517 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb 
Mar 20 14:11:00 554538 [41E02960] -> Removed port with GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:saguaro-24-1 HCA-1
Mar 20 14:11:00 589140 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:11:00 641315 [45A08960] -> SUBNET UP
Mar 20 14:14:16 904140 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000068
Mar 20 14:14:16 904369 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:14:16 904462 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000004f
Mar 20 14:14:16 904600 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:14:17 210726 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210747 [41401960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
Mar 20 14:14:17 210796 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210802 [41401960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
Mar 20 14:14:17 210818 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210836 [41401960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
Mar 20 14:14:17 210864 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210869 [41401960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 20 14:14:17 210885 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210890 [41401960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
Mar 20 14:14:17 210908 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210913 [41401960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
Mar 20 14:14:17 210931 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 210936 [41401960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
Mar 20 14:14:17 211090 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211096 [41401960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
Mar 20 14:14:17 211127 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211133 [41401960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
Mar 20 14:14:17 211147 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211153 [41401960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1
Mar 20 14:14:17 211169 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211174 [41401960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
Mar 20 14:14:17 211189 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211194 [41401960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
Mar 20 14:14:17 211212 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211216 [41401960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 20 14:14:17 211232 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211237 [41401960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:14:17 211253 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211317 [41401960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1
Mar 20 14:14:17 211333 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:14:17 211338 [41401960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
Mar 20 14:14:17 244432 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:14:17 292747 [42803960] -> SUBNET UP
Mar 20 14:14:17 698554 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:14:17 750419 [44606960] -> SUBNET UP
Mar 20 14:15:11 300343 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000050
Mar 20 14:15:11 300577 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:15:11 306375 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000069
Mar 20 14:15:11 306439 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000051
Mar 20 14:15:11 306487 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:15:11 306514 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:15:11 312487 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006a
Mar 20 14:15:11 312581 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:15:11 636546 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636559 [45007960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
Mar 20 14:15:11 636565 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636572 [45007960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
Mar 20 14:15:11 636577 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636584 [45007960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:15:11 636589 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636595 [45007960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 20 14:15:11 636600 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636606 [45007960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
Mar 20 14:15:11 636612 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636618 [45007960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 20 14:15:11 636623 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636629 [45007960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
Mar 20 14:15:11 636634 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636641 [45007960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
Mar 20 14:15:11 636646 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636652 [45007960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
Mar 20 14:15:11 636657 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:11 636663 [45007960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
Mar 20 14:15:11 636668 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64
from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636675 [45007960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:15:11 636680 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636686 [45007960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:15:11 636691 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636698 [45007960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:15:11 636703 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636709 [45007960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:15:11 636742 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636750 [45007960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:15:11 636755 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:11 636761 [45007960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:15:11 667436 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:15:11 731917 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732017 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 
(SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b507 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732102 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732106 [41401960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732128 [41401960] -> Capabilities Mask: Mar 20 14:15:11 732160 [4780B960] -> 
__osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732185 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b508 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732254 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732258 [44606960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 
guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732269 [44606960] -> Capabilities Mask: Mar 20 14:15:11 732300 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732334 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b509 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732420 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732419 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732451 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b50a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732447 [45007960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732471 [45007960] -> Capabilities Mask: Mar 20 14:15:11 732511 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732516 [45007960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 
port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732529 [45007960] -> Capabilities Mask: Mar 20 14:15:11 732556 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732591 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b50b attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732653 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732662 [43204960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 
capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732673 [43204960] -> Capabilities Mask: Mar 20 14:15:11 732705 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732739 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b50c attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732809 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:11 732805 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during 
ACTIVE transition Mar 20 14:15:11 732839 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6b50d attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:11 732837 [41E02960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732856 [41E02960] -> 
Capabilities Mask: Mar 20 14:15:11 732898 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:11 732911 [41E02960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:11 732925 [41E02960] -> Capabilities Mask: Mar 20 14:15:11 738354 [45A08960] -> SUBNET UP Mar 20 14:15:12 115658 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:15:12 172029 [44606960] -> SUBNET UP Mar 20 14:15:27 277617 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:15:27 277863 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:15:27 510410 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 
TID:0x0000000000000001
Mar 20 14:15:27 510626 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 532239 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002
Mar 20 14:15:27 532443 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 533517 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003
Mar 20 14:15:27 533612 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 591171 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:27 591185 [41401960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
Mar 20 14:15:27 591206 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:27 591211 [41401960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:15:27 625811 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:15:27 668356 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004
Mar 20 14:15:27 668485 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 682282 [43204960] -> SUBNET UP
Mar 20 14:15:27 737313 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005
Mar 20 14:15:27 737387 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 809341 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006
Mar 20 14:15:27 809813 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:27 998181 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007
Mar 20 14:15:27 998331 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:28 012193 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008
Mar 20 14:15:28 012277 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:28 496329 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009
Mar 20 14:15:28 496422 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:28 624912 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:28 624940 [43C05960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
Mar 20 14:15:28 624965 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:28 624972 [43C05960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 20 14:15:28 625001 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:28 625008 [43C05960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 20 14:15:28 629507 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:15:28 629518 [42803960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:15:28 649776 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a
Mar 20 14:15:28 660297 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:15:28 699777 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84
Mar 20 14:15:28 716354 [41E02960] -> SUBNET UP
Mar 20 14:15:28 744686 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b
Mar 20 14:15:28 744857 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively
Mar 20 14:15:28 811329 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c
Mar 20 14:15:28 811392 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively
Mar 20 14:15:28 999808 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d
Mar 20 14:15:28 999881 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively
Mar 20 14:15:29 029918 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152
TID:0x000000000000000e Mar 20 14:15:29 029969 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:15:29 031783 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000052 Mar 20 14:15:29 031900 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:29 037646 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 037662 [44606960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:15:29 037683 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 037690 [44606960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:15:29 037721 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 037726 [44606960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:15:29 037741 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 037746 [44606960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:15:29 037766 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 037771 [44606960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:15:29 361560 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 
TID:0x0000000000000053 Mar 20 14:15:29 361622 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:29 433665 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 433674 [43204960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:15:29 433680 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 433687 [43204960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:15:29 433692 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 433698 [43204960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:15:29 433703 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:29 433709 [43204960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:15:29 464434 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:15:29 522011 [42803960] -> SUBNET UP Mar 20 14:15:29 699605 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006b Mar 20 14:15:29 699782 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:15:29 701115 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000054 Mar 20 14:15:29 701301 [45A08960] -> 
osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:29 818974 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000055 Mar 20 14:15:29 819054 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:29 992006 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000056 Mar 20 14:15:29 992080 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:30 184132 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006c Mar 20 14:15:30 184205 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:15:30 207030 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000057 Mar 20 14:15:30 207101 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:15:30 250541 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006d Mar 20 14:15:30 250635 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:15:30 317366 [45A08960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad00000281a7 port 22. 
Adding to light sweep sampling list Mar 20 14:15:30 317409 [45A08960] -> Directed Path Dump of 4 hop path: Path = [0][1][17][1][4] Mar 20 14:15:30 494183 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006e Mar 20 14:15:30 494247 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:15:30 521869 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:30 521879 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:15:30 521885 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:30 521891 [43C05960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:15:30 521896 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:30 521903 [43C05960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:15:30 521908 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:30 521914 [43C05960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:15:30 521919 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:15:30 521926 [43C05960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:15:30 552581 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 
14:15:30 553014 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000006f Mar 20 14:15:30 592863 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:15:30 607595 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:30 607666 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6f744 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][1][6] Return path: [0][9][15][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:30 607770 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:30 607777 [44606960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 
link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:30 607794 [44606960] -> Capabilities Mask: Mar 20 14:15:30 607914 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:30 607958 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x6f745 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][1][6] Return path: [0][9][15][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:30 608014 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:30 608018 [43204960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 
link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:30 608031 [43204960] -> Capabilities Mask: Mar 20 14:15:30 613309 [41E02960] -> SUBNET UP Mar 20 14:15:30 995108 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:15:31 050102 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:15:31 050180 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x70486 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:15:31 050233 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:15:31 050238 [45A08960] -> PortInfo 
dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:15:31 050251 [45A08960] -> Capabilities Mask: Mar 20 14:15:31 055273 [42803960] -> SUBNET UP Mar 20 14:15:31 106129 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:15:31 106193 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:17:18 456260 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000058 Mar 20 14:17:18 456512 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:17:18 456649 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000070 Mar 20 14:17:18 456761 [41E02960] -> osm_report_notice: 
Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:17:18 769730 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769751 [45007960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:17:18 769773 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769780 [45007960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:17:18 769803 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769809 [45007960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:17:18 769832 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769838 [45007960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:17:18 769858 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769865 [45007960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:17:18 769888 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769895 [45007960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:17:18 769927 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 769932 [45007960] -> Removed port with 
GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:17:18 770075 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770081 [45007960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:17:18 770109 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770114 [45007960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:17:18 770130 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770135 [45007960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:17:18 770150 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770155 [45007960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:17:18 770171 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770176 [45007960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:17:18 770193 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770198 [45007960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:17:18 770216 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770221 [45007960] -> Removed port with GUID:0x0005ad00000249d3 LID 
range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:17:18 770238 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770301 [45007960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:17:18 770318 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:17:18 770323 [45007960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:17:18 803377 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:17:18 855545 [44606960] -> SUBNET UP Mar 20 14:17:19 249722 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:17:19 300999 [45A08960] -> SUBNET UP Mar 20 14:18:11 663850 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000059 Mar 20 14:18:11 664195 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:11 670836 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000071 Mar 20 14:18:11 670964 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005a Mar 20 14:18:11 671199 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:11 672933 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:11 677654 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B 
TID:0x0000000000000072 Mar 20 14:18:11 677826 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:12 026661 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026675 [44606960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:18:12 026681 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026688 [44606960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:18:12 026693 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026700 [44606960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:18:12 026705 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026711 [44606960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:18:12 026716 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026723 [44606960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:18:12 026727 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026740 [44606960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:18:12 026745 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 
num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026751 [44606960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:18:12 026758 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026764 [44606960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:18:12 026769 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026776 [44606960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:18:12 026781 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026787 [44606960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:18:12 026792 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026798 [44606960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:18:12 026803 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026809 [44606960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:18:12 026814 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026821 [44606960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:18:12 026826 [44606960] -> osm_report_notice: Reporting 
Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026832 [44606960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:18:12 026869 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026877 [44606960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:18:12 026882 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:12 026888 [44606960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:18:12 057534 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:12 133316 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:12 133419 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x72d97 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][3][6] Return path: [0][9][15][F][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:12 133466 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:12 133467 [43204960] -> osm_pi_rcv_process_set: Received error status 
0x1c for SetResp() during ACTIVE transition Mar 20 14:18:12 133478 [43204960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:12 133490 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x72d98 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][3][6] Return path: [0][9][15][F][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:12 
133493 [43204960] -> Capabilities Mask: Mar 20 14:18:12 133566 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:12 133595 [45A08960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:12 133614 [45A08960] -> Capabilities Mask: Mar 20 14:18:12 133583 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:12 133671 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x72d99 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: 
[0][1][14][3][6] Return path: [0][9][15][F][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:12 133760 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:12 133788 [43C05960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:12 133807 [43C05960] -> Capabilities Mask: Mar 20 14:18:12 139330 [41401960] -> SUBNET UP Mar 20 14:18:12 496444 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:12 558965 [41401960] -> SUBNET UP Mar 20 14:18:27 748551 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:18:27 748795 [43C05960] -> osm_report_notice: 
Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:27 888669 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 20 14:18:27 888902 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:27 910605 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 20 14:18:27 910710 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:27 911951 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 20 14:18:27 912119 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 012957 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 20 14:18:28 013058 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 075266 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 20 14:18:28 075397 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 259000 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 20 14:18:28 259121 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 
14:18:28 308865 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 20 14:18:28 309000 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 330606 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 20 14:18:28 330714 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 444107 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 20 14:18:28 444191 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 466156 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 20 14:18:28 466234 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:18:28 478021 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 20 14:18:28 478070 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 20 14:18:28 489091 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 489106 [43204960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:18:28 521430 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 20 14:18:28 521499 
[42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 20 14:18:28 523658 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:28 580295 [43204960] -> SUBNET UP Mar 20 14:18:28 611805 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 20 14:18:28 611893 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 20 14:18:28 661292 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:18:28 661351 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:18:28 871670 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 20 14:18:28 871732 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:18:28 934440 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 20 14:18:28 934505 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 20 14:18:28 941281 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 941303 [45A08960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:18:28 941329 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 941336 [45A08960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:18:28 941356 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 
num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 941363 [45A08960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:18:28 941388 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 941395 [45A08960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:18:28 941420 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 941426 [45A08960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:18:28 945507 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:28 945515 [45A08960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:18:28 956576 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 20 14:18:28 956665 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 20 14:18:28 976211 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:29 033513 [42803960] -> SUBNET UP Mar 20 14:18:29 071283 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 20 14:18:29 071345 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 20 14:18:29 352103 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 20 14:18:29 352155 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 
19 times consecutively Mar 20 14:18:29 376386 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 20 14:18:29 376461 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 20 14:18:29 420228 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 20 14:18:29 420282 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 20 14:18:29 421294 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 20 14:18:29 421345 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 20 14:18:29 461135 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 20 14:18:29 461179 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 20 14:18:29 633008 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:18:29 633050 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005b Mar 20 14:18:29 633062 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 20 14:18:29 633350 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:29 733039 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005c Mar 20 14:18:29 733238 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 
GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:29 947440 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:29 947452 [44606960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:18:29 947457 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:29 947462 [44606960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:18:29 947465 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:29 947470 [44606960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:18:29 947474 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:29 947479 [44606960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:18:29 947482 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:29 947487 [44606960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:18:29 978182 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:30 027730 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:30 027819 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 
trans_id................0x762b8 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:30 027897 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:30 027901 [41401960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:30 027914 [41401960] -> Capabilities Mask: Mar 20 14:18:30 027993 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:30 028043 [4780B960] -> SMP dump: 
base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x762b9 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:30 028098 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:30 028109 [45A08960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 
error_threshold.........0x88 Mar 20 14:18:30 028124 [45A08960] -> Capabilities Mask: Mar 20 14:18:30 033824 [44606960] -> SUBNET UP Mar 20 14:18:30 418497 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:30 418522 [43C05960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:18:30 453167 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:30 494719 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005d Mar 20 14:18:30 494877 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:30 662496 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000073 Mar 20 14:18:30 662564 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:30 662645 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005e Mar 20 14:18:30 662759 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:30 707085 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000005f Mar 20 14:18:30 707179 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:30 728948 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000060 Mar 20 14:18:30 729041 [41E02960] -> osm_report_notice: 
Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:30 872332 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000061 Mar 20 14:18:30 872412 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:30 899764 [45A08960] -> SUBNET UP Mar 20 14:18:31 047423 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000062 Mar 20 14:18:31 047611 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:31 165201 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000063 Mar 20 14:18:31 165272 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:18:31 182461 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000074 Mar 20 14:18:31 182653 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:31 248834 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000075 Mar 20 14:18:31 248893 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:31 499830 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000076 Mar 20 14:18:31 499908 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B 
GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:31 521824 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000077 Mar 20 14:18:31 521891 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:31 543713 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000078 Mar 20 14:18:31 543784 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:18:31 589490 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:18:31 589499 [43C05960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:18:31 620166 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:31 672647 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:31 672739 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x77d11 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:31 672817 
[43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:31 672823 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:31 672833 [43C05960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:31 672852 [43C05960] -> Capabilities Mask: Mar 20 14:18:31 672861 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x77d12 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: 
[0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:31 672918 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:31 672922 [45007960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:31 672936 [45007960] -> Capabilities Mask: Mar 20 14:18:31 678085 [45A08960] -> SUBNET UP Mar 20 14:18:31 723715 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:18:31 723815 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 20 14:18:32 061932 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:18:32 113545 [4780B960] -> 
__osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:32 113610 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x78a4d attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:32 113712 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:32 113725 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:32 113730 [42803960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 
vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:32 113745 [42803960] -> Capabilities Mask: Mar 20 14:18:32 113751 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x78a4e attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:32 113803 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:32 113807 [44606960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 
vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:32 113820 [44606960] -> Capabilities Mask: Mar 20 14:18:32 113845 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:32 113907 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x78a4f attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:32 113958 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:32 113963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:32 113969 [41E02960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 
local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:32 113986 [41E02960] -> Capabilities Mask: Mar 20 14:18:32 113992 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x78a50 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:32 114052 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:32 114066 [45A08960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 
base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:32 114089 [45A08960] -> Capabilities Mask: Mar 20 14:18:32 114052 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:18:32 114171 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x78a51 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][13][1][6] Return path: [0][9][13][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:18:32 114224 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:18:32 114228 
[42803960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:18:32 114242 [42803960] -> Capabilities Mask: Mar 20 14:18:32 119326 [42803960] -> SUBNET UP Mar 20 14:23:02 506774 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 20 14:23:02 507064 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:23:02 861642 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:23:02 861653 [43204960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC Mar 20 14:23:02 893030 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:23:02 943693 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error 
status = 0x1C00 Mar 20 14:23:02 943765 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x5 trans_id................0x79aff attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x1 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][5][18] Return path: [0][9][18][D][2][13] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 13 03 03 02 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:23:02 943854 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:23:02 943870 [43C05960] -> PortInfo dump: port number.............0x1 node_guid...............0x0005ad0000027c84 port_guid...............0x0005ad0000027c84 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x13 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0x2C vl_enforce..............0x4C m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 
client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:23:02 943886 [43C05960] -> Capabilities Mask: Mar 20 14:23:02 948898 [43C05960] -> SUBNET UP Mar 20 14:23:03 237496 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00AF TID:0x0000000000000000 Mar 20 14:23:03 237710 [42803960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x00AF GID:0xfe80000000000000,0x0005ad0000024b27 Mar 20 14:23:03 605548 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:23:03 662757 [41401960] -> SUBNET UP Mar 20 14:24:54 675782 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000079 Mar 20 14:24:54 676077 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:24:54 677026 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000064 Mar 20 14:24:54 677118 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:24:55 047478 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047501 [43204960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:24:55 047520 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047525 [43204960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:24:55 047541 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 
GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047546 [43204960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:24:55 047563 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047569 [43204960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC Mar 20 14:24:55 047586 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047591 [43204960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:24:55 047607 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047612 [43204960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:24:55 047630 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047635 [43204960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:24:55 047652 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047657 [43204960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:24:55 047798 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047803 [43204960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:24:55 047836 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 
GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047842 [43204960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:24:55 047857 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047862 [43204960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:24:55 047877 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047882 [43204960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:24:55 047896 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047902 [43204960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:24:55 047918 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047923 [43204960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:24:55 047939 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 047988 [43204960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:24:55 048005 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 048010 [43204960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:24:55 048025 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 
GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:24:55 048030 [43204960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:24:55 081006 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:24:55 130875 [45A08960] -> SUBNET UP Mar 20 14:24:55 484995 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:24:55 535902 [42803960] -> SUBNET UP Mar 20 14:25:48 653788 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000065 Mar 20 14:25:48 654009 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:25:48 659749 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007a Mar 20 14:25:48 659790 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000066 Mar 20 14:25:48 659814 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:25:48 659963 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:25:48 665972 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007b Mar 20 14:25:48 666050 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:25:49 025384 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025396 [41E02960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of 
node:Topspin Switch TS120 Mar 20 14:25:49 025401 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025406 [41E02960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 20 14:25:49 025410 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025416 [41E02960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:25:49 025420 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025425 [41E02960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:25:49 025428 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025433 [41E02960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:25:49 025437 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025442 [41E02960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:25:49 025446 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025451 [41E02960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:25:49 025461 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025466 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID 
range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:25:49 025470 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025475 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:25:49 025483 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025488 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:25:49 025491 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025496 [41E02960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:25:49 025500 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025505 [41E02960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:25:49 025508 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025513 [41E02960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:25:49 025517 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025522 [41E02960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:25:49 025556 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025562 [41E02960] -> Discovered new port with 
GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:25:49 025565 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025570 [41E02960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:25:49 025574 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:25:49 025579 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:25:49 056324 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:25:49 126247 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 126356 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d165 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 126409 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 126442 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 
hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d166 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 126496 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 126489 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 126535 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d167 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 126526 [42803960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 
m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 126567 [42803960] -> Capabilities Mask: Mar 20 14:25:49 126613 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 126617 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 126658 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d168 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 126653 [42803960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 
port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 126687 [42803960] -> Capabilities Mask: Mar 20 14:25:49 126703 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 126709 [43204960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 
vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 126744 [43204960] -> Capabilities Mask: Mar 20 14:25:49 126765 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 126770 [43C05960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 126874 [43C05960] -> Capabilities Mask: Mar 20 14:25:49 126975 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 127015 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D 
bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d169 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][13][1][6] Return path: [0][9][13][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 127066 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 127072 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 127084 [45A08960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 
14:25:49 127103 [45A08960] -> Capabilities Mask: Mar 20 14:25:49 127121 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d16a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][13][1][6] Return path: [0][9][13][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 127188 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:25:49 127220 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x7d16b attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][13][1][6] Return path: [0][9][13][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:25:49 127326 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 127339 [44606960] -> PortInfo dump: port number.............0x17 
node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 127357 [44606960] -> Capabilities Mask: Mar 20 14:25:49 127378 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:25:49 127397 [45007960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 
vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:25:49 127410 [45007960] -> Capabilities Mask: Mar 20 14:25:49 132961 [43204960] -> SUBNET UP Mar 20 14:25:49 523879 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:25:49 580522 [42803960] -> SUBNET UP Mar 20 14:26:04 718574 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:26:04 718819 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:04 836781 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 20 14:26:04 836881 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:04 858762 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 20 14:26:04 860242 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:04 997451 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 20 14:26:04 997647 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 180722 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 20 14:26:05 180855 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 209122 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 20 14:26:05 209200 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 347419 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 20 14:26:05 347488 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 378670 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 20 14:26:05 378739 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 409112 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:05 409121 [41401960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:26:05 443639 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:05 483503 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 20 14:26:05 486002 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 499183 [44606960] -> SUBNET UP Mar 20 14:26:05 499856 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 20 14:26:05 499941 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 521857 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 20 14:26:05 521971 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:26:05 532569 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 20 14:26:05 532624 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 20 14:26:05 633813 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 20 14:26:05 633869 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 20 14:26:05 655421 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 20 14:26:05 655501 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 20 14:26:05 702652 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:26:05 702745 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:26:05 817201 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:05 817216 [43204960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:26:05 817235 [43204960] -> osm_report_notice: Reporting Generic Notice 
type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:05 817241 [43204960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:26:05 817259 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:05 817264 [43204960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:26:05 821322 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:05 821330 [41E02960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:26:05 847950 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 20 14:26:05 848031 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:26:05 852036 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:05 893954 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 20 14:26:05 894021 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 20 14:26:05 910489 [44606960] -> SUBNET UP Mar 20 14:26:05 999993 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 20 14:26:06 000039 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 20 14:26:06 021880 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 20 14:26:06 021970 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received 
trap 18 times consecutively Mar 20 14:26:06 043912 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 20 14:26:06 044001 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 20 14:26:06 052878 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 20 14:26:06 052975 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 20 14:26:06 147560 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 20 14:26:06 147616 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 20 14:26:06 158945 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 20 14:26:06 158978 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 20 14:26:06 346046 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 20 14:26:06 346106 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 20 14:26:06 405311 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:26:06 405349 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 20 14:26:06 632882 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 20 14:26:06 632923 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 20 14:26:06 634031 
[43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000067 Mar 20 14:26:06 634110 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:06 883831 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001a Mar 20 14:26:06 883879 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 20 14:26:06 885475 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000068 Mar 20 14:26:06 885560 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:06 982877 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001b Mar 20 14:26:06 982926 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 27 times consecutively Mar 20 14:26:06 992809 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000069 Mar 20 14:26:06 992871 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:06 992909 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001c Mar 20 14:26:06 992943 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 28 times consecutively Mar 20 14:26:06 993058 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:06 993065 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID 
range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:26:06 993069 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:06 993074 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:26:07 023890 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:07 085081 [41E02960] -> SUBNET UP Mar 20 14:26:07 348105 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001d Mar 20 14:26:07 348218 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 29 times consecutively Mar 20 14:26:07 348958 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006a Mar 20 14:26:07 349041 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:07 540871 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006b Mar 20 14:26:07 540983 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:07 541063 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007c Mar 20 14:26:07 541131 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:07 585394 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006c Mar 20 14:26:07 585464 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 
GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:07 607406 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006d Mar 20 14:26:07 607486 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:07 850410 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006e Mar 20 14:26:07 850483 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:07 956365 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007d Mar 20 14:26:07 956404 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:07 956413 [42803960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:26:07 987136 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:08 018887 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:08 032634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 032679 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x813ce attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF 
dr_dlid.................0xFFFF Initial path: [0][1][12][4][5] Return path: [0][9][14][D][5] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 032749 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 032757 [41E02960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x5 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 032774 [41E02960] -> Capabilities Mask: Mar 20 14:26:08 033119 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 033154 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 
hop_count...............0x4 trans_id................0x813cf attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12][4][5] Return path: [0][9][14][D][5] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 033202 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 033213 [43C05960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x5 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 033231 [43C05960] -> Capabilities Mask: Mar 20 14:26:08 038497 [45A08960] -> SUBNET UP Mar 20 14:26:08 055480 [43C05960] -> __osm_trap_rcv_process_request: Received Generic 
Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007e Mar 20 14:26:08 055587 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:08 372288 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000007f Mar 20 14:26:08 376158 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:08 418607 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000080 Mar 20 14:26:08 420668 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:08 420714 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:08 430046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 430147 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x820fa attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 430236 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 430236 
[43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 430267 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x820fb attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12][1][6] Return path: [0][9][14][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 430262 [43C05960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 
resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 430286 [43C05960] -> Capabilities Mask: Mar 20 14:26:08 430350 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 430362 [43C05960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 430375 [43C05960] -> Capabilities Mask: Mar 20 14:26:08 435317 [43C05960] -> SUBNET UP Mar 20 14:26:08 583769 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001e Mar 20 14:26:08 583903 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 30 times consecutively Mar 20 14:26:08 854841 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:26:08 913273 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 913349 [4780B960] -> SMP 
dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x82e32 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][2][5] Return path: [0][9][14][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 913415 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 913432 [45A08960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 
error_threshold.........0x88 Mar 20 14:26:08 913449 [45A08960] -> Capabilities Mask: Mar 20 14:26:08 913598 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 913676 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x82e33 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][2][5] Return path: [0][9][14][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 913727 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 913732 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:26:08 913734 [43C05960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 
vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 913752 [43C05960] -> Capabilities Mask: Mar 20 14:26:08 913766 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x82e34 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][2][5] Return path: [0][9][14][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:26:08 913828 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:26:08 913833 [41E02960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 
lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:26:08 913848 [41E02960] -> Capabilities Mask: Mar 20 14:26:08 918887 [41E02960] -> SUBNET UP Mar 20 14:26:48 657517 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000006f Mar 20 14:26:48 657779 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:26:48 658393 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000081 Mar 20 14:26:48 658465 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:26:48 979610 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:48 979629 [41401960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:26:48 979652 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:48 979660 [41401960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:26:48 979682 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:26:48 979688 [41401960] -> 
Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
Mar 20 14:26:48 979721 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 979727 [41401960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1
Mar 20 14:26:48 979770 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 979782 [41401960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 20 14:26:48 979799 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 979804 [41401960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
Mar 20 14:26:48 979822 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 979827 [41401960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
Mar 20 14:26:48 979845 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 979849 [41401960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
Mar 20 14:26:48 980028 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980033 [41401960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
Mar 20 14:26:48 980061 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980066 [41401960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
Mar 20 14:26:48 980081 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980087 [41401960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1
Mar 20 14:26:48 980102 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980107 [41401960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
Mar 20 14:26:48 980122 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980127 [41401960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
Mar 20 14:26:48 980143 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980148 [41401960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 20 14:26:48 980163 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980239 [41401960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:26:48 980256 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980261 [41401960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1
Mar 20 14:26:48 980365 [41401960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:26:48 980371 [41401960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
Mar 20 14:26:49 013365 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:26:49 065887 [43C05960] -> SUBNET UP
Mar 20 14:26:49 407010 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches
Mar 20 14:26:49 459477 [44606960] -> SUBNET UP
Mar 20 14:27:42 754098 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000070
Mar 20 14:27:42 754349 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:27:42 760115 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000071
Mar 20 14:27:42 760178 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000082
Mar 20 14:27:42 760236 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3
Mar 20 14:27:42 760406 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:27:42 766931 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000083
Mar 20 14:27:42 767049 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7
Mar 20 14:27:43 085327 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085345 [43C05960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
Mar 20 14:27:43 085349 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085355 [43C05960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
Mar 20 14:27:43 085359 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085364 [43C05960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
Mar 20 14:27:43 085368 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085373 [43C05960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
Mar 20 14:27:43 085377 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085382 [43C05960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1
Mar 20 14:27:43 085386 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085390 [43C05960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
Mar 20 14:27:43 085394 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085399 [43C05960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
Mar 20 14:27:43 085403 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085407 [43C05960] -> Discovered new port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
Mar 20 14:27:43 085411 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085416 [43C05960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
Mar 20 14:27:43 085420 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085425 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
Mar 20 14:27:43 085428 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085433 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
Mar 20 14:27:43 085437 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085442 [43C05960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
Mar 20 14:27:43 085446 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085450 [43C05960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
Mar 20 14:27:43 085454 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085459 [43C05960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
Mar 20 14:27:43 085511 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb
Mar 20 14:27:43 085517 [43C05960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1
Mar 20 14:27:43 085521 [43C05960] -> osm_report_notice:
Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:43 085526 [43C05960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:27:43 085530 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:43 085534 [43C05960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:27:43 116308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:27:43 179935 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 179980 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x85669 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180019 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180037 [41401960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 
diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180050 [41401960] -> Capabilities Mask: Mar 20 14:27:43 180092 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180137 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180185 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180189 [44606960] -> PortInfo dump: port number.............0x16 
node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180199 [44606960] -> Capabilities Mask: Mar 20 14:27:43 180239 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180263 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566b attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 
00 00 00 00 00 00 Mar 20 14:27:43 180307 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180319 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180332 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566c attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][4] Return path: [0][9][13][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180336 [42803960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 
p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180364 [42803960] -> Capabilities Mask: Mar 20 14:27:43 180389 [42803960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180410 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180392 [42803960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180415 [42803960] -> Capabilities Mask: Mar 20 14:27:43 180436 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566d 
attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180490 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180494 [41E02960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180504 [41E02960] -> Capabilities Mask: Mar 20 14:27:43 180536 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180560 [4780B960] -> SMP dump: base_ver................0x1 
mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566e attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180606 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180615 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:27:43 180634 [45007960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 
client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180657 [45007960] -> Capabilities Mask: Mar 20 14:27:43 180678 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8566f attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][5] Return path: [0][9][18][E][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:27:43 180769 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:27:43 180775 [43C05960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 
vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:27:43 180789 [43C05960] -> Capabilities Mask: Mar 20 14:27:43 186228 [43204960] -> SUBNET UP Mar 20 14:27:43 557268 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:27:43 611082 [45A08960] -> SUBNET UP Mar 20 14:27:58 852744 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:27:58 852982 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:58 970772 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 20 14:27:58 970864 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:58 992628 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 20 14:27:58 992712 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 132331 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 20 14:27:59 132484 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 314893 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 20 14:27:59 315006 [41E02960] -> osm_report_notice: Reporting 
Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 343241 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 20 14:27:59 343320 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 481698 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 20 14:27:59 481775 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 512746 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 20 14:27:59 512853 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 548851 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:59 548861 [41E02960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:27:59 583414 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:27:59 583817 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 20 14:27:59 623971 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 626182 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 20 14:27:59 626329 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from 
LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 634080 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 20 14:27:59 634442 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:27:59 641962 [45A08960] -> SUBNET UP Mar 20 14:27:59 656231 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 20 14:27:59 656307 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 20 14:27:59 689788 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 20 14:27:59 690249 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 20 14:27:59 758521 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 20 14:27:59 758646 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 20 14:27:59 970740 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:27:59 970812 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:27:59 985557 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:59 985577 [41E02960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:27:59 985601 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:59 985615 [41E02960] -> 
Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:27:59 985649 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:59 985656 [41E02960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:27:59 989767 [42803960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:27:59 989787 [42803960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:28:00 014445 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 20 14:28:00 014524 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:28:00 020896 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:00 086824 [43204960] -> SUBNET UP Mar 20 14:28:00 124057 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 20 14:28:00 124108 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times consecutively Mar 20 14:28:00 131596 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 20 14:28:00 131643 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 20 14:28:00 412484 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 20 14:28:00 412528 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 20 14:28:00 436877 [44606960] -> __osm_trap_rcv_process_request: Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 20 14:28:00 436921 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 20 14:28:00 458745 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 20 14:28:00 458816 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 20 14:28:00 480551 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 20 14:28:00 480599 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 20 14:28:00 695340 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 20 14:28:00 695386 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 20 14:28:00 695726 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000072 Mar 20 14:28:00 695886 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:28:00 719764 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 20 14:28:00 719825 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 20 14:28:00 743680 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:28:00 743775 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 20 14:28:00 763599 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 20 14:28:00 763654 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 20 14:28:00 813393 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001a Mar 20 14:28:00 813473 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 20 14:28:00 831287 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001b Mar 20 14:28:00 831302 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000073 Mar 20 14:28:00 831383 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 27 times consecutively Mar 20 14:28:00 831424 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:28:00 841593 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001c Mar 20 14:28:00 841644 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 28 times consecutively Mar 20 14:28:01 050511 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:28:01 050529 [41E02960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:28:01 050535 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:28:01 050542 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:28:01 050547 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 
num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:28:01 050554 [41E02960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:28:01 081322 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:01 142873 [43204960] -> SUBNET UP Mar 20 14:28:01 460275 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001d Mar 20 14:28:01 460358 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 29 times consecutively Mar 20 14:28:01 488474 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:01 538634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:01 538712 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x898d1 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:01 538752 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: Received error status for SetResp() Mar 20 14:28:01 538758 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:01 538767 [42803960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 
port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............DOWN state_info2.............0x42 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:01 538795 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x898d2 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][6] Return path: [0][9][18][D][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:01 538810 [42803960] -> Capabilities Mask: Mar 20 14:28:01 538849 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: Received error status for SetResp() Mar 20 14:28:01 538856 
[42803960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............DOWN state_info2.............0x42 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:01 538871 [42803960] -> Capabilities Mask: Mar 20 14:28:01 539658 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:01 539696 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x5 trans_id................0x898d3 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x11 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][15][1][4][17] Return path: [0][9][18][D][1][16] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 11 42 00 11 40 
40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:01 539778 [45A08960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:01 539784 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:01 539798 [45A08960] -> PortInfo dump: port number.............0x11 node_guid...............0x0005ad0000027c84 port_guid...............0x0005ad0000027c84 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x16 link_width_enabled......0x2 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............DOWN state_info2.............0x42 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:01 539834 [45A08960] -> Capabilities Mask: Mar 20 14:28:01 539844 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x5 trans_id................0x898d4 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x12 m_key...................0x0000000000000000 dr_slid.................0xFFFF 
dr_dlid.................0xFFFF Initial path: [0][1][15][1][4][17] Return path: [0][9][18][D][1][16] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:01 539903 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:01 539908 [45007960] -> PortInfo dump: port number.............0x12 node_guid...............0x0005ad0000027c84 port_guid...............0x0005ad0000027c84 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x16 link_width_enabled......0x2 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............DOWN state_info2.............0x42 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:01 539924 [45007960] -> Capabilities Mask: Mar 20 14:28:01 545091 [45007960] -> SUBNET UP Mar 20 14:28:01 652647 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000084 Mar 20 14:28:01 652864 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 
Mar 20 14:28:01 879631 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000074 Mar 20 14:28:01 880104 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:28:01 962839 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000075 Mar 20 14:28:01 965155 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:28:02 006432 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000085 Mar 20 14:28:02 030610 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:02 068999 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:28:02 081130 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:02 081198 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8a604 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:02 081249 [43204960] -> 
osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:02 081257 [43204960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:02 081275 [43204960] -> Capabilities Mask: Mar 20 14:28:02 081650 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:02 081713 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8a605 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][4][4] Return path: [0][9][18][D][4] Reserved: [0][0][0][0][0][0][0] 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:02 081782 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:02 081787 [43C05960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x4 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:02 081802 [43C05960] -> Capabilities Mask: Mar 20 14:28:02 087435 [45A08960] -> SUBNET UP Mar 20 14:28:02 170696 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000086 Mar 20 14:28:02 170819 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:28:02 459228 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:02 500761 
[43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000087 Mar 20 14:28:02 500979 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:28:02 510190 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:02 510258 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8b330 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][17][1][5] Return path: [0][9][14][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:02 510366 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:28:02 510370 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:02 510384 [45007960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE 
state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:02 510394 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x8b331 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][3][6] Return path: [0][9][18][F][3] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:28:02 510398 [45007960] -> Capabilities Mask: Mar 20 14:28:02 510481 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:28:02 510491 [41401960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x3 
link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:28:02 510509 [41401960] -> Capabilities Mask: Mar 20 14:28:02 510511 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000088 Mar 20 14:28:02 515576 [41401960] -> SUBNET UP Mar 20 14:28:02 515695 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:28:02 532552 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000089 Mar 20 14:28:02 538569 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:28:02 695997 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001e Mar 20 14:28:02 696096 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 30 times consecutively Mar 20 14:28:02 918226 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:02 975244 [43204960] -> SUBNET UP Mar 20 14:28:03 325494 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:28:03 379145 
[41401960] -> SUBNET UP Mar 20 14:29:12 561841 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F TID:0x0000000000000019 Mar 20 14:29:12 562033 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001F GID:0xfe80000000000000,0x0005ad0000027c56 Mar 20 14:29:12 562751 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F TID:0x000000000000001a Mar 20 14:29:12 562902 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001F GID:0xfe80000000000000,0x0005ad0000027c56 Mar 20 14:29:12 571346 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0084 TID:0x0000000000000018 Mar 20 14:29:12 571569 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0084 GID:0xfe80000000000000,0x0005ad0000027c70 Mar 20 14:29:12 914371 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F TID:0x000000000000001b Mar 20 14:29:12 916287 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:12 916297 [44606960] -> Discovered new port with GUID:0x0005ad000002502f LID range [0x2,0x2] of node:Topspin IB-DC Mar 20 14:29:12 946985 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:12 976839 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001F GID:0xfe80000000000000,0x0005ad0000027c56 Mar 20 14:29:12 987963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:29:12 988004 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 
status..................0x1C00 hop_ptr.................0x0 hop_count...............0x5 trans_id................0x8dbb2 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xD m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][2][4][D] Return path: [0][9][18][E][1][12] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 12 03 03 02 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:29:12 988078 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:29:12 988089 [43C05960] -> PortInfo dump: port number.............0xD node_guid...............0x0005ad0000027c70 port_guid...............0x0005ad0000027c70 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x12 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0x2C vl_enforce..............0x4C m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:29:12 988105 [43C05960] -> Capabilities Mask: Mar 20 14:29:12 993136 [44606960] -> SUBNET UP Mar 20 14:29:13 300755 
[41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 TID:0x0000000000000000 Mar 20 14:29:13 300874 [41E02960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0002 GID:0xfe80000000000000,0x0005ad000002502f Mar 20 14:29:13 338077 [41E02960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:13 338099 [41E02960] -> Discovered new port with GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:Topspin IB-DC Mar 20 14:29:13 368879 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:13 431763 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:29:13 431806 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x5 trans_id................0x8e8e9 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xA m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][1][6][12] Return path: [0][9][15][D][3][11] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 03 03 02 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:29:13 432093 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:29:13 432116 [43204960] -> PortInfo dump: port number.............0xA node_guid...............0x0005ad0000027c56 port_guid...............0x0005ad0000027c56 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 
base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x11 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0x2C vl_enforce..............0x4C m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:29:13 432135 [43204960] -> Capabilities Mask: Mar 20 14:29:13 437219 [45007960] -> SUBNET UP Mar 20 14:29:13 690992 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00BA TID:0x0000000000000000 Mar 20 14:29:13 691128 [42803960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x00BA GID:0xfe80000000000000,0x0005ad000002516f Mar 20 14:29:13 835017 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:13 891082 [42803960] -> SUBNET UP Mar 20 14:29:14 235714 [42803960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:14 289127 [41E02960] -> SUBNET UP Mar 20 14:29:17 689267 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008a Mar 20 14:29:17 689511 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:29:17 689975 [42803960] -> __osm_trap_rcv_process_request: Received Generic 
Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000076 Mar 20 14:29:17 690097 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:29:18 025237 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025255 [44606960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:29:18 025273 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025279 [44606960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:29:18 025296 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025300 [44606960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:29:18 025317 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025323 [44606960] -> Removed port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 20 14:29:18 025340 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025345 [44606960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:29:18 025362 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025367 [44606960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:29:18 025385 [44606960] -> osm_report_notice: Reporting Generic Notice 
type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025390 [44606960] -> Removed port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:29:18 025406 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025411 [44606960] -> Removed port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:29:18 025571 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025576 [44606960] -> Removed port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:29:18 025612 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025619 [44606960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:29:18 025634 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025639 [44606960] -> Removed port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:29:18 025655 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025660 [44606960] -> Removed port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:29:18 025678 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025683 [44606960] -> Removed port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:29:18 025700 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from 
LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025705 [44606960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:29:18 025721 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025777 [44606960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:29:18 025794 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025800 [44606960] -> Removed port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:29:18 025816 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:29:18 025821 [44606960] -> Removed port with GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:29:18 058968 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:18 112970 [43C05960] -> SUBNET UP Mar 20 14:29:18 511156 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:29:18 573846 [41E02960] -> SUBNET UP Mar 20 14:30:11 599965 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000077 Mar 20 14:30:11 600182 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:11 606044 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000078 Mar 20 14:30:11 606078 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008b Mar 20 14:30:11 606178 [44606960] -> 
osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:11 606207 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:11 612375 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008c Mar 20 14:30:11 612499 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:11 947057 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947074 [45007960] -> Discovered new port with GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 Mar 20 14:30:11 947079 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947084 [45007960] -> Discovered new port with GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 Mar 20 14:30:11 947088 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947093 [45007960] -> Discovered new port with GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 Mar 20 14:30:11 947097 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947102 [45007960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:30:11 947106 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947118 [45007960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range 
[0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:30:11 947138 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947143 [45007960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:30:11 947148 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947153 [45007960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:30:11 947157 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947162 [45007960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:30:11 947166 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947170 [45007960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:30:11 947174 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947179 [45007960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:30:11 947183 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947188 [45007960] -> Discovered new port with GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 Mar 20 14:30:11 947191 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947196 [45007960] -> Discovered new port with 
GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 Mar 20 14:30:11 947200 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947205 [45007960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:30:11 947209 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947214 [45007960] -> Discovered new port with GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 Mar 20 14:30:11 947282 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947288 [45007960] -> Discovered new port with GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 Mar 20 14:30:11 947291 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947296 [45007960] -> Discovered new port with GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 Mar 20 14:30:11 947300 [45007960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:11 947305 [45007960] -> Discovered new port with GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 Mar 20 14:30:11 978149 [45007960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:12 042474 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 042577 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 
hop_count...............0x4 trans_id................0x92b38 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x13 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][5] Return path: [0][9][13][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 042668 [45007960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 042676 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 042682 [45007960] -> PortInfo dump: port number.............0x13 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 042701 [45007960] -> Capabilities Mask: Mar 20 14:30:12 042714 [4780B960] -> 
SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x92b39 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][5] Return path: [0][9][13][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 042845 [41401960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 042856 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 042851 [41401960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 
guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 042897 [41401960] -> Capabilities Mask: Mar 20 14:30:12 042907 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x92b3a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][5] Return path: [0][9][13][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 043013 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 043015 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 043038 [43204960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 
vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 043090 [43204960] -> Capabilities Mask: Mar 20 14:30:12 043094 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x92b3b attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][5] Return path: [0][9][13][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 043173 [44606960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 043178 [44606960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 
lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 043191 [44606960] -> Capabilities Mask: Mar 20 14:30:12 043222 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 043247 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x92b3c attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12][1][4] Return path: [0][9][14][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 043318 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 043314 [41E02960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 043357 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 
trans_id................0x92b3d attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12][1][4] Return path: [0][9][14][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 043367 [41E02960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 043422 [41E02960] -> Capabilities Mask: Mar 20 14:30:12 043513 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 043518 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:12 043519 [43C05960] -> PortInfo dump: port 
number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 043535 [43C05960] -> Capabilities Mask: Mar 20 14:30:12 043553 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x92b3e attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12][1][4] Return path: [0][9][14][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:12 043658 [42803960] -> osm_pi_rcv_process_set: 
Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:12 043663 [42803960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:12 043678 [42803960] -> Capabilities Mask: Mar 20 14:30:12 049088 [43204960] -> SUBNET UP Mar 20 14:30:12 442903 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:12 497312 [45007960] -> SUBNET UP Mar 20 14:30:27 571421 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000000 Mar 20 14:30:27 571674 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:27 782498 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000001 Mar 20 14:30:27 782616 [45A08960] -> osm_report_notice: Reporting Generic 
Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:27 804302 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000002 Mar 20 14:30:27 805103 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:27 924983 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000003 Mar 20 14:30:27 925088 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:27 934314 [43204960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:27 934327 [43204960] -> Removed port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:30:27 969077 [43204960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:28 017451 [41E02960] -> SUBNET UP Mar 20 14:30:28 030947 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000004 Mar 20 14:30:28 031177 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 120040 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000005 Mar 20 14:30:28 120190 [42803960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 148805 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000006 Mar 20 14:30:28 149108 [41E02960] -> osm_report_notice: Reporting 
Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 170453 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000007 Mar 20 14:30:28 170971 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 336861 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:28 336884 [43C05960] -> Removed port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:30:28 336910 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:28 336916 [43C05960] -> Removed port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:30:28 336945 [43C05960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:28 336951 [43C05960] -> Removed port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:30:28 371497 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:28 410709 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000008 Mar 20 14:30:28 410894 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 415926 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000009 Mar 20 14:30:28 419624 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 
426978 [45A08960] -> SUBNET UP Mar 20 14:30:28 438003 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000a Mar 20 14:30:28 438182 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0152 GID:0xfe80000000000000,0x0005ad0000027c84 Mar 20 14:30:28 470141 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000b Mar 20 14:30:28 470197 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 11 times consecutively Mar 20 14:30:28 652535 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000c Mar 20 14:30:28 652623 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 12 times consecutively Mar 20 14:30:28 681514 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000d Mar 20 14:30:28 681636 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 13 times consecutively Mar 20 14:30:28 703052 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000e Mar 20 14:30:28 703092 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 14 times consecutively Mar 20 14:30:28 724753 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000000f Mar 20 14:30:28 724809 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 15 times consecutively Mar 20 14:30:28 855519 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000010 Mar 20 14:30:28 855671 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 16 times 
consecutively Mar 20 14:30:28 877289 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000011 Mar 20 14:30:28 877354 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 17 times consecutively Mar 20 14:30:28 899021 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000012 Mar 20 14:30:28 899062 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 18 times consecutively Mar 20 14:30:29 006886 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000013 Mar 20 14:30:29 006950 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 19 times consecutively Mar 20 14:30:29 099965 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000014 Mar 20 14:30:29 100020 [44606960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 20 times consecutively Mar 20 14:30:29 146532 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000015 Mar 20 14:30:29 146578 [41E02960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 21 times consecutively Mar 20 14:30:29 356891 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000016 Mar 20 14:30:29 356938 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 22 times consecutively Mar 20 14:30:29 383112 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000017 Mar 20 14:30:29 383157 [43204960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 23 times consecutively Mar 20 14:30:29 383710 [41401960] -> 
__osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x0000000000000079 Mar 20 14:30:29 383790 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:29 407890 [42803960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000018 Mar 20 14:30:29 407935 [42803960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 24 times consecutively Mar 20 14:30:29 429653 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x0000000000000019 Mar 20 14:30:29 429700 [45A08960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 25 times consecutively Mar 20 14:30:29 451352 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001a Mar 20 14:30:29 451401 [45007960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 26 times consecutively Mar 20 14:30:29 479843 [4780B960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x124ef00095cbf) -- dropping Mar 20 14:30:29 479855 [4780B960] -> umad_receiver: ERR 5411: DR SMP Mar 20 14:30:29 479865 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT) Mar 20 14:30:29 479901 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x6 trans_id................0x95cbf attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][5][17][C] Return 
path: [0][0][0][0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:29 480017 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 480030 [44606960] -> Removed port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:30:29 480092 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 480099 [44606960] -> Removed port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:30:29 480121 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 480128 [44606960] -> Removed port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:30:29 480152 [44606960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 480160 [44606960] -> Removed port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:30:29 480325 [44606960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0005ad0000027c84 port 12. 
Adding to light sweep sampling list Mar 20 14:30:29 480343 [44606960] -> Directed Path Dump of 5 hop path: Path = [0][1][11][1][5][17] Mar 20 14:30:29 665327 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000007a Mar 20 14:30:29 665355 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001b Mar 20 14:30:29 665397 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:29 665404 [43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 27 times consecutively Mar 20 14:30:29 680658 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 680668 [45A08960] -> Discovered new port with GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 Mar 20 14:30:29 680672 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 680678 [45A08960] -> Discovered new port with GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 Mar 20 14:30:29 680681 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 680686 [45A08960] -> Discovered new port with GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 Mar 20 14:30:29 680690 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:29 680695 [45A08960] -> Discovered new port with GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 Mar 20 14:30:29 711542 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:29 
768280 [41401960] -> SUBNET UP Mar 20 14:30:30 113195 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:30 113206 [45A08960] -> Discovered new port with GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 Mar 20 14:30:30 113211 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:30 113216 [45A08960] -> Discovered new port with GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 Mar 20 14:30:30 113220 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:30 113225 [45A08960] -> Discovered new port with GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 Mar 20 14:30:30 113228 [45A08960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0092 GID:0xfe80000000000000,0x0005ad0000024bbb Mar 20 14:30:30 113233 [45A08960] -> Discovered new port with GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 Mar 20 14:30:30 144149 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:30 195765 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:30 195850 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x96dcd attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x16 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][2][4] Return path: [0][9][15][E][1] Reserved: 
[0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:30 195929 [43C05960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:30 195942 [43C05960] -> PortInfo dump: port number.............0x16 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:30 195968 [43C05960] -> Capabilities Mask: Mar 20 14:30:30 196144 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:30 196179 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x96dce attr_id.................0x15 (PortInfo) 
resv....................0x0 attr_mod................0x17 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][14][2][4] Return path: [0][9][15][E][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:30 196248 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:30 196254 [43204960] -> PortInfo dump: port number.............0x17 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:30 196269 [43204960] -> Capabilities Mask: Mar 20 14:30:30 201633 [45007960] -> SUBNET UP Mar 20 14:30:30 278051 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001c Mar 20 14:30:30 278107 
[43C05960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 28 times consecutively Mar 20 14:30:30 278656 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000007b Mar 20 14:30:30 278871 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:30 279653 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008d Mar 20 14:30:30 279765 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 568539 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008e Mar 20 14:30:30 568617 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 607916 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 TID:0x000000000000007c Mar 20 14:30:30 625139 [44606960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:30 663838 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0148 GID:0xfe80000000000000,0x0005ad00000281b3 Mar 20 14:30:30 664569 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x000000000000008f Mar 20 14:30:30 664747 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 679262 [45A08960] -> SUBNET UP Mar 20 14:30:30 784024 [43204960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000090 Mar 20 14:30:30 
784123 [43204960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 804217 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000091 Mar 20 14:30:30 807807 [41401960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 825500 [45A08960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000092 Mar 20 14:30:30 825600 [45A08960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:30 988887 [43C05960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000093 Mar 20 14:30:30 988978 [43C05960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:31 059298 [41401960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches Mar 20 14:30:31 106840 [41E02960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000094 Mar 20 14:30:31 111335 [41E02960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:31 112465 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:31 112497 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x98837 attr_id.................0x15 (PortInfo) resv....................0x0 
attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16][1][5] Return path: [0][9][13][D][2] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:31 112593 [44606960] -> osm_pi_rcv_process_set: ERR 0F10: Received error status for SetResp() Mar 20 14:30:31 112627 [44606960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281a7 port_guid...............0x0005ad00000281a7 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x2 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............DOWN state_info2.............0x42 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:31 112673 [44606960] -> Capabilities Mask: Mar 20 14:30:31 113808 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00 Mar 20 14:30:31 113838 [4780B960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D 
bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x4 trans_id................0x9883e attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x18 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][11][1][4] Return path: [0][9][18][D][1] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Mar 20 14:30:31 113925 [43204960] -> osm_pi_rcv_process_set: Received error status 0x1c for SetResp() during ACTIVE transition Mar 20 14:30:31 113930 [43204960] -> PortInfo dump: port number.............0x18 node_guid...............0x0005ad00000281b3 port_guid...............0x0005ad00000281b3 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x1 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 client_reregister.......0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Mar 20 14:30:31 113946 [43204960] -> Capabilities Mask: Mar 20 14:30:31 119007 [43204960] -> SUBNET UP Mar 20 
14:30:31 128758 [45007960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000095 Mar 20 14:30:31 128851 [45007960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:31 150370 [44606960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B TID:0x0000000000000096 Mar 20 14:30:31 150468 [44606960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x001B GID:0xfe80000000000000,0x0005ad00000281a7 Mar 20 14:30:31 316422 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 TID:0x000000000000001c Mar 20 14:30:31 316498 [41401960] -> __osm_trap_rcv_process_request: ERR 3804: Received trap 29 times consecutively From halr at voltaire.com Wed Mar 21 12:53:23 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 14:53:23 -0500 Subject: [ofa-general] osm error messages In-Reply-To: References: Message-ID: <1174506796.17678.11941.camel@hal.voltaire.com> On Wed, 2007-03-21 at 13:29, Douglas Fuller wrote: > I'm seeing some sporadic error activity from OpenSM (OFED 1.1; osm.log > below) that may correlate with some job failures -- I'm trying to get to the > bottom of this. > > Before seeing this, I isolated (and disabled with ibportstate) what appeared > to be a bad internal port in one of our core switches. That leads me to > suspect I have a switch misbehaving somewhere. > > Without any other intervention, things seem to check out (with > ibdiagnet/ibchecknet). Any thoughts? Need any more information? Is something bouncing your subnet or was this just what ibportstate did? It could be if this was a core switch. Also, you may have some SMAs which have gone nonresponsive to SMPs (IB_TIMEOUTs) but the links are up. I can't be sure not knowing what the exact scenario was. 
If you do, you will likely want to chase these and do something about them if you haven't already. All the messages relating to ACTIVE -> ACTIVE transition can be ignored. Also, it looks like something is removing characters in the log. -- Hal > Thanks, > --Doug Fuller > > Mar 19 18:8:50 000354 [AB000160] -> OpenSM Rev:openib-2.0.5 OpnIBsvn > Exported revision > Mar 19 18:28:50 000466 [AB00160] -> OpenSM Rev:openib-2.0.5 OpenIB svn > Exported revision > Mar 19 18:28:50 007666 [AB000160] -> om_vendor_bind: Binding to port > 0x5ad0000024bb > Mar 19 18:28:50 011279 [AB000160] ->osm_vendor_bind: Binding to port > 0x5ad0000024bbb > Mar 19 18:2850 438326 [44606960] -> Entering MASTER stte > Mar 19 18:28:50 438628 [4606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:66 from LID:0x0000 > GID:0xfe8000000000000,0x0005ad000024bbb > Mar 19 1828:50 438661 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:6 from LID:0x0000 > GID:0xf8000000000000,0x0005ad0000024bbb > Mar 1 18:28:50 504176 [41401960] -> osm_cast_mgr_process: Min Hop Tabes > onfigured on all switches > Mar 19 18:28:50 639453 [44606960] -> SUNET UP > Mar 19 18:28:50 853613 [1E02960 -> __osm_trap_rcv_process_reqest: > Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0092 > TID:0x000000000000018 > Mar 19 18:28:5 853813 [41E0960] -> osm_report_notice: Reporting Generic > Notice typ:4 num:144 from LID:0x0092 > GID:0xfe8000000000000,0x0005ad0000024bbb > Mar 19 18:28:51 273470 [4460960] -> osm_ucast_mgr_process: Min HopTables > configured on all switches > Mar 19 18:28:51 33730 [43C05960] -> SUBNET UP > Mar 19 18:3033 565682 [4320490] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x1 num:128 Poducer:2 from LID:0x0001 > TID:0x0000000000000019 > Mar 19 18:30:33 565958 [43204960] -> sm_report_notice: Reporting Generic > Noicetype:1 num:128 from ID:0x0001 > GID:0xfe80000000000000,0x005ad0000027c6 > Mar 19 18:30:33 963901 [41401960] > osm_report_notice: 
Reporting Generic > Notice typ:3 num:64 fro LID:x0092 > GID:0xfe80000000000000,0x005ad0000024bbb > Mar 19 18:30:33 963914 [41401960] -> Discovered nw port with > GUID:0x0005ad00000297b LI range [0x3,0x37] of node:saguro-14-9 HCA-1 > Mar 19 18:30:33 994698 [4401960] > om_ucast_mgr_process: Min Hp Tables > configured n all switches > Mar 19 18:30:34 054763 [45A08960]-> UBNET UP > Mar 19 18:30:34 351397 [43C0560] -> __osm_trap_rcv_process_request: > Received Generi Ntice type:0x04 num:144 Producer:1 fomLID:0x0037 > TID:0x000000000000000 > Mar 19 18:30:34 351615 [43C05960] -> osm_report_notice Reportig Generic > otice type:4 num:144 from LID:0x0037 > GID:0xfe80000000000000,0x0005ad00002497b > ar 19 18:30:34 777488 [45A08960] -> osm_ucast_mgr_process:Min Hop Tables > configured onall switces > Mar 19 18:30:34 832664 [4A08960] -> SUBNET UP > Mar 19 18:32:27 476136 [45A08960] -> __osm_trap_cv_process_request: > Received Generic Notice typ:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000002b > Mar 19 18:32:27 476340 [43204960] ->__osm_trap_rcv_process_request: > Reeivd Gneric Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000037 > Mar 19 18:32:2 476389 [45A08960] -> osm_report_notice: Reporting Generic > Notice type: num:128 from LID:0x0148 > GID:0xfe8000000000000,0x0005ad00000281b3 > Mar 19 18:2:27 47485 [43204960] -> osm_report_ntice: Reporting Generic > Notice tye:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad0000081a7Mar 19 18:32:27 817617 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 nm:65 frm LID:0x0092 > GID:0xfe80000000000000,0x005ad000024bbb > Mar 19 18:32:27 817637 [4280396] -> Remove port with > GUID:0x0005ad0000024e0b LID range [0xB3,xB3] of node:saguaro-23-4 HCA-1 > Mar19 18:32:27 817655 [42803960] -> sm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,00005ad0000024bbb > Mar 9 18:32:27 81766 [42803960] -> Remove port with > 
GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 1 18:32:2 817694 [42803960] -> osm_report_notice: Reporting Generic > Ntice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,00005ad0000024bbb > Mar 19 18:3:7 81769 [42803960] -> Removed port wth > GUID0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:3227 817716 42803960] -> osm_report_ntice: ReportingGeneric > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:32:27 81771 [42803960] -> Remved port with > GUID0x0005ad0000024b27 LID range [0xAF,0xAF] of node:sguaro-23-0 HCA-1 > Mar 19 18:32:27 817738 [4280390] -> osm_report_notice: Reporting Generic > Notic type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:32:27 817743 [4203960] -> Removed ort with > GUID:0x000ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:32:27 817758 [42803960] - osm_report_notice: Reporing Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe8000000000000,0x0005ad0000024bbb > Mar 19 18:32:27 817763 [42803960 -> Removed port with > GUID:0x0005ad000024d7 LID range [0xB6,0xB6] of node:saguar-23-7 HCA-1 > Mar 19 18:32:27 817780 [42803960] -> osm_rport_notice: Reporting Generic > Notice type:3 nu:65 fromLID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:32:27 17785 [42803960] -> Remved port with > GUID:0x0005ad0000024d6bLID range [0xB8,0xB8] of node:saguao-23-9 HCA-1 > Mar 19 18:32:27 817803 [480396] -> osm_report_notce: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bbb > Mar 19 18:32:27 817808 [4283960] -> Removed por with > GUID:0x0005ad0000024977 LID rane [0xA9,0xA9] of node:saguaro-22-4 HCA1 > Mar 19 18:32:27 817932 [42803960] -> osm_report_notice: Rporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bbb > Mar 19 18:32:27 817938 [4280390] -> Removed port with > 
GID:0x0005ad0000027c84 LID range [0x15,0x152] of node:Topspin Switch TS20 > Mar 19 18:32:27 817970 [42803960] -> osm_report_notice: Reporing Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bbb > Mar 19 18:32:27 817977 [4280360] -> Removed port with > GUID0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1Mar 19 1:32:27 817992 [42803960] -> osm_report_notice: Reporting Generic > Notice tye:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000004bbb > Mar 19 18:32:27 817997 [42803960 -> Removed port with > GUID:0x0005ad00000249f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Ma 19 18:32:27 81811 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xe80000000000000,0x0005ad0000024bb > Mar 9 18:32:27 818016 [42803960] - Removed port with > UID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-2-2 HCA-1 > Mar 1 18:32:27 818032 [42803960] -> osm_report_notice: Reporing Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > a 19 18:32:27 818037 [42803960] -> Rmoved port with > GUID:0x0005ad000004da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 1:32:27 818054 [4280360] -> osm_repot_notice: Reporting Generic > Notice typ:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,x0005ad0000024bbb > Mar19 18:32:27 818115 [42803960] -> Remoed port wth > GUID:0x0005ad0000024cbb LID range [xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 1 18:3:27 81812 [42803960] -> osm_report_notice: Reorting Generic > Notie type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000002bbb > Mar1918:32:27 818137 [42803960] -> Removedport with > GUID:0x0005ad00000249d3 LID range [0xB1,0B1] of node:saguaro-23-2 HCA-1 > Mar 19 8:32:2 818153 [4280360] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe800000000000,0x0005ad0000024bbb > Mar 19 1832:7 818158 [42803960] -> Removedpot with > 
GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-2-5 HCA-1 > Mar 19 18:32:7818173 [42803960] -> osm_report_notic: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bbb > Mar 19 18:322 818178 [42803960 -> Removed port wih > GUID:0x0005ad0000024afb LID rnge [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 183227 851249[42803960] -> osm_ucast_mgr_procss: Min Hop Tables > configured on all sitches > Mar 19 18:32:27 898524 [43204960] -> SUBNET UP > Mar 19 8:32:2828664 [45007960] -> osm_ucast_mgr_proessMin Hop Tables > confiured on all switches > Mar 19 18:32:28 341659 [4466960] -> SUBNET UP > Mar 19 1:33:21814615 [41E0296] -> __osm_trap_rcv_process_request: > Rceived Gneric Notice type:0x01 num:128 Produce:2 from LID:0x0148 > TID:0x00000000000002c > Mar 19 8:33:21 81484 [4E02960]-> osm_report_otice: Reporting Generi > Notice type:1 num:128 fom LID:0x0148 > GID:0xfe8000000000000,0x0005ad0000281b3 > Mar19 18:3321 820835 [41E02960] -> __osm_tap_rcv_process_request: > Received Genei Notce type:0x01 num:128 Producer:2 fro LI:0x001B > TID:0x000000000000008 > Ma 19 18:33:21 82090 [41E02960] -> smreport_notice: Repoting Generic > Notice tye:1 num:128 fromLID:0x001B > GD:0xfe80000000000000,0x0005ad0000281a7 > ar 19 18:33:21 82038 [41E02960] -> __osm_trap_rcv_processrequest: > Received Generic Notce tpe:0x num:128 Producer:2 from LID:00148 > TID0x00000000000002d > Mar 19 18:33:21 820992 [1E060] -> osm_report_notice:Reporting Gnric > Notice type:1 num:128from LID:0x0148 > GID:0xfe8000000000000,0x005ad00000281b3 > Mar 19 18:3:21 826779 [4102960] -> __osm_trap_rcv_process_reqest: > Receivd Generic Notice type0x0 num:128 Prducr:2 from LID:0x001B > TID0x000000000000039 > Mar 19 18:33:21 82742 [41E02960] - om_report_notice: Reporting Generc > Ntice type:1 num:128 from LID:0x001B > GID:0xfe8000000000000,0x0005a00000281a7Mar 19 18:33:22 164580 [45007960 > osm_drop_mgr_process: ERR 0108: Ukown > remote side fo node 
0x0005a0000027c84 port 18. Addi to light sweep > sampling list > Mar 191:3:22 164654 [45007960] -> Direced PathDump of 5 hop path: > Path = [0][1][11][1][5][17] > Mar19 18:33:22 164712 [45007960] -> osm_op_mgrprocess: ERR 0108: Unknown > reote sde for node 0x0005ad00000281b3 port2. Adding to light swep > sampling lit > Mar 19 18:33:22164724 45007960] -> irected Path Dump of hop path: > Path = [0][1[11][1][5] > Mar 9 18:33:22 173634 [43C0960] ->osm_report_notic: Reporting Generic > Notice type:3 num:64 fromID0x0092 > GID:0xfe80000000000000,0x005ad0000024bbb > Mar 19 18:33:22 17365 [43C05960] -> iscovered e port with > GUID:0x0005ad000027c4 LIDrange [0x152,0x152] of node:Toppin Switch TS120 > Mar 19 18:33:22 17365 [43C05960] -> osm_reportnotice:RepotingGeneric > Notice type:3 num:64from LID:0x0092 > GID:0xfe8000000000000,0x05ad0000024bbb > Mar 19 18:33:22 173662 [43C05960] -> icovered newpot with > GUID:0x005ad0000024b27 ID rage [0xAF,0xAF] o ode:saguaro-23- HCA-1 > Mar 19 18:33:22 17366 [43C05960] -> osm_report_notice: Reprting Gneric > otice type: num:64 from LID:0x0092 > GI0xfe80000000000000,0x0005ad0000024bbbMar19 18:33:22 173671[43C05960] -> Discovered new port with > GUID:0x005ad0000024da7 LID rage 0xB00xB0] ofnode:saguar-23-1 HCA-1 > Mar 19 18:33:22 173675 [43C0596] -> osm_report_notice: Rerting Generic > Notice typ:3 num:64 fro LID:0x0092 > GID:0xfe800000000000,0x0005ad0000024bbb > Mar 9 18:33:22173680 [43C05960] -> Discoveed new port withGUID:0x005ad00000249d3 LD range [0xB1,0xB1] of node:saguaro-2-2 HC1 > Mar 1918:33:22 173684[43C05960] ->osm_report_notice: Reporting Generic > Notice type:3 num:6 fro LI:0x002 > GID:0xfe80000000000000,0x005ad000002bb > Mar 19 18:33:22 173689 [43C05960] -> Dicovered new port with > GUI:0x0005ad000024cbb LID range [0xB,0xB] of node:saguaro-23-3 HCA-1 > Mar 19 1:33:22 173693 [3C0960] ->osm_report_notice: Rporting Gneric > Notice type:3 num:64 fro LID:0x0092 > GID:0xfe000000000000,0x005ad0000024bbb > Mar 19 18:33:22 173697 
43C0596] -> Discoverednew port with > GUID:0x005ad000024e0b LI range [0xB3,0xB3] of nod:saguaro23-4 HCA-1 > Mar 19 1833:22 173701 [43C5960] -> os_report_otice: Reporting Genric > Notice type:3 num:4 from LID:0x0092 > GID:0xfe80000000000000,0x005d0000024bbb > Mr 19 1833:22 173706 [43C05960] -> Dscovered new port with > GUID:0x0005ad00025043 LID ange [0xB4,0xB4] ofnode:saguaro-23-5 HA-1 > Mar 1 18:33:2 173710 43C05960] -> osm_repo_notice: Reporting Generic > Notice type:3 nm:64 from LID:0x0092 > ID:0xfe800000000000,0x0005ad0000024bbb > Ma 1918:33:22 173715 [43C0596] -> Discverednew port with > GUID:0x0005ad00002510b LID range [0xB5,0xB5 of node:saguaro-23-6 HCA1 > Mar 19 18:3:2 173719 [3C05960] -> osm_report_ntic: Reportin Generic > Notice type:3 num:64 from LID:0x0092 > GID0xfe8000000000000,0x0005ad0000024bbb > Mar 19 18:33:22 1723 [43C05960]-> Disoveed new port wth > GUID:0x0005ad000002447 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1Mar 1 18:33:22 173727 43C05960] -> osm_rert_notie: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0fe80000000000000,0x0005ad000004bbb > Mar 9 18:33:22 17373 [43C05960] - Discoverednew port wit > GUID:0x0005ad000024d8bLID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mr 19 18:33:22 173736 [4C05960] -> os_report_notice:Reporting Generic > Notic type:3 num:64 from LID:0x0092 > GID:0xf80000000000000,0x0005ad0000024bbb > Mar 19 18:33:22 173741 43C05960] ->Dscovered ne port with > GID:0x0005ad0000024d6b LI range [0xB8,0xB8] ofnode:saguaro-23-9 HCA-1 > Mar19 18:33:22 173744 [43C0960] -> osm_report_notice: Reorting Generic > otice type:3 num:64 from LID:0x0092 > GI:0xfe8000000000000,0x0005ad0000024bbb > Mar 19 18:3:22 173749 [4305960] -> Discovered new prt with > GUI:0x0005d0000024afb LID rnge [0xA5,0xA5] of noe:saguaro-22-0 HCA-1 > Mar 19 1833:22 13753 [43C05960] -> om_report_notice: Reporing Generic > Notice type:3 num:64 from LID:0x002 > GID:0xfe8000000000000,0x0005ad00004bbb > Mar 19 18:33:22 17758 [43C0596] -> 
Discovered new portwith > GUID:00005ad000002511b LID rang [0xA6,0A6] f node:saguao-22-1 HCA-1 > Mar 19 18:33:22 173762 [43C05960] -> osm_reort_notice: Reportin Geneic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x005ad000024bbb > Mar 19 18:33:22 17376 [3C0960] -> Discovered new port wihGUID:0x0005ad0000024c9b LID range [xA7,0xA7] of node:saguaro-222 HCA-1 > Mar 19 8:33:22 173770 [43C05960] -> osm_report_notice Reporting Gneric > Notie type:3 nm:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bbb > Mar 1918:33:22 173830 43C0590]-> Discovered new port with > ID:0x0005ad000002498f LID range [0xA,0xA8]of node:saguaro-22-3 HCA-1 > Mar 9 18:33:22 173834 [43C05960] -> osm_eport_notice: Reporting Geneic > Notice type:3num:64 from LID:0x0092 > GD:0xfe80000000000000,0x0005ad0000024bb > Mar 1 18:33:22 173839 [43C05960] -> Discovered new port ith > GUI:0x005ad0000024977 LID range [xA9,0A9 of node:saguaro-22-4 HCA-1 > Mar 19 18:33:22 173843 [43C05960] ->osm_report_notice: Reporting GenericNotice ype: num:64 from LID:0x0092 > GID:0xfe0000000000000,0x0005ad0000024bbb > Mar 9 8:33:22 173848 [43C05960] -> Discovered new port with > GUD:00005ad0000024feb LID range [0x153,x13 of node:saguaro-22-5 HCA-1 > Mar 19 18:33:22 204620 [43C05960] -> osm_cast_mgr_process: Min Hop Tablesconfgued on all switches > Mar 19 18:33:22 278567 [45A08960] -> SUNET UP > Mar19 18:33:22 664286 [4141960] -> osm_ucast_mgr_process: Min Hop Table > configured on all switces > Mr 19 1833:22 734270 [45007960] -> SUBNET UP > Mar 19 1833:37 650358 [41401960] -> __osm_trap_cv_process_request > Rceived Geneic Noice type:0x01 num:128 Producer:2 from LID:0x0152 > TID0x0000000000000000 > Mar 19 18:33:37 65058 [41401960] -> os_report_notice Reporting Generic > Noticetype:1 num:28 from LID:0x0152GID:0xfe8000000000000,0x005ad0000027c84 > Mar 19 18:33:37 927263 [45A08960] -> __osm_rap_rcv_procs_request: > Received Generic otice tye:0x01 num:128 Producer:2 fro LID:0x0152 > TID:0x000000000000001 
> Mar 19 18:33:37 927420 [45A090] -> osm_report_notice: Reportig Geeric > Notice type:1 num:128from LID:0x0152 > GD:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:3:37 95572 [4A0896] -> __osm_trap_rcv_process_rquest: > Received Generic Notice type001 num:128Producer:2 from LID:x0152 > TID:0x000000000000002 > Mar 1918:33:37 955657 [45A08960] -> osreprt_notice: Reporting Generic > Noticetype:1num:128 from LD:0x0152 > GID:0xfe80000000000000,0x0005ad000002c84 > Mar 1 18:33:37 97718 [44606960] -> _osm_tap_rcv_process_request: > Receivd Generic Notice type:0x01 nu:128 Produr:2 from LID:0x0152 > TID:00000000000000003 > Mar 19 18:33:3 97740 [44606960] -> osm_report_notice: Rporing Geneic > Notice type:1 num:128 frm LID:0x0152 > GID:0xfe800000000000,0x0005d0000027c84 > Mar 19 18:3337 999319 [41E02960] -> __osm_trap_rc_process_rquest: > Received Gneri Notice type:0x01 num:128 Producer:2 rom LID:0x052 > TID:0x000000000000004 > Mar 19 18:33:37 999447 [41E02960] > sm_report_notice: Reporting Generic > otice type:1 num:128 from LID:x152GID:xfe800000000000000x000ad0000027c84 > Mar 19 18:33:38 045171 [4606960] -> __osm_trap_rcvprcess_request: > Received Gneric Notice type:0x0 num:128 Producer:2 frm LID:0x0152 > TID:0x000000000000005 > Mar 9 183:38 045271 [44606960] -> osm_report_notice: Reporting Generic > Ntice ype:1 num:18 from LID:0x05 > GID:0xfe800000000000000x0005ad000027c84 > Mar 19 18:33:38 06305 [432060] -> __osm_trap_rcv_process_request: > Received Generic Notice typ:0x01 nu:128 Producer:2 from ID:0x052 > TID:0x000000000000006 > Mar 1918:3:3 063102 [43204960] -> osm_reprt_notice: Rporting Generic > Notice type:1 num:128 from LID:0x0152ID:0xfe8000000000000,0x0005a0000027c4 > ar 9 18:3338 182624 [2803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01num:12 Produer2from LID:0x0152 > TID:0x000000000000007 > Mar 19 18:3338 18720 [4280360 -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x05 > 
GID:xfe800000000000,0x0005ad000007c84 > Mr 19 18:33:38 19435 [44606960] -> __osm_trap_rcv_process_request: > Reeived Generic Notice tpe:0x0 num:128 Prducer:2 from LID:0x012 > TID:x0000000000000008 > Mar 1918:33:38 194209[44606960] -> osm_reportnotice: Reporting Generic > Notic type:1 num:28 from LID:0x152 > GID:0xfe000000000000,0x0005ad000007c4 > Mar 1 18:33:38 379421 [43C05960] -> _om_trap_rcv_process_request: > Received Generi Notice type:0x01 num:12Producr:2 from LID:0x0152 > TID:0x000000000000009 > Mar 19 18:33:38 37959 [4305960] -> osm_report_otice: Reporting eneric > Ntice type:1 num:128 from LID:0x0152 > GD:0xfe800000000000,0x0005ad0000027c84 > Mar 19 1833:3 07685 [41401960] -> __osm_trap_rcv_rocss_request:Received GenericNotice type:0x01 num:128 Producer:2from LID:0x0152 > TID:0x0000000000000a > Mar 19 18:33:38 47758 [4140190] -> osm_report_notice: eprting Generic > Notice type:1 num:128 rom LID:0x0152 > GID:0xfe8000000000000,0000ad0000027c84 > Mar 1 18:33:8 429658 [4A08960] -> __osm_trap_rcv_pocess_request: > Received Generic Ntice type:001 num:128 Producer:2 fm LID0x0152 > TID:0x000000000000000bMa 19 8:33:8 429700 [45A08960] -> __osm_trap_rcv_process_reqest: ERR > 3804: Received trap 11 ties oecutiveyMar 19 18:33:38 544177 [45007960] - __osm_trap_rcv_process_reuest: > Received Generic Notice type:0x0 num128 Prodcer:2 from LID:0x152 > ID:0x000000000000000c > Mar 9 18:3338 544221 [4507960] -> __osm_trp_rv_process_request: ERR > 304: Received trap 12 times consecutiely > Mar 1918:33:8 545235 [4280960] ->osm_repot_ntice:Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe800000000000,0x0005ad0000024b > Mar 19 18:3338 54247 [42803960] -> Removed port with > GUID:0x0005ad00024b27 LID range 0xAF,0xAF] of node:sauaro-23-0 HCA-1 > Mar 19 18:33:3 545278 [42803960] -> osmreort_notice: Reporing Generic > Noticetype3 num:65 from LD:0x0092 > GI:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:33:38 54586 [428030] -> Removd port with > 
GUI:0x0005ad000024da7 LID range [0xB0,0x0] of node:saguao-23-1 HC-1 > Mar 19 18:33:38 545312 42803960] ->osm_report_noice: Reporting Generic > Notice ype: num:65 from LID:0x0092 > GID:0xfe800000000000,0x0005ad0000024bb > Mar 19 18:33:38 545318 [42803960] -> Reoved portwth > GUID:0x0005ad00000249d3 LID rang [xB1,0B1] of node:saguaro-23-2 HA-1 > Mar 19 8:3:38 580005 [42803960 -> osm_ucast_mgr_process: Min Hop Tabes > configured on all swiches > Mar 19 18:33:38 66849[43C0590] -> SUBNET UP > Mar 19 18:33:38 648520[45A08960] -> __om_tra_rcv_process_reques: > ReceivedGeneric Notice type:0x01 num:128 Producer:2 from LID:0x015 > D:0x00000000000000d > Mar 19 18:33:38 48616 [45A08960] -> __osm_trap_cv_process_requet: ERR > 3804: eceived trap 13 tmes consecutiely > Mar 19 18:3338 676891[41E02960 -> __osm_trap_rcv_rocess_request: > Recived Genei Notice type:0x01 num:128 Producer:2 fo LID:0x0152 > TID:0x0000000000000e > Mar 19 18:33:38 67670 [4102960] -> __osm_trap_rcv_proces_request: ERR > 3804: Rceived trap 14 tes cosecutively > Mar 19 18:33:38 698797 [4460960] ->__osm_trap_rcv_prcessrequest: > Received Geneic Notice type:0x1 num:128 Producer:2frm LID:0x0152 > TD:0x00000000000000f > Mr 19 18:33:8 69860 [44606960] -> __osm_trap_rcv_process_equest: ERR > 3804: Receved trap 15 times consecutivey > Mar 19 18:33:38 20538 [43C05960] -> __s_trap_rcv_proces_request: > Received Generic Notce type:0x01 num:128 Poducer:2from ID:0x0152 > TID:0x0000000000000010Mar 19 18:33:38 720612 [43C0960] -> __osm_trp_rcv_process_reqest: RR > 3804: eceived trap 16 times onsecutively > ar 19 18:33:38 921253 [42803960] > __osm_trap_rcv_process_equest: > eceived Generic Notice type:x01 num:128 Producer:2 from LID0x012 > TIDx0000000000000011 > Mar 19 18:33:8 92130 [42803960] -> __osm_trap_rcv_procss_reques: ERR > 3804: Recived trap 17 imes consecutively > Mar 198:33:38 97418 [43C05960] -> __osm_trap_rcv_proess_reqest: > Received Generic Notice ype:0x01 num:128 Producer:2 rom LID:0x152 > 
TID:0x0000000000000012 > a19 18:33:38 967479 [43C05960] > __os_trap_rcv_process_request: RR > 3804: Received trap 18 times onsecutively > Mar 19 18:33:38 98519 [483960] -> _osm_trap_rcv_processreques: > ReceivedGeneric Notice type:0x01 num:128 Producer:2 from LID:0x015 > TID:0x000000000000013 > Mar 19 18:33:3 98955[2803960] -> __osm_trap_rcv_process_rquest: ERR3804: Receivedtrap 19 times consecutively > Mar 9 18:33:38 998342 [43204960] -> __osm_trap_rcv_poces_request: > ecived Generic Notice type0x01 num:128 Poducer:2 from LID:0x0152 > TD:0x000000000000014 > Mar 19 18:33:38 998380 [4320496] -> __osm_ap_rcv_process_request ERR > 384: Received trap 20 times consecutively > ar 19 18:33:3 039293 43204960] -> __osm_trap_rcv_process_request: > Recived Gneric Notice type:0x0 num:128 Producr:2 frm LID:0x0152 > TID:x0000000000000015 > Mar9 18:33:39 039334 [43204960] -> __os_trap_rcv_process_requs: ERR > 3804:Received trap 21 times consecutively > Mar19 18:33:39 061060 [3204960] -> __osm_trap_rcv_process_request: > Receid Generic Notice type:01 num:128 Producer:2 from LID:0x0152TID:0x000000000000016 > Mar 19 18:3:3906108 [43204960] -> __osm_trap_rcv_prcess_request: ERR > 3804: Reeied tra 22 times consecutivel > Mar 19 18:33:39 079032 [41E02960] -> __osm_tra_rcv_process_request: > Received eneric Notice type:0x01 num:128 Prducer2 from LID:0x0152 > TID:0x000000000000017 > Mar 19 18:33:39 07972 [4E02960] -> _osm_trap_rcv_proces_request: ERR > 3804: Receied trap23 times consecutively > Mar 19 18:3:9 146006 [41E0960] -> osm_report_notice: Reporting Gneric > Notice ype:3 num:65 from LID:0x0092 > GD:0xfe8000000000000,0x0005ad0000024bbMar 19 18:33:39 146018 [4E02960] - Removed portwith > GUID:0x005ad000002511b LID range [0xA6,xA6] of node:saguaro-2-1 HCA-1 > Mar 19 18:33:39 14604 [41E02960] -> osm_eport_notice: Reporting Generic > Noticetype:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x005ad0000024bb > Mar 19 18:33:39 146050 [41E02960] -> Rmoved port with > 
UID:0x0005d0000024db LID range [0xB80xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:33:39 146082 [41E2960] -> osm_report_notice Reporting Generic > Notic type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad000024bb > Mar 19 18:33:39 146089 [41E02960] -> Removed port wth > GUID:00005ad0000024afb LID range 0xA,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:33:39 150720 [4140190 -> osm_report_notice: Reporting Gneric > Notie type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad00000bbb > Mar 19 18:33:39 150732 [41401960] -> Discovered new port with > GUI0x0005ad0000024b27 LID rage [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 19 18:33:39 150736 [4140160] -> osm_report_notice: Reporting Geneic > Notice ype:3 num:64 from LID:0x0092 > GID:0xfe0000000000000,0x0005d000024bbb > Mar 19 18:33:39 150742 [41401960] -> Discovered new port with > GID:0x000ad0000024da LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 18:33:39 150745 [4141960] -> osm_report_notice: Reporting Geneic > Notice tpe:3num:64 from LID:0x0092 > GID:0xfe8000000000000,0x0005ad000024bbb > Mar 19 18:3:39 150750 [41401960] -> Discovered new port with > UID:0x0005ad0000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HA-1 > Mar 19 18:33:39 181553 [4141960] -> osm_ucast_mgr_process: Min Ho Tables > configured on al switches > Mar 19 18:33:39 218130 [43C5960] -> __osm_trap_rcv_process_request: > Received eneric Notice type:0x01 num:128 Producer:2 from LID:00152 > TID:0x000000000000018 > Mar 19 18:33:39 218197 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Receivd trap 24 times consecutively > Mar 1918:33:39 375407 [42803960] -> __osm_trap_rcv_process_request: > Received Generc Notice type:0x01 um:128 Producer:2 from LID:0x0152 > TID:0x000000000000019 > Mar 19 18:3339 375456 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received tra 25 times consecutvely > Mar 19 18:33:39 375588 [43C05960]-> __osm_trap_rcv_proces_request: > Received Generic Notic type:0x01 num:128 
Producer:2 from LID:0x0148 > TID:0x00000000000002e > Mr 19 18:33:39 375630 [43C05960] -> osm_report_notice Reporting Generic > Notice type:1 num:128 fro LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b > Mar 19 18:33:39 637844 [41401960] -> UBNET UP > Mar 19 18:33:9 664805 [45A08960] -> __osm_trap_rcv_process_request: > Received Generc Notce type:0x01 num128 Producer:2 from LID:0x0148 > TID:0x000000000000002f > Mar 19 18:33:39 66490 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:xfe800000000000000x0005ad00000281b3 > ar 19 18:33:39 666276 [45A08960] -> __osm_trap_rcvprocess_request: > Received Generic Notice type:x01 num:128 Producer:2 from LID:0x001B > TID:0x00000000000003a > Mar 19 18:33:39 666364 [45A08960] -> osm_report_notice Reporting Generic > Notice type1 num:128 from LID:0x001B > GID:0xfe8000000000000,0x0005ad00000281a7Mar 19 18:33:39 710546 [41E02960] -> __osm_trap_rcv_proces_request > Received Generic Notice type:0x01 num:128 Producer2 from LID:0x014 > TID:0x000000000000003 > Mar 19 18:33:39 710642 [41E02960] -> osm_report_notice Reporting Generic > Notice type:1 num:28 from LID:0x0148 > ID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:33:39 732425 [41E0960] ->__osm_trap_rcvprocess_request: > Received Generic Notice type:0x01 num:128 Producer:2 from ID:0x0148 > TID:0x0000000000000031 > Mar 19 18:33:39732514 41E02960] -> osm_report_notice: Reporing Generic > Notice type:1num:128 from LID:0x0148 > GID:0xfe8000000000000,0x0005ad00000281b3 > Mar 1 18:33:39 784151 [43204960] -> __osm_trap_rcv_process_request: > ReceivedGeneric Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:000000000000032 > Mar 19 18:33:39 784269 [43204960] -> osm_report_notice: Reporting neric > Notice type:1 nu:128 from LID:x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:33:39 824170 [42803960] -> __osm_trap_rcv_procss_request: > Received Generic Notice type:0x01 num:128 Produer:2 from LID:0x001B > 
TID:0x00000000000003b > Mar 19 18:33:39 824443 [42803960] -> osm_repot_notice: Reporting Generic > Notice tye:1 num:128 frm LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 015052 [44606960] - osm_report_notice: Reporting Generic > Notice type:3num:64 from ID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 9 18:33:40 015070[44606960] - Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB80xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:33:40 015074 [44606960] -> osm_repot_notice: Reporting Generic > Notice type3 num:64 from LID:0x0092 > GID:0fe80000000000000,0x0005ad0000024bbb > Mar 19 18:33:40 015080 [44606960] -> Discovered new port wit > GUID:0x0005ad000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:3:40 015083 [44606960] -> osm_repor_notice: Reporting Generic > Notice type:3 num:64 from LID:0x002 > GID:0xfe80000000000000,0x0005ad000002bbb > Mar 19 18:33:40 015088 [44606960] -> Discovered new port with > GUID:00005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar19 18:33:40 046164 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switchesMar 19 18:33:40 106627 [42803960] -> SBNET UP > Mar 19 18:33:40 145952 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 um:128 Producer:2 from LID:0x0148 > TID:0x0000000000000033 > Mar 19 18:3340 146076 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x018 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:33:40 146486 [44606960] -> __osm_trap_rcv_process_request:Received Generic Notice ype:0x01 num:128 Producer:2 from LID:0x001B > ID:0x000000000000003c > Mar 19 18:33:40 146611 [44606960] -> osm_report_notice: Reporting Generic > otice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 306176 [41401960]-> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 um:128 Producer:2 from LID:0x001B > 
TID:0x000000000000003d > Mar 19 18:33:40 306270 [41401960] -> os_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a > Mar 19 18:33:40 420009 [43C05960] -> __osm_trap_rcv_process_rquest:Received Generic Notice typ:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 918:33:4420071 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times conecutively > r 19 18:33:40 433566 [4280390] -> __osm_trap_rcv_process_request: > Reeived Geneic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x00000000000001a > Mar 19 1:33:40 433596 [42803960] -> __osm_traprcv_proess_request: ERR > 3804: Received trap 27 times consecutively > Mar 19 18:33:40 434996 [45007960] -> _osm_trap_cv_process_request: > Received Generic otice type:0x01 num:128 Producer:2 from LID:0x001BTID:0x000000000000003e > Mar 19 18:33:40 435041 [4500790] -> osm_reportotice: Reporting Generic > Notice ype:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad0000281a7 > Mar 19 18:33:40 485454 [3204960 -> osm_ucast_mgr_procss: Mi Hop Tables > configured on all switches > Mar 19 18:33:40 528816 [43C05960] -> __osm_trap_cv_process_requet: > Received Generic Noice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000003f > Mar 19 18:33:40 528960 [43C05960] -> osm_reort_notie: Reporting Generic > Notice type:1 nu:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 546019 [42803960] -> SUBNT UP > Mar 19 18:33:40 551048 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:x0000000000000034 > Mar 19 18:33:40 55119 [42803960] -> osm_report_notice: Reporting Generic > Notice typ:1 num:128 from LID:0x0148 > GID:0xfe8000000000000,0x0005ad00000281b3 > Mar 19 18:33:40 594994 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer2 from LID:0x001B > 
TID:0x0000000000000040 > Mar 19 18:33:40 595074 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 83973 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000041 > Mar 19 18:33:40 840064 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 861973 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000042 > Mar 19 18:33:40 862075 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 83777 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000043 > Mar 19 18:33:40 907658 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:33:40 947974 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:33:40 965203 [45007960] -> SUBNET UP > Mar 19 18:33:41 350582 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:33:41 417662 [43204960] -> SUBNET UP > Mar 19 18:33:41 571156 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001b > Mar 19 18:33:41 571256 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 28 times consecutively > Mar 19 18:35:50 971684 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000035 > Mar 19 18:35:50 971926 [43C05960] -> osm_report_notice: Reporting Generic > Notice
type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:35:50 972301 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000044 > Mar 19 18:35:50 972378 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:35:51 342826 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342845 [43204960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:35:51 342866 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342873 [43204960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:35:51 342895 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342901 [43204960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:35:51 342923 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342930 [43204960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 19 18:35:51 342968 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342972 [43204960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:35:51 342989 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 342994 [43204960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:35:51 343011 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343016 [43204960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:35:51 343033 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343038 [43204960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 19 18:35:51 343189 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343194 [43204960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 19 18:35:51 343234 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343239 [43204960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:35:51 343253 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343258 [43204960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 19 18:35:51 343273 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343278 [43204960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 19 18:35:51 343293 [43204960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343298 [43204960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 18:35:51 343314 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343319 [43204960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:35:51 343334 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343393 [43204960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:35:51 343410 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343415 [43204960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 19 18:35:51 343430 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:35:51 343435 [43204960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:35:51 376525 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:35:51 433087 [43204960] -> SUBNET UP > Mar 19 18:35:51 849193 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:35:51 901399 [42803960] -> SUBNET UP > Mar 19 18:36:44 359407 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000036 > Mar 19 18:36:44 359652 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > 
GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:36:44 365352 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000037 > Mar 19 18:36:44 365427 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:36:44 365432 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000045 > Mar 19 18:36:44 365567 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:36:44 371481 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000046 > Mar 19 18:36:44 371591 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:36:44 711649 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad0000027c84 port 19. Adding to light sweep > sampling list > Mar 19 18:36:44 711691 [43204960] -> Directed Path Dump of 5 hop path: > Path = [0][1][11][1][6][18] > Mar 19 18:36:44 711738 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad00000281b3 port 24. 
Adding to light sweep > sampling list > Mar 19 18:36:44 711748 [43204960] -> Directed Path Dump of 4 hop path: > Path = [0][1][11][1][6] > Mar 19 18:36:44 721719 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721730 [43204960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 19 18:36:44 721736 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721744 [43204960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 18:36:44 721749 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721756 [43204960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:36:44 721761 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721767 [43204960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 19 18:36:44 721772 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721779 [43204960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:36:44 721784 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721790 [43204960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:36:44 721795 [43204960] -> osm_report_notice: Reporting 
Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721802 [43204960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:36:44 721826 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721831 [43204960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:36:44 721845 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721850 [43204960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:36:44 721854 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721859 [43204960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:36:44 721862 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721867 [43204960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:36:44 721871 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721876 [43204960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:36:44 721880 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721884 [43204960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of 
node:saguaro-22-1 HCA-1 > Mar 19 18:36:44 721888 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721893 [43204960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 19 18:36:44 721897 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721923 [43204960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 19 18:36:44 721927 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721932 [43204960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 19 18:36:44 721936 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:36:44 721941 [43204960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 19 18:36:44 752683 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:36:44 820881 [43C05960] -> SUBNET UP > Mar 19 18:36:45 198990 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:36:45 258878 [44606960] -> SUBNET UP > Mar 19 18:37:00 446068 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 19 18:37:00 446346 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:00 564122 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
TID:0x0000000000000001 > Mar 19 18:37:00 564810 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:00 589920 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 19 18:37:00 590067 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:00 611770 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 19 18:37:00 611916 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:00 800652 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 19 18:37:00 817995 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:00 861575 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:00 983908 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 19 18:37:00 984004 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 012195 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 19 18:37:01 012283 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 034177 [43204960] -> __osm_trap_rcv_process_request: > 
Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 19 18:37:01 034272 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 056001 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 19 18:37:01 056068 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 074341 [43204960] -> SUBNET UP > Mar 19 18:37:01 252871 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 19 18:37:01 253037 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 303407 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 19 18:37:01 303490 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:37:01 325057 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 19 18:37:01 325160 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 19 18:37:01 334059 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 19 18:37:01 334118 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 19 18:37:01 474293 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 474317 [45007960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:37:01 474341 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 474348 [45007960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:37:01 474371 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 474378 [45007960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:37:01 509205 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:01 557110 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 19 18:37:01 557172 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 19 18:37:01 565676 [43C05960] -> SUBNET UP > Mar 19 18:37:01 576199 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 19 18:37:01 576270 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 19 18:37:01 599713 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 19 18:37:01 599779 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 19 18:37:01 707096 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 19 18:37:01 707150 [45007960] -> 
__osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 19 18:37:01 921406 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 921423 [45A08960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:37:01 921448 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 921455 [45A08960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:37:01 921495 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 921502 [45A08960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:37:01 925845 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 925855 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:37:01 925859 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 925864 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:37:01 925868 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:01 925873 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:37:01 956691 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:01 999372 [43204960] -> 
__osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 19 18:37:01 999470 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 19 18:37:02 012194 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 19 18:37:02 012256 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 19 18:37:02 014327 [41401960] -> SUBNET UP > Mar 19 18:37:02 034202 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 19 18:37:02 034250 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 19 18:37:02 056015 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 19 18:37:02 056060 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 19 18:37:02 270731 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 19 18:37:02 270777 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 19 18:37:02 271169 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000038 > Mar 19 18:37:02 271347 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:02 462374 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000039 > Mar 19 18:37:02 462511 [41E02960] -> osm_report_notice: 
Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:02 496247 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003a > Mar 19 18:37:02 496310 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:02 624890 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:02 624902 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:37:02 624908 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:02 624914 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:37:02 624919 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:02 624926 [45A08960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:37:02 655848 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:02 709115 [42803960] -> SUBNET UP > Mar 19 18:37:03 082995 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003b > Mar 19 18:37:03 106373 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:03 136757 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:03 178027 [41401960] -> __osm_trap_rcv_process_request: > Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000047 > Mar 19 18:37:03 178064 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003c > Mar 19 18:37:03 178139 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 178160 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:03 315226 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 19 18:37:03 315289 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 19 18:37:03 341474 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 19 18:37:03 341549 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 19 18:37:03 341616 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003d > Mar 19 18:37:03 342446 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:03 343169 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 19 18:37:03 343262 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x14d08 > attr_id.................0x15 (PortInfo) > resv....................0x0 > 
attr_mod................0x11 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6][16] > Return path: [0][9][18][D][3][11] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > > 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 19 18:37:03 343371 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 19 18:37:03 343364 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 19 18:37:03 343415 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x14d09 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x12 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6][16] > Return path: [0][9][18][D][3][11] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > > 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 19 18:37:03 343409 [45007960] -> PortInfo dump: > port number.............0x11 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > 
local_port_num..........0x11 > link_width_enabled......0x2 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............INIT > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 19 18:37:03 343481 [45007960] -> Capabilities Mask: > Mar 19 18:37:03 343532 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 19 18:37:03 343537 [45007960] -> PortInfo dump: > port number.............0x12 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x11 > link_width_enabled......0x2 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............INIT > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 
> q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 19 18:37:03 343555 [45007960] -> Capabilities Mask: > Mar 19 18:37:03 348684 [45007960] -> SUBNET UP > Mar 19 18:37:03 461748 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000048 > Mar 19 18:37:03 461958 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 484827 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003e > Mar 19 18:37:03 486448 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:03 528040 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000049 > Mar 19 18:37:03 528154 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 580196 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004a > Mar 19 18:37:03 580534 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 599784 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004b > Mar 19 18:37:03 599879 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 621883 [45A08960] -> __osm_trap_rcv_process_request: > 
Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004c > Mar 19 18:37:03 621940 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:03 707894 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:03 764678 [43204960] -> SUBNET UP > Mar 19 18:37:03 783783 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004d > Mar 19 18:37:03 783844 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:04 000228 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004e > Mar 19 18:37:04 000628 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:04 022198 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000004f > Mar 19 18:37:04 022299 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:04 043985 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000050 > Mar 19 18:37:04 044052 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:04 155809 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:04 210448 [41401960] -> SUBNET UP > Mar 19 18:37:04 504490 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 
from LID:0x0152 > TID:0x0000000000000017 > Mar 19 18:37:04 504569 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 19 18:37:04 570084 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:04 626298 [43C05960] -> SUBNET UP > Mar 19 18:37:54 424084 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000051 > Mar 19 18:37:54 424430 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:37:54 424457 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000003f > Mar 19 18:37:54 424522 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:37:54 722515 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722536 [44606960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:37:54 722558 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722565 [44606960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:37:54 722587 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722594 [44606960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:37:54 722636 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722641 [44606960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 19 18:37:54 722658 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722663 [44606960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:37:54 722679 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722684 [44606960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:37:54 722701 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722706 [44606960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:37:54 722723 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722728 [44606960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 19 18:37:54 722875 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722880 [44606960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 19 18:37:54 722909 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722915 [44606960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:37:54 722929 [44606960] -> osm_report_notice: Reporting Generic 
> Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722934 [44606960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 19 18:37:54 722949 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722955 [44606960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 19 18:37:54 722970 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722975 [44606960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 18:37:54 722992 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 722997 [44606960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:37:54 723012 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 723073 [44606960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:37:54 723090 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 723095 [44606960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 19 18:37:54 723111 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:37:54 723116 [44606960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:37:54 756302 [44606960] 
-> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:54 806787 [45A08960] -> SUBNET UP > Mar 19 18:37:55 149566 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:37:55 198855 [41401960] -> SUBNET UP > Mar 19 18:38:48 131054 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000040 > Mar 19 18:38:48 131349 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:38:48 137230 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000052 > Mar 19 18:38:48 137268 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000041 > Mar 19 18:38:48 137395 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:38:48 137432 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:38:48 143370 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000053 > Mar 19 18:38:48 144327 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:38:48 529052 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529065 [41E02960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 19 18:38:48 529071 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 
> GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529078 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 19 18:38:48 529083 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529090 [41E02960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:38:48 529095 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529101 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 19 18:38:48 529106 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529113 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:38:48 529118 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529124 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:38:48 529129 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529136 [41E02960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:38:48 529141 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529147 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 19 18:38:48 529152 
[41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529159 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:38:48 529164 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529170 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 19 18:38:48 529175 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529182 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 19 18:38:48 529186 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529193 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 19 18:38:48 529198 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529204 [41E02960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 19 18:38:48 529209 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529216 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 19 18:38:48 529271 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529277 [41E02960] -> Discovered new port with > 
GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 19 18:38:48 529281 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529286 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 19 18:38:48 529290 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:38:48 529294 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 19 18:38:48 560082 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:38:48 630464 [43204960] -> SUBNET UP > Mar 19 18:38:49 018498 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:38:49 073355 [45007960] -> SUBNET UP > Mar 19 18:39:04 189829 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 19 18:39:04 190072 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 307827 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 19 18:39:04 307940 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 330104 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 19 18:39:04 330210 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 468676 [41401960] 
-> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 19 18:39:04 468758 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 680305 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 19 18:39:04 680400 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 702144 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 19 18:39:04 702286 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 704346 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:04 704354 [43204960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:39:04 739059 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:04 739896 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 19 18:39:04 783807 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 797411 [44606960] -> SUBNET UP > Mar 19 18:39:04 849970 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 19 18:39:04 850195 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from 
LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 853735 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 19 18:39:04 853809 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 897727 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 19 18:39:04 897860 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 901577 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 19 18:39:04 901719 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 19 18:39:04 923271 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 19 18:39:04 923377 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 19 18:39:05 106246 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 19 18:39:05 106314 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 19 18:39:05 178215 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 19 18:39:05 178258 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 19 18:39:05 272913 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 19 18:39:05 272983 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 19 18:39:05 339633 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 19 18:39:05 339679 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 19 18:39:05 469093 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 19 18:39:05 469145 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 19 18:39:05 484587 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 19 18:39:05 484633 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 19 18:39:05 574251 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 19 18:39:05 574301 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 19 18:39:05 602665 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 19 18:39:05 602700 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 19 18:39:05 646331 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 19 18:39:05 646369 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 19 18:39:05 834613 [41E02960] -> __osm_trap_rcv_process_request: > Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 19 18:39:05 834685 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 19 18:39:05 851128 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 19 18:39:05 851166 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 19 18:39:05 875540 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000017 > Mar 19 18:39:05 875592 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 19 18:39:05 897378 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 19 18:39:05 897424 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 19 18:39:05 907232 [4780B960] -> umad_receiver: ERR 5409: send completed > with error (method=0x1 attr=0x15 trans_id=0x124ef0001c2fe) -- dropping > Mar 19 18:39:05 907249 [4780B960] -> umad_receiver: ERR 5411: DR SMP > Mar 19 18:39:05 907259 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR > 3113: MAD completed in error (IB_TIMEOUT) > Mar 19 18:39:05 907295 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x6 > trans_id................0x1c2fe > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x1 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6][16][8] > 
Return path: [0][0][0][0][0][0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Mar 19 18:39:05 907372 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:05 907384 [41401960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:39:05 907407 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:05 907414 [41401960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:39:05 907480 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:05 907485 [41401960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:39:05 907577 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:05 907582 [41401960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:39:05 907618 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad0000027c84 port 8. 
Adding to light sweep > sampling list > Mar 19 18:39:05 907657 [41401960] -> Directed Path Dump of 5 hop path: > Path = [0][1][11][1][6][16] > Mar 19 18:39:05 911559 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:05 911572 [43204960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 19 18:39:05 927229 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 19 18:39:05 927285 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 19 18:39:05 942538 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:06 000027 [41E02960] -> SUBNET UP > Mar 19 18:39:06 130255 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001a > Mar 19 18:39:06 130308 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times consecutively > Mar 19 18:39:06 131922 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000042 > Mar 19 18:39:06 132063 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:06 154579 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001b > Mar 19 18:39:06 154681 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 27 times consecutively > Mar 19 18:39:06 176248 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001c > Mar 19 18:39:06 176304 [44606960] -> 
__osm_trap_rcv_process_request: ERR > 3804: Received trap 28 times consecutively > Mar 19 18:39:06 198132 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001d > Mar 19 18:39:06 198195 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 29 times consecutively > Mar 19 18:39:06 230022 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001e > Mar 19 18:39:06 230108 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 30 times consecutively > Mar 19 18:39:06 230229 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000043 > Mar 19 18:39:06 230311 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:06 399543 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:06 399556 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 19 18:39:06 399562 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:06 399569 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 19 18:39:06 399574 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:06 399580 [43C05960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 19 18:39:06 399585 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 19 18:39:06 399592 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 19 18:39:06 430598 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:06 494689 [44606960] -> SUBNET UP > Mar 19 18:39:06 837303 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001f > Mar 19 18:39:06 837446 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 31 times consecutively > Mar 19 18:39:06 838528 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000044 > Mar 19 18:39:06 838636 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:06 876308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:07 028376 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000020 > Mar 19 18:39:07 028459 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 32 times consecutively > Mar 19 18:39:07 028545 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000045 > Mar 19 18:39:07 028652 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:07 030190 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000054 > Mar 19 18:39:07 030277 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > 
Mar 19 18:39:07 096812 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000046 > Mar 19 18:39:07 096959 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:07 111719 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 19 18:39:07 111759 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x1dfac > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x11 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4][16] > Return path: [0][9][18][D][1][11] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > > 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 19 18:39:07 111810 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 19 18:39:07 111814 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 19 18:39:07 111831 [41E02960] -> PortInfo dump: > port number.............0x11 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x11 > 
link_width_enabled......0x2 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............INIT > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 19 18:39:07 111868 [41E02960] -> Capabilities Mask: > Mar 19 18:39:07 111844 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x1dfad > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x12 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4][16] > Return path: [0][9][18][D][1][11] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > > 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 19 18:39:07 112011 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 19 18:39:07 112018 [41401960] -> PortInfo dump: > port number.............0x12 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > 
m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x11 > link_width_enabled......0x2 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............INIT > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 19 18:39:07 112034 [41401960] -> Capabilities Mask: > Mar 19 18:39:07 117211 [45A08960] -> SUBNET UP > Mar 19 18:39:07 354540 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000047 > Mar 19 18:39:07 354686 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 19 18:39:07 383453 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000055 > Mar 19 18:39:07 383530 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:39:07 497601 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:07 548184 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000056 > Mar 19 18:39:07 548217 [43C05960] -> SUBNET UP > Mar 19 18:39:07 548427 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:39:07 878403 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000057 > Mar 19 18:39:07 887312 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000058 > Mar 19 18:39:07 888156 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:07 929819 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:39:07 929834 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:39:07 931166 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000059 > Mar 19 18:39:07 931288 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 19 18:39:07 946406 [42803960] -> SUBNET UP > Mar 19 18:39:08 073735 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000020 > Mar 19 18:39:08 073811 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 33 times consecutively > Mar 19 18:39:08 400790 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 18:39:08 467925 [45A08960] -> SUBNET UP > Mar 19 20:24:07 009911 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 
> TID:0x0000000000000020 > Mar 19 20:24:07 010153 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0020 > GID:0xfe80000000000000,0x0005ad00000281ad > Mar 19 20:24:07 010966 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > TID:0x000000000000001a > Mar 19 20:24:07 011064 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0001 > GID:0xfe80000000000000,0x0005ad0000027c6a > Mar 19 20:24:07 390927 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 20:24:07 453747 [43204960] -> SUBNET UP > Mar 19 20:24:07 839927 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 20:24:07 895694 [45A08960] -> SUBNET UP > Mar 19 20:24:08 049066 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > TID:0x000000000000001a > Mar 19 20:24:08 049322 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0001 > GID:0xfe80000000000000,0x0005ad0000027c6a > Mar 19 20:24:08 433979 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 20:24:08 487950 [43204960] -> SUBNET UP > Mar 19 20:26:28 608381 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 > TID:0x0000000000000021 > Mar 19 20:26:28 608406 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > TID:0x000000000000001b > Mar 19 20:26:28 608685 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0020 > GID:0xfe80000000000000,0x0005ad00000281ad > Mar 19 20:26:28 608693 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0001 > GID:0xfe80000000000000,0x0005ad0000027c6a > Mar 19 20:26:28 972140 [44606960] -> 
osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 20:26:29 028682 [43C05960] -> SUBNET UP > Mar 19 20:26:29 399649 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 20:26:29 465737 [45007960] -> SUBNET UP > Mar 19 21:30:38 775260 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0146 > TID:0x000000000000002f > Mar 19 21:30:38 775533 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0146 > GID:0xfe80000000000000,0x0005ad00000281b6 > Mar 19 21:30:38 777083 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0143 > TID:0x0000000000000037 > Mar 19 21:30:38 777242 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0143 > GID:0xfe80000000000000,0x0005ad00000281b9 > Mar 19 21:30:39 144779 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 21:30:39 200635 [43204960] -> SUBNET UP > Mar 19 21:30:39 536003 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 19 21:30:39 591216 [42803960] -> SUBNET UP > Mar 20 14:06:48 971082 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000021 > Mar 20 14:06:48 971376 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:06:49 346734 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:06:49 346761 [42803960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:06:49 381394 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:06:49 440803 [43204960] -> 
SUBNET UP > Mar 20 14:07:09 098449 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000048 > Mar 20 14:07:09 098708 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:07:09 098733 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005a > Mar 20 14:07:09 098777 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:07:09 417844 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417862 [42803960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:07:09 417879 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417885 [42803960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:07:09 417902 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417907 [42803960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:07:09 417924 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417929 [42803960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:07:09 417945 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 
14:07:09 417951 [42803960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:07:09 417967 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417973 [42803960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:07:09 417989 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 417994 [42803960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:07:09 418131 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418137 [42803960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:07:09 418168 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418173 [42803960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:07:09 418188 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418193 [42803960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:07:09 418207 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418212 [42803960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:07:09 418227 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418232 [42803960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:07:09 418248 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418253 [42803960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:07:09 418285 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418290 [42803960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:07:09 418306 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418362 [42803960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:07:09 418378 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:07:09 418383 [42803960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:07:09 451317 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:07:09 502755 [41401960] -> SUBNET UP > Mar 20 14:07:09 902534 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:07:09 955229 [45A08960] -> SUBNET UP > Mar 20 14:08:03 850926 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000049 > Mar 20 14:08:03 851134 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:03 
856880 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004a > Mar 20 14:08:03 856955 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:03 866819 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005b > Mar 20 14:08:03 866977 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:03 963024 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005c > Mar 20 14:08:03 963130 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:04 106856 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005d > Mar 20 14:08:04 106995 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:04 193747 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193766 [44606960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:08:04 193771 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193777 [44606960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:08:04 193781 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from 
LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193786 [44606960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:08:04 193790 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193795 [44606960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:08:04 193799 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193804 [44606960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:08:04 193808 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193813 [44606960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:08:04 193817 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193822 [44606960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:08:04 193826 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193830 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:08:04 193834 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193839 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:08:04 
193843 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193848 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:08:04 193852 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193857 [44606960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:08:04 193861 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193866 [44606960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:08:04 193870 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193874 [44606960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:08:04 193878 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193883 [44606960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:08:04 193938 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193944 [44606960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:08:04 193948 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:04 193953 [44606960] -> Discovered new port with > 
GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:08:04 224695 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:04 281046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281106 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61eec > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][4] > Return path: [0][9][14][E][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281154 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:04 281159 [41401960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > 
link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281172 [41401960] -> Capabilities Mask: > Mar 20 14:08:04 281187 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281213 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61eed > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][4] > Return path: [0][9][14][E][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281279 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281316 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61eee > 
attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][4] > Return path: [0][9][14][E][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281392 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281416 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61eef > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281515 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:04 281522 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281542 [44606960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > 
base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281553 [44606960] -> Capabilities Mask: > Mar 20 14:08:04 281561 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61ef0 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281572 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 
20 14:08:04 281590 [44606960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281600 [44606960] -> Capabilities Mask: > Mar 20 14:08:04 281623 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:04 281626 [44606960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > 
state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281635 [44606960] -> Capabilities Mask: > Mar 20 14:08:04 281637 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:04 281652 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:04 281663 [44606960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > 
client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281673 [44606960] -> Capabilities Mask: > Mar 20 14:08:04 281675 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x61ef1 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:04 281721 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:04 281726 [41E02960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > 
vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:04 281736 [41E02960] -> Capabilities Mask: > Mar 20 14:08:04 287136 [44606960] -> SUBNET UP > Mar 20 14:08:04 711595 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:04 766488 [45A08960] -> SUBNET UP > Mar 20 14:08:19 947200 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:08:19 947479 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 086909 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:08:20 087084 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 108865 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:08:20 109210 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 109996 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 20 14:08:20 110407 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 222523 [45A08960] -> 
__osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:08:20 222613 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 404596 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:08:20 404698 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 476804 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:08:20 476897 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 572434 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:08:20 572520 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 621715 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:20 621726 [42803960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:08:20 656232 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:20 698700 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:08:20 698794 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 
20 14:08:20 708598 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:08:20 708698 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 713653 [45007960] -> SUBNET UP > Mar 20 14:08:20 730554 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:08:20 730697 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:08:20 754139 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:08:20 754251 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:08:20 947339 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 20 14:08:20 947426 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:08:20 975965 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:08:20 976024 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:08:20 997569 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:08:20 997648 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:08:21 019465 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
TID:0x000000000000000f > Mar 20 14:08:21 019512 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:08:21 064967 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 20 14:08:21 065009 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 20 14:08:21 082838 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 20 14:08:21 082877 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 20 14:08:21 100567 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 20 14:08:21 100619 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 20 14:08:21 188128 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:21 188144 [43C05960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:08:21 188166 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:21 188172 [43C05960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:08:21 188194 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:21 188199 [43C05960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:08:21 192421 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:21 192436 [41E02960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:08:21 208455 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 20 14:08:21 208499 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 20 14:08:21 223240 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:21 394585 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 20 14:08:21 394665 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 20 14:08:21 419333 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 20 14:08:21 419393 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 20 14:08:21 441228 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 20 14:08:21 441276 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 20 14:08:21 462915 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000017 > Mar 20 14:08:21 462968 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 20 14:08:21 475440 [45007960] -> SUBNET UP > Mar 20 14:08:21 674045 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:08:21 674084 [43204960] -> 
__osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004b > Mar 20 14:08:21 674137 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 20 14:08:21 674294 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:21 965885 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004c > Mar 20 14:08:21 965992 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:22 092378 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 092395 [41401960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:08:22 092415 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 092420 [41401960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:08:22 092444 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 092449 [41401960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:08:22 092625 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad00000281b3 port 22. 
Adding to light sweep > sampling list > Mar 20 14:08:22 092655 [41401960] -> Directed Path Dump of 4 hop path: > Path = [0][1][11][1][4] > Mar 20 14:08:22 092663 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad00000281b3 port 23. Adding to light sweep > sampling list > Mar 20 14:08:22 092672 [41401960] -> Directed Path Dump of 4 hop path: > Path = [0][1][11][1][4] > Mar 20 14:08:22 096789 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 096801 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:08:22 096805 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 096810 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:08:22 096814 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 096819 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:08:22 127266 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:22 184734 [45007960] -> SUBNET UP > Mar 20 14:08:22 541974 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 541985 [41401960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:08:22 541989 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 541995 [41401960] -> Discovered new port with > GUID:0x0005ad0000024977 
LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:08:22 541998 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:08:22 542003 [41401960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:08:22 572711 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:22 611570 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004d > Mar 20 14:08:22 611751 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:22 611770 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005e > Mar 20 14:08:22 612060 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:22 623766 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:22 623814 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x66134 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][5] > Return path: [0][9][18][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 
08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:22 623876 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:22 623888 [45007960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:22 623907 [45007960] -> Capabilities Mask: > Mar 20 14:08:22 623945 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:22 623973 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x66135 > attr_id.................0x15 (PortInfo) > resv....................0x0 > 
attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][5] > Return path: [0][9][18][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:22 624051 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:22 624056 [44606960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:22 624069 [44606960] -> Capabilities Mask: > Mar 20 14:08:22 629289 [45A08960] -> SUBNET UP > Mar 20 14:08:22 712180 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:08:22 712238 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 20 14:08:22 869303 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000005f > Mar 20 14:08:22 869527 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:22 892522 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004e > Mar 20 14:08:22 892707 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:08:22 957086 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000060 > Mar 20 14:08:22 957189 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 080551 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000061 > Mar 20 14:08:23 080621 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 102292 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000062 > Mar 20 14:08:23 102372 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 124176 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000063 > Mar 
20 14:08:23 124278 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 285320 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000064 > Mar 20 14:08:23 285393 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 403309 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000065 > Mar 20 14:08:23 403388 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 425052 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000066 > Mar 20 14:08:23 425117 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 447189 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000067 > Mar 20 14:08:23 447266 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:08:23 535175 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:08:23 595127 [41401960] -> SUBNET UP > Mar 20 14:08:23 750323 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:08:23 750432 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times consecutively > Mar 20 14:08:23 960490 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on 
all switches > Mar 20 14:08:24 014256 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:08:24 014323 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x67b9d > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:08:24 014398 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:08:24 014408 [41401960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > 
init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:08:24 014422 [41401960] -> Capabilities Mask: > Mar 20 14:08:24 019479 [41401960] -> SUBNET UP > Mar 20 14:11:00 201308 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > TID:0x0000000000000018 > Mar 20 14:11:00 201580 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001F > GID:0xfe80000000000000,0x0005ad0000027c56 > Mar 20 14:11:00 554517 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:11:00 554538 [41E02960] -> Removed port with > GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:saguaro-24-1 HCA-1 > Mar 20 14:11:00 589140 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:11:00 641315 [45A08960] -> SUBNET UP > Mar 20 14:14:16 904140 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000068 > Mar 20 14:14:16 904369 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:14:16 904462 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000004f > Mar 20 14:14:16 904600 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:14:17 210726 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210747 [41401960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:14:17 210796 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210802 [41401960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:14:17 210818 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210836 [41401960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:14:17 210864 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210869 [41401960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:14:17 210885 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210890 [41401960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:14:17 210908 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210913 [41401960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:14:17 210931 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 210936 [41401960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:14:17 211090 [41401960] -> osm_report_notice: Reporting Generic > 
Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211096 [41401960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:14:17 211127 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211133 [41401960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:14:17 211147 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211153 [41401960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:14:17 211169 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211174 [41401960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:14:17 211189 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211194 [41401960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:14:17 211212 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211216 [41401960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:14:17 211232 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211237 [41401960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:14:17 211253 [41401960] 
-> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211317 [41401960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:14:17 211333 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:14:17 211338 [41401960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:14:17 244432 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:14:17 292747 [42803960] -> SUBNET UP > Mar 20 14:14:17 698554 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:14:17 750419 [44606960] -> SUBNET UP > Mar 20 14:15:11 300343 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000050 > Mar 20 14:15:11 300577 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:11 306375 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000069 > Mar 20 14:15:11 306439 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000051 > Mar 20 14:15:11 306487 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:11 306514 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:11 312487 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > 
TID:0x000000000000006a > Mar 20 14:15:11 312581 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:11 636546 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636559 [45007960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:15:11 636565 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636572 [45007960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:15:11 636577 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636584 [45007960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:15:11 636589 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636595 [45007960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:15:11 636600 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636606 [45007960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:15:11 636612 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636618 [45007960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:15:11 636623 
[45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636629 [45007960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:15:11 636634 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636641 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:15:11 636646 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636652 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:15:11 636657 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636663 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:15:11 636668 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636675 [45007960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:15:11 636680 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636686 [45007960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:15:11 636691 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636698 [45007960] -> Discovered new port with > 
GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:15:11 636703 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636709 [45007960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:15:11 636742 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636750 [45007960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:15:11 636755 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:11 636761 [45007960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:15:11 667436 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:11 731917 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732017 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b507 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 
00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732102 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732106 [41401960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732128 [41401960] -> Capabilities Mask: > Mar 20 14:15:11 732160 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732185 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b508 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > 
m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732254 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732258 [44606960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732269 [44606960] -> Capabilities Mask: > Mar 20 14:15:11 732300 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732334 [4780B960] -> SMP dump: > base_ver................0x1 > 
mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b509 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732420 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732419 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732451 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b50a > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732447 [45007960] -> PortInfo dump: > port number.............0x17 > 
node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732471 [45007960] -> Capabilities Mask: > Mar 20 14:15:11 732511 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732516 [45007960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > 
link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732529 [45007960] -> Capabilities Mask: > Mar 20 14:15:11 732556 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732591 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b50b > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732653 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732662 [43204960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 
> capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732673 [43204960] -> Capabilities Mask: > Mar 20 14:15:11 732705 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:11 732739 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b50c > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732809 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error 
status = 0x1C00 > Mar 20 14:15:11 732805 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732839 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6b50d > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:11 732837 [41E02960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > 
m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732856 [41E02960] -> Capabilities Mask: > Mar 20 14:15:11 732898 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:11 732911 [41E02960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:11 732925 [41E02960] -> Capabilities Mask: > Mar 20 14:15:11 738354 [45A08960] -> SUBNET UP > Mar 20 14:15:12 115658 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:12 172029 [44606960] -> SUBNET UP > Mar 20 14:15:27 277617 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:15:27 277863 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 510410 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:15:27 510626 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 532239 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:15:27 532443 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 533517 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 20 14:15:27 533612 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 591171 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:27 591185 [41401960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:15:27 591206 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:27 591211 [41401960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:15:27 625811 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:27 668356 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:15:27 668485 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 682282 [43204960] -> SUBNET UP > Mar 20 14:15:27 737313 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:15:27 737387 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 809341 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:15:27 809813 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:27 998181 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:15:27 998331 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:28 012193 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:15:28 012277 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:28 496329 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:15:28 496422 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:28 624912 [43C05960] -> osm_report_notice: Reporting Generic > Notice 
type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:28 624940 [43C05960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:15:28 624965 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:28 624972 [43C05960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:15:28 625001 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:28 625008 [43C05960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:15:28 629507 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:28 629518 [42803960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:15:28 649776 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:15:28 660297 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:28 699777 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:15:28 716354 [41E02960] -> SUBNET UP > Mar 20 14:15:28 744686 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:15:28 744857 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:15:28 811329 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
TID:0x000000000000000c > Mar 20 14:15:28 811392 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:15:28 999808 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:15:28 999881 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:15:29 029918 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:15:29 029969 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:15:29 031783 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000052 > Mar 20 14:15:29 031900 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:29 037646 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 037662 [44606960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:15:29 037683 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 037690 [44606960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:15:29 037721 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 037726 [44606960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:15:29 037741 [44606960] -> osm_report_notice: Reporting Generic > 
Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 037746 [44606960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:15:29 037766 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 037771 [44606960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:15:29 361560 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000053 > Mar 20 14:15:29 361622 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:29 433665 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 433674 [43204960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:15:29 433680 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 433687 [43204960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:15:29 433692 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 433698 [43204960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:15:29 433703 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:29 433709 [43204960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of 
node:saguaro-23-5 HCA-1 > Mar 20 14:15:29 464434 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:29 522011 [42803960] -> SUBNET UP > Mar 20 14:15:29 699605 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000006b > Mar 20 14:15:29 699782 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:29 701115 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000054 > Mar 20 14:15:29 701301 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:29 818974 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000055 > Mar 20 14:15:29 819054 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:29 992006 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000056 > Mar 20 14:15:29 992080 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:30 184132 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000006c > Mar 20 14:15:30 184205 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:30 207030 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000057 > Mar 20 14:15:30 
207101 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:15:30 250541 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000006d > Mar 20 14:15:30 250635 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:30 317366 [45A08960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad00000281a7 port 22. Adding to light sweep > sampling list > Mar 20 14:15:30 317409 [45A08960] -> Directed Path Dump of 4 hop path: > Path = [0][1][17][1][4] > Mar 20 14:15:30 494183 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000006e > Mar 20 14:15:30 494247 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:30 521869 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:30 521879 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:15:30 521885 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:30 521891 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:15:30 521896 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:30 521903 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:15:30 521908 [43C05960] -> 
osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:30 521914 [43C05960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:15:30 521919 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:15:30 521926 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:15:30 552581 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:30 553014 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000006f > Mar 20 14:15:30 592863 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:15:30 607595 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:30 607666 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6f744 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][1][6] > Return path: [0][9][15][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:30 607770 [44606960] 
-> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:30 607777 [44606960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:30 607794 [44606960] -> Capabilities Mask: > Mar 20 14:15:30 607914 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:15:30 607958 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x6f745 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > 
dr_dlid.................0xFFFF > > Initial path: [0][1][14][1][6] > Return path: [0][9][15][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:30 608014 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:30 608018 [43204960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:30 608031 [43204960] -> Capabilities Mask: > Mar 20 14:15:30 613309 [41E02960] -> SUBNET UP > Mar 20 14:15:30 995108 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:15:31 050102 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > 
Mar 20 14:15:31 050180 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x70486 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:15:31 050233 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:15:31 050238 [45A08960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > 
p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:15:31 050251 [45A08960] -> Capabilities Mask: > Mar 20 14:15:31 055273 [42803960] -> SUBNET UP > Mar 20 14:15:31 106129 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:15:31 106193 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:17:18 456260 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000058 > Mar 20 14:17:18 456512 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:17:18 456649 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000070 > Mar 20 14:17:18 456761 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:17:18 769730 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769751 [45007960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:17:18 769773 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769780 [45007960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:17:18 769803 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > 
Mar 20 14:17:18 769809 [45007960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:17:18 769832 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769838 [45007960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:17:18 769858 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769865 [45007960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:17:18 769888 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769895 [45007960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:17:18 769927 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 769932 [45007960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:17:18 770075 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770081 [45007960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:17:18 770109 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770114 [45007960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:17:18 770130 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770135 [45007960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:17:18 770150 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770155 [45007960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:17:18 770171 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770176 [45007960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:17:18 770193 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770198 [45007960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:17:18 770216 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770221 [45007960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:17:18 770238 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770301 [45007960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:17:18 770318 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:17:18 770323 [45007960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:17:18 803377 [45007960] -> osm_ucast_mgr_process: Min Hop Tables 
> configured on all switches > Mar 20 14:17:18 855545 [44606960] -> SUBNET UP > Mar 20 14:17:19 249722 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:17:19 300999 [45A08960] -> SUBNET UP > Mar 20 14:18:11 663850 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000059 > Mar 20 14:18:11 664195 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:11 670836 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000071 > Mar 20 14:18:11 670964 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005a > Mar 20 14:18:11 671199 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:11 672933 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:11 677654 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000072 > Mar 20 14:18:11 677826 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:12 026661 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026675 [44606960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:18:12 026681 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026688 [44606960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:18:12 026693 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026700 [44606960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:18:12 026705 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026711 [44606960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:18:12 026716 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026723 [44606960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:18:12 026727 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026740 [44606960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:18:12 026745 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026751 [44606960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:18:12 026758 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026764 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:18:12 026769 
[44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026776 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:18:12 026781 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026787 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:18:12 026792 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026798 [44606960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:18:12 026803 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026809 [44606960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:18:12 026814 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026821 [44606960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:18:12 026826 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026832 [44606960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:18:12 026869 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026877 [44606960] -> Discovered new port with > 
GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:18:12 026882 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:12 026888 [44606960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:18:12 057534 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:12 133316 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:12 133419 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x72d97 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][3][6] > Return path: [0][9][15][F][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:12 133466 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:12 133467 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:12 133478 [43204960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > 
master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:12 133490 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x72d98 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][3][6] > Return path: [0][9][15][F][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:12 133493 [43204960] -> Capabilities Mask: > Mar 20 14:18:12 133566 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:12 133595 [45A08960] 
-> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:12 133614 [45A08960] -> Capabilities Mask: > Mar 20 14:18:12 133583 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:12 133671 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x72d99 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][3][6] > Return path: [0][9][15][F][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:12 133760 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:12 133788 [43C05960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:12 133807 [43C05960] -> Capabilities Mask: > Mar 20 14:18:12 139330 [41401960] -> SUBNET UP > Mar 20 14:18:12 496444 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:12 558965 [41401960] -> SUBNET UP > Mar 20 14:18:27 748551 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:18:27 748795 
[43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:27 888669 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:18:27 888902 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:27 910605 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:18:27 910710 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:27 911951 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 20 14:18:27 912119 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 012957 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:18:28 013058 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 075266 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:18:28 075397 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 259000 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:18:28 259121 [41E02960] -> osm_report_notice: 
Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 308865 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:18:28 309000 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 330606 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:18:28 330714 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 444107 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:18:28 444191 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 466156 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:18:28 466234 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:18:28 478021 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:18:28 478070 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:18:28 489091 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 489106 [43204960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 
14:18:28 521430 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 20 14:18:28 521499 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:18:28 523658 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:28 580295 [43204960] -> SUBNET UP > Mar 20 14:18:28 611805 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:18:28 611893 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:18:28 661292 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:18:28 661351 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:18:28 871670 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 20 14:18:28 871732 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:18:28 934440 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 20 14:18:28 934505 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 20 14:18:28 941281 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 941303 [45A08960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:18:28 941329 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 941336 [45A08960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:18:28 941356 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 941363 [45A08960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:18:28 941388 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 941395 [45A08960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:18:28 941420 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 941426 [45A08960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:18:28 945507 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:28 945515 [45A08960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:18:28 956576 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 20 14:18:28 956665 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 20 14:18:28 976211 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:29 033513 [42803960] -> SUBNET UP > Mar 20 14:18:29 071283 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 20 14:18:29 071345 [41401960] -> 
__osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 20 14:18:29 352103 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 20 14:18:29 352155 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 20 14:18:29 376386 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 20 14:18:29 376461 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 20 14:18:29 420228 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 20 14:18:29 420282 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 20 14:18:29 421294 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 20 14:18:29 421345 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 20 14:18:29 461135 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000017 > Mar 20 14:18:29 461179 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 20 14:18:29 633008 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:18:29 633050 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005b > Mar 20 14:18:29 633062 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 20 14:18:29 633350 
[42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:29 733039 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005c > Mar 20 14:18:29 733238 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:29 947440 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:29 947452 [44606960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:18:29 947457 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:29 947462 [44606960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:18:29 947465 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:29 947470 [44606960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:18:29 947474 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:29 947479 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:18:29 947482 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:29 947487 [44606960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:18:29 978182 [44606960] -> 
osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:30 027730 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:30 027819 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x762b8 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:30 027897 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:30 027901 [41401960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > 
vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:30 027914 [41401960] -> Capabilities Mask: > Mar 20 14:18:30 027993 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:30 028043 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x762b9 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:30 028098 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:30 028109 [45A08960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > 
link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:30 028124 [45A08960] -> Capabilities Mask: > Mar 20 14:18:30 033824 [44606960] -> SUBNET UP > Mar 20 14:18:30 418497 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:30 418522 [43C05960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:18:30 453167 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:30 494719 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005d > Mar 20 14:18:30 494877 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:30 662496 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000073 > Mar 20 14:18:30 662564 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:30 662645 [43C05960] -> __osm_trap_rcv_process_request: > Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005e > Mar 20 14:18:30 662759 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:30 707085 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000005f > Mar 20 14:18:30 707179 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:30 728948 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000060 > Mar 20 14:18:30 729041 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:30 872332 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000061 > Mar 20 14:18:30 872412 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:30 899764 [45A08960] -> SUBNET UP > Mar 20 14:18:31 047423 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000062 > Mar 20 14:18:31 047611 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:31 165201 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000063 > Mar 20 14:18:31 165272 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:18:31 182461 [44606960] -> __osm_trap_rcv_process_request: 
> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000074 > Mar 20 14:18:31 182653 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:31 248834 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000075 > Mar 20 14:18:31 248893 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:31 499830 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000076 > Mar 20 14:18:31 499908 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:31 521824 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000077 > Mar 20 14:18:31 521891 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:31 543713 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000078 > Mar 20 14:18:31 543784 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:18:31 589490 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:18:31 589499 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:18:31 620166 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:31 672647 
[4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:31 672739 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x77d11 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:31 672817 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:31 672823 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:31 672833 [43C05960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > 
vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:31 672852 [43C05960] -> Capabilities Mask: > Mar 20 14:18:31 672861 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x77d12 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:31 672918 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:31 672922 [45007960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > 
link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:31 672936 [45007960] -> Capabilities Mask: > Mar 20 14:18:31 678085 [45A08960] -> SUBNET UP > Mar 20 14:18:31 723715 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:18:31 723815 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 20 14:18:32 061932 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:18:32 113545 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:32 113610 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x78a4d > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 
00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:32 113712 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:32 113725 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:32 113730 [42803960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:32 113745 [42803960] -> Capabilities Mask: > Mar 20 14:18:32 113751 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x78a4e > 
attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:32 113803 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:32 113807 [44606960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:32 113820 [44606960] -> Capabilities Mask: > Mar 20 14:18:32 113845 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: 
Error status = 0x1C00 > Mar 20 14:18:32 113907 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x78a4f > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:32 113958 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:32 113963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:32 113969 [41E02960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > 
init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:32 113986 [41E02960] -> Capabilities Mask: > Mar 20 14:18:32 113992 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x78a50 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:32 114052 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:32 114066 [45A08960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > 
state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:32 114089 [45A08960] -> Capabilities Mask: > Mar 20 14:18:32 114052 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:18:32 114171 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x78a51 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][13][1][6] > Return path: [0][9][13][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:18:32 114224 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:18:32 114228 [42803960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > 
subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:18:32 114242 [42803960] -> Capabilities Mask: > Mar 20 14:18:32 119326 [42803960] -> SUBNET UP > Mar 20 14:23:02 506774 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 20 14:23:02 507064 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:23:02 861642 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:23:02 861653 [43204960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC > Mar 20 14:23:02 893030 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:23:02 943693 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:23:02 943765 [4780B960] -> SMP dump: > base_ver................0x1 
> mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x79aff > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x1 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][5][18] > Return path: [0][9][18][D][2][13] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 13 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:23:02 943854 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:23:02 943870 [43C05960] -> PortInfo dump: > port number.............0x1 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x13 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0x2C > vl_enforce..............0x4C > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > 
guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:23:02 943886 [43C05960] -> Capabilities Mask: > Mar 20 14:23:02 948898 [43C05960] -> SUBNET UP > Mar 20 14:23:03 237496 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00AF > TID:0x0000000000000000 > Mar 20 14:23:03 237710 [42803960] -> osm_report_notice: Reporting Generic > Notice type:4 num:144 from LID:0x00AF > GID:0xfe80000000000000,0x0005ad0000024b27 > Mar 20 14:23:03 605548 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:23:03 662757 [41401960] -> SUBNET UP > Mar 20 14:24:54 675782 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000079 > Mar 20 14:24:54 676077 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:24:54 677026 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000064 > Mar 20 14:24:54 677118 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:24:55 047478 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047501 [43204960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:24:55 047520 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047525 [43204960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:24:55 047541 
[43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047546 [43204960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:24:55 047563 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047569 [43204960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC > Mar 20 14:24:55 047586 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047591 [43204960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:24:55 047607 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047612 [43204960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:24:55 047630 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047635 [43204960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:24:55 047652 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047657 [43204960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:24:55 047798 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047803 [43204960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin 
Switch TS120 > Mar 20 14:24:55 047836 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047842 [43204960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:24:55 047857 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047862 [43204960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:24:55 047877 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047882 [43204960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:24:55 047896 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047902 [43204960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:24:55 047918 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047923 [43204960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:24:55 047939 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 047988 [43204960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:24:55 048005 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 048010 [43204960] -> Removed port with > 
GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:24:55 048025 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:24:55 048030 [43204960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:24:55 081006 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:24:55 130875 [45A08960] -> SUBNET UP > Mar 20 14:24:55 484995 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:24:55 535902 [42803960] -> SUBNET UP > Mar 20 14:25:48 653788 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000065 > Mar 20 14:25:48 654009 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:25:48 659749 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007a > Mar 20 14:25:48 659790 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000066 > Mar 20 14:25:48 659814 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:25:48 659963 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:25:48 665972 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007b > Mar 20 14:25:48 666050 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 
14:25:49 025384 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025396 [41E02960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:25:49 025401 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025406 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:25:49 025410 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025416 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:25:49 025420 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025425 [41E02960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:25:49 025428 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025433 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:25:49 025437 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025442 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:25:49 025446 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025451 [41E02960] -> Discovered new 
port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:25:49 025461 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025466 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:25:49 025470 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025475 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:25:49 025483 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025488 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:25:49 025491 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025496 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:25:49 025500 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025505 [41E02960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:25:49 025508 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025513 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:25:49 025517 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025522 [41E02960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:25:49 025556 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025562 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:25:49 025565 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025570 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:25:49 025574 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:25:49 025579 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:25:49 056324 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:25:49 126247 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 126356 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d165 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 
> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 126409 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 126442 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d166 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 126496 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 126489 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 126535 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d167 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][6] > Return path: [0][9][18][D][3] > Reserved: 
[0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 126526 [42803960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 126567 [42803960] -> Capabilities Mask: > Mar 20 14:25:49 126613 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 126617 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 126658 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > 
hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d168 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 126653 [42803960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 126687 [42803960] -> Capabilities Mask: > Mar 20 14:25:49 126703 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE 
transition > Mar 20 14:25:49 126709 [43204960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 126744 [43204960] -> Capabilities Mask: > Mar 20 14:25:49 126765 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 126770 [43C05960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > 
port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 126874 [43C05960] -> Capabilities Mask: > Mar 20 14:25:49 126975 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 127015 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d169 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][13][1][6] > Return path: [0][9][13][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 127066 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 127072 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 127084 [45A08960] -> PortInfo dump: > port number.............0x16 > 
node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 127103 [45A08960] -> Capabilities Mask: > Mar 20 14:25:49 127121 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d16a > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][13][1][6] > Return path: [0][9][13][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 
00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 127188 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:25:49 127220 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x7d16b > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][13][1][6] > Return path: [0][9][13][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:25:49 127326 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 127339 [44606960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > 
vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 127357 [44606960] -> Capabilities Mask: > Mar 20 14:25:49 127378 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:25:49 127397 [45007960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:25:49 127410 [45007960] -> Capabilities Mask: > Mar 20 14:25:49 132961 [43204960] -> SUBNET UP > Mar 20 14:25:49 523879 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:25:49 580522 
[42803960] -> SUBNET UP > Mar 20 14:26:04 718574 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:26:04 718819 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:04 836781 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:26:04 836881 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:04 858762 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:26:04 860242 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:04 997451 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 20 14:26:04 997647 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 180722 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:26:05 180855 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 209122 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:26:05 209200 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 
14:26:05 347419 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:26:05 347488 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 378670 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:26:05 378739 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 409112 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:05 409121 [41401960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:26:05 443639 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:05 483503 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:26:05 486002 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 499183 [44606960] -> SUBNET UP > Mar 20 14:26:05 499856 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:26:05 499941 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 521857 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:26:05 521971 [43204960] -> osm_report_notice: Reporting Generic > Notice 
type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:26:05 532569 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:26:05 532624 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:26:05 633813 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 20 14:26:05 633869 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:26:05 655421 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:26:05 655501 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:26:05 702652 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:26:05 702745 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:26:05 817201 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:05 817216 [43204960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:26:05 817235 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:05 817241 [43204960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:26:05 817259 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:05 817264 [43204960] 
-> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:26:05 821322 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:05 821330 [41E02960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:26:05 847950 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 20 14:26:05 848031 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:26:05 852036 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:05 893954 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 20 14:26:05 894021 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 20 14:26:05 910489 [44606960] -> SUBNET UP > Mar 20 14:26:05 999993 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 20 14:26:06 000039 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 20 14:26:06 021880 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 20 14:26:06 021970 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 20 14:26:06 043912 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 20 14:26:06 044001 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 20 
14:26:06 052878 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 20 14:26:06 052975 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 20 14:26:06 147560 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 20 14:26:06 147616 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 20 14:26:06 158945 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 20 14:26:06 158978 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 20 14:26:06 346046 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000017 > Mar 20 14:26:06 346106 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 20 14:26:06 405311 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:26:06 405349 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 20 14:26:06 632882 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 20 14:26:06 632923 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 20 14:26:06 634031 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000067 > Mar 20 14:26:06 634110 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > 
GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:06 883831 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001a > Mar 20 14:26:06 883879 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times consecutively > Mar 20 14:26:06 885475 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000068 > Mar 20 14:26:06 885560 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:06 982877 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001b > Mar 20 14:26:06 982926 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 27 times consecutively > Mar 20 14:26:06 992809 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000069 > Mar 20 14:26:06 992871 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:06 992909 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001c > Mar 20 14:26:06 992943 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 28 times consecutively > Mar 20 14:26:06 993058 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:06 993065 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:26:06 993069 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > 
GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:06 993074 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:26:07 023890 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:07 085081 [41E02960] -> SUBNET UP > Mar 20 14:26:07 348105 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001d > Mar 20 14:26:07 348218 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 29 times consecutively > Mar 20 14:26:07 348958 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006a > Mar 20 14:26:07 349041 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:07 540871 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006b > Mar 20 14:26:07 540983 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:07 541063 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007c > Mar 20 14:26:07 541131 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:07 585394 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006c > Mar 20 14:26:07 585464 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:07 607406 [45A08960] -> 
__osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006d > Mar 20 14:26:07 607486 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:07 850410 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006e > Mar 20 14:26:07 850483 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:07 956365 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007d > Mar 20 14:26:07 956404 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:07 956413 [42803960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:26:07 987136 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:08 018887 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:08 032634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 032679 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x813ce > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > 
dr_dlid.................0xFFFF > > Initial path: [0][1][12][4][5] > Return path: [0][9][14][D][5] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 032749 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 032757 [41E02960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x5 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 032774 [41E02960] -> Capabilities Mask: > Mar 20 14:26:08 033119 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 033154 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > 
method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x813cf > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12][4][5] > Return path: [0][9][14][D][5] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 033202 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 033213 [43C05960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x5 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 
> resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 033231 [43C05960] -> Capabilities Mask: > Mar 20 14:26:08 038497 [45A08960] -> SUBNET UP > Mar 20 14:26:08 055480 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007e > Mar 20 14:26:08 055587 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:08 372288 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000007f > Mar 20 14:26:08 376158 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:08 418607 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000080 > Mar 20 14:26:08 420668 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:08 420714 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:08 430046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 430147 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x820fa > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][4] > Return path: [0][9][18][D][1] > Reserved: 
[0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 430236 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 430236 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 430267 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x820fb > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12][1][6] > Return path: [0][9][14][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 430262 [43C05960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > 
lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 430286 [43C05960] -> Capabilities Mask: > Mar 20 14:26:08 430350 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 430362 [43C05960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 430375 [43C05960] -> Capabilities Mask: > 
Mar 20 14:26:08 435317 [43C05960] -> SUBNET UP > Mar 20 14:26:08 583769 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001e > Mar 20 14:26:08 583903 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 30 times consecutively > Mar 20 14:26:08 854841 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:08 913273 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 913349 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x82e32 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][5] > Return path: [0][9][14][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 913415 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 913432 [45A08960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > 
link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 913449 [45A08960] -> Capabilities Mask: > Mar 20 14:26:08 913598 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:26:08 913676 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x82e33 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][5] > Return path: [0][9][14][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 913727 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 913732 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 
3111: Error status = 0x1C00 > Mar 20 14:26:08 913734 [43C05960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 913752 [43C05960] -> Capabilities Mask: > Mar 20 14:26:08 913766 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x82e34 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][2][5] > Return path: [0][9][14][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:26:08 913828 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:26:08 913833 [41E02960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:26:08 913848 [41E02960] -> Capabilities Mask: > Mar 20 14:26:08 918887 [41E02960] -> SUBNET UP > Mar 20 14:26:48 657517 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000006f > Mar 20 14:26:48 657779 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:26:48 658393 [43204960] -> __osm_trap_rcv_process_request: 
> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000081 > Mar 20 14:26:48 658465 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:26:48 979610 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979629 [41401960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:26:48 979652 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979660 [41401960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:26:48 979682 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979688 [41401960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:26:48 979721 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979727 [41401960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:26:48 979770 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979782 [41401960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:26:48 979799 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979804 [41401960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 
HCA-1 > Mar 20 14:26:48 979822 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979827 [41401960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:26:48 979845 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 979849 [41401960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:26:48 980028 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980033 [41401960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:26:48 980061 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980066 [41401960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:26:48 980081 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980087 [41401960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:26:48 980102 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980107 [41401960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:26:48 980122 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980127 [41401960] -> Removed port with > GUID:0x0005ad0000024da7 
LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:26:48 980143 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980148 [41401960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:26:48 980163 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980239 [41401960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:26:48 980256 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980261 [41401960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:26:48 980365 [41401960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:26:48 980371 [41401960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:26:49 013365 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:49 065887 [43C05960] -> SUBNET UP > Mar 20 14:26:49 407010 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:26:49 459477 [44606960] -> SUBNET UP > Mar 20 14:27:42 754098 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000070 > Mar 20 14:27:42 754349 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:27:42 760115 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > 
TID:0x0000000000000071 > Mar 20 14:27:42 760178 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000082 > Mar 20 14:27:42 760236 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:27:42 760406 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:27:42 766931 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000083 > Mar 20 14:27:42 767049 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:27:43 085327 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085345 [43C05960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:27:43 085349 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085355 [43C05960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:27:43 085359 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085364 [43C05960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:27:43 085368 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085373 [43C05960] -> Discovered new port with > GUID:0x0005ad000002510b LID range 
[0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:27:43 085377 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085382 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:27:43 085386 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085390 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:27:43 085394 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085399 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:27:43 085403 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085407 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:27:43 085411 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085416 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:27:43 085420 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085425 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:27:43 085428 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 
14:27:43 085433 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:27:43 085437 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085442 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:27:43 085446 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085450 [43C05960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:27:43 085454 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085459 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:27:43 085511 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085517 [43C05960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:27:43 085521 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085526 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:27:43 085530 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:43 085534 [43C05960] -> Discovered new port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:27:43 116308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > 
configured on all switches > Mar 20 14:27:43 179935 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 179980 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x85669 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180019 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180037 [41401960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > 
vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180050 [41401960] -> Capabilities Mask: > Mar 20 14:27:43 180092 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 180137 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566a > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180185 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180189 [44606960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > 
link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180199 [44606960] -> Capabilities Mask: > Mar 20 14:27:43 180239 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 180263 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566b > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180307 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 180319 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 
20 14:27:43 180332 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566c > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][4] > Return path: [0][9][13][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180336 [42803960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > 
subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180364 [42803960] -> Capabilities Mask: > Mar 20 14:27:43 180389 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180410 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 180392 [42803960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180415 [42803960] -> Capabilities Mask: > Mar 20 14:27:43 180436 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566d > 
attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180490 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180494 [41E02960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180504 [41E02960] -> Capabilities Mask: > Mar 20 14:27:43 180536 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: 
Error status = 0x1C00 > Mar 20 14:27:43 180560 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566e > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180606 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180615 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:27:43 180634 [45007960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > 
init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180657 [45007960] -> Capabilities Mask: > Mar 20 14:27:43 180678 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8566f > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][5] > Return path: [0][9][18][E][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:27:43 180769 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:27:43 180775 [43C05960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > 
state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:27:43 180789 [43C05960] -> Capabilities Mask: > Mar 20 14:27:43 186228 [43204960] -> SUBNET UP > Mar 20 14:27:43 557268 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:27:43 611082 [45A08960] -> SUBNET UP > Mar 20 14:27:58 852744 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:27:58 852982 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:58 970772 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:27:58 970864 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:58 992628 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:27:58 992712 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 132331 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
TID:0x0000000000000003 > Mar 20 14:27:59 132484 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 314893 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:27:59 315006 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 343241 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:27:59 343320 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 481698 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:27:59 481775 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 512746 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:27:59 512853 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 548851 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:59 548861 [41E02960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:27:59 583414 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:27:59 583817 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:27:59 623971 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 626182 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:27:59 626329 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 634080 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:27:59 634442 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:27:59 641962 [45A08960] -> SUBNET UP > Mar 20 14:27:59 656231 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:27:59 656307 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:27:59 689788 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 20 14:27:59 690249 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:27:59 758521 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:27:59 758646 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:27:59 970740 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:27:59 970812 [43204960] -> 
__osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:27:59 985557 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:59 985577 [41E02960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:27:59 985601 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:59 985615 [41E02960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:27:59 985649 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:59 985656 [41E02960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:27:59 989767 [42803960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:27:59 989787 [42803960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:28:00 014445 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 20 14:28:00 014524 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:28:00 020896 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:00 086824 [43204960] -> SUBNET UP > Mar 20 14:28:00 124057 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000010 > Mar 20 14:28:00 124108 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 20 14:28:00 
131596 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 20 14:28:00 131643 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 20 14:28:00 412484 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 20 14:28:00 412528 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 20 14:28:00 436877 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 20 14:28:00 436921 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 20 14:28:00 458745 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 20 14:28:00 458816 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 20 14:28:00 480551 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 20 14:28:00 480599 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 20 14:28:00 695340 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 20 14:28:00 695386 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 20 14:28:00 695726 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000072 > Mar 20 14:28:00 695886 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > 
GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:28:00 719764 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000017 > Mar 20 14:28:00 719825 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 20 14:28:00 743680 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:28:00 743775 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 20 14:28:00 763599 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 20 14:28:00 763654 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 20 14:28:00 813393 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001a > Mar 20 14:28:00 813473 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times consecutively > Mar 20 14:28:00 831287 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001b > Mar 20 14:28:00 831302 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000073 > Mar 20 14:28:00 831383 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 27 times consecutively > Mar 20 14:28:00 831424 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:28:00 841593 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001c > Mar 20 14:28:00 841644 
[41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 28 times consecutively > Mar 20 14:28:01 050511 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:28:01 050529 [41E02960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:28:01 050535 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:28:01 050542 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:28:01 050547 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:28:01 050554 [41E02960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:28:01 081322 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:01 142873 [43204960] -> SUBNET UP > Mar 20 14:28:01 460275 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001d > Mar 20 14:28:01 460358 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 29 times consecutively > Mar 20 14:28:01 488474 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:01 538634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:01 538712 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x898d1 > 
attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:01 538752 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: > Received error status for SetResp() > Mar 20 14:28:01 538758 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:01 538767 [42803960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x42 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:01 538795 [4780B960] -> SMP dump: > 
base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x898d2 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][6] > Return path: [0][9][18][D][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:01 538810 [42803960] -> Capabilities Mask: > Mar 20 14:28:01 538849 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: > Received error status for SetResp() > Mar 20 14:28:01 538856 [42803960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x42 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > 
p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:01 538871 [42803960] -> Capabilities Mask: > Mar 20 14:28:01 539658 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:01 539696 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x898d3 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x11 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][4][17] > Return path: [0][9][18][D][1][16] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 > > 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:01 539778 [45A08960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:01 539784 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:01 539798 [45A08960] -> PortInfo dump: > port number.............0x11 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x16 > link_width_enabled......0x2 > link_width_supported....0x3 > 
link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x42 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:01 539834 [45A08960] -> Capabilities Mask: > Mar 20 14:28:01 539844 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x898d4 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x12 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][15][1][4][17] > Return path: [0][9][18][D][1][16] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 > > 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:01 539903 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:01 539908 [45007960] -> PortInfo dump: > port number.............0x12 > node_guid...............0x0005ad0000027c84 > port_guid...............0x0005ad0000027c84 > m_key...................0x0000000000000000 > 
subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x16 > link_width_enabled......0x2 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x42 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:01 539924 [45007960] -> Capabilities Mask: > Mar 20 14:28:01 545091 [45007960] -> SUBNET UP > Mar 20 14:28:01 652647 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000084 > Mar 20 14:28:01 652864 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:01 879631 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000074 > Mar 20 14:28:01 880104 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:28:01 962839 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000075 > Mar 20 14:28:01 965155 [44606960] -> osm_report_notice: Reporting Generic > Notice 
type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:28:02 006432 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000085 > Mar 20 14:28:02 030610 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:02 068999 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:02 081130 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:02 081198 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8a604 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:02 081249 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:02 081257 [43204960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > 
m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:02 081275 [43204960] -> Capabilities Mask: > Mar 20 14:28:02 081650 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:02 081713 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8a605 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][4][4] > Return path: [0][9][18][D][4] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:02 081782 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:02 
081787 [43C05960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x4 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:02 081802 [43C05960] -> Capabilities Mask: > Mar 20 14:28:02 087435 [45A08960] -> SUBNET UP > Mar 20 14:28:02 170696 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000086 > Mar 20 14:28:02 170819 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:02 459228 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:02 500761 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000087 > Mar 20 14:28:02 500979 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from 
LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:02 510190 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:02 510258 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8b330 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][17][1][5] > Return path: [0][9][14][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:02 510366 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:28:02 510370 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:02 510384 [45007960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > 
mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:02 510394 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x8b331 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][3][6] > Return path: [0][9][18][F][3] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:28:02 510398 [45007960] -> Capabilities Mask: > Mar 20 14:28:02 510481 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:28:02 510491 [41401960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x3 > 
link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:28:02 510509 [41401960] -> Capabilities Mask: > Mar 20 14:28:02 510511 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000088 > Mar 20 14:28:02 515576 [41401960] -> SUBNET UP > Mar 20 14:28:02 515695 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:02 532552 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000089 > Mar 20 14:28:02 538569 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:28:02 695997 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001e > Mar 20 14:28:02 696096 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 30 times consecutively > Mar 20 14:28:02 918226 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:02 975244 [43204960] -> SUBNET UP > Mar 20 14:28:03 325494 [43204960] -> 
osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:28:03 379145 [41401960] -> SUBNET UP > Mar 20 14:29:12 561841 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > TID:0x0000000000000019 > Mar 20 14:29:12 562033 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001F > GID:0xfe80000000000000,0x0005ad0000027c56 > Mar 20 14:29:12 562751 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > TID:0x000000000000001a > Mar 20 14:29:12 562902 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001F > GID:0xfe80000000000000,0x0005ad0000027c56 > Mar 20 14:29:12 571346 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0084 > TID:0x0000000000000018 > Mar 20 14:29:12 571569 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0084 > GID:0xfe80000000000000,0x0005ad0000027c70 > Mar 20 14:29:12 914371 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > TID:0x000000000000001b > Mar 20 14:29:12 916287 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:12 916297 [44606960] -> Discovered new port with > GUID:0x0005ad000002502f LID range [0x2,0x2] of node:Topspin IB-DC > Mar 20 14:29:12 946985 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:29:12 976839 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001F > GID:0xfe80000000000000,0x0005ad0000027c56 > Mar 20 14:29:12 987963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:29:12 988004 [4780B960] -> SMP dump: > 
base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x8dbb2 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0xD > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][2][4][D] > Return path: [0][9][18][E][1][12] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 12 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:29:12 988078 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:29:12 988089 [43C05960] -> PortInfo dump: > port number.............0xD > node_guid...............0x0005ad0000027c70 > port_guid...............0x0005ad0000027c70 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x12 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0x2C > vl_enforce..............0x4C > m_key_violations........0x0 > p_key_violations........0x0 > 
q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:29:12 988105 [43C05960] -> Capabilities Mask: > Mar 20 14:29:12 993136 [44606960] -> SUBNET UP > Mar 20 14:29:13 300755 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 > TID:0x0000000000000000 > Mar 20 14:29:13 300874 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:4 num:144 from LID:0x0002 > GID:0xfe80000000000000,0x0005ad000002502f > Mar 20 14:29:13 338077 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:13 338099 [41E02960] -> Discovered new port with > GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:Topspin IB-DC > Mar 20 14:29:13 368879 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:29:13 431763 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:29:13 431806 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x5 > trans_id................0x8e8e9 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0xA > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][1][6][12] > Return path: [0][9][15][D][3][11] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 11 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 
14:29:13 432093 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:29:13 432116 [43204960] -> PortInfo dump: > port number.............0xA > node_guid...............0x0005ad0000027c56 > port_guid...............0x0005ad0000027c56 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x11 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0x2C > vl_enforce..............0x4C > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:29:13 432135 [43204960] -> Capabilities Mask: > Mar 20 14:29:13 437219 [45007960] -> SUBNET UP > Mar 20 14:29:13 690992 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00BA > TID:0x0000000000000000 > Mar 20 14:29:13 691128 [42803960] -> osm_report_notice: Reporting Generic > Notice type:4 num:144 from LID:0x00BA > GID:0xfe80000000000000,0x0005ad000002516f > Mar 20 14:29:13 835017 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:29:13 891082 [42803960] -> SUBNET UP > Mar 20 14:29:14 235714 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > 
configured on all switches > Mar 20 14:29:14 289127 [41E02960] -> SUBNET UP > Mar 20 14:29:17 689267 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008a > Mar 20 14:29:17 689511 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:29:17 689975 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000076 > Mar 20 14:29:17 690097 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:29:18 025237 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025255 [44606960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:29:18 025273 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025279 [44606960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:29:18 025296 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025300 [44606960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:29:18 025317 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025323 [44606960] -> Removed port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:29:18 025340 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from 
LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025345 [44606960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:29:18 025362 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025367 [44606960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:29:18 025385 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025390 [44606960] -> Removed port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:29:18 025406 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025411 [44606960] -> Removed port with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:29:18 025571 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025576 [44606960] -> Removed port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:29:18 025612 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025619 [44606960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:29:18 025634 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025639 [44606960] -> Removed port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:29:18 025655 [44606960] -> osm_report_notice: 
Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025660 [44606960] -> Removed port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:29:18 025678 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025683 [44606960] -> Removed port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:29:18 025700 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025705 [44606960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:29:18 025721 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025777 [44606960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:29:18 025794 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025800 [44606960] -> Removed port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:29:18 025816 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:29:18 025821 [44606960] -> Removed port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:29:18 058968 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:29:18 112970 [43C05960] -> SUBNET UP > Mar 20 14:29:18 511156 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:29:18 573846 [41E02960] -> SUBNET UP 
> Mar 20 14:30:11 599965 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000077 > Mar 20 14:30:11 600182 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:11 606044 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000078 > Mar 20 14:30:11 606078 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008b > Mar 20 14:30:11 606178 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:11 606207 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:11 612375 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008c > Mar 20 14:30:11 612499 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:11 947057 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947074 [45007960] -> Discovered new port with > GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > Mar 20 14:30:11 947079 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947084 [45007960] -> Discovered new port with > GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > Mar 20 14:30:11 947088 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 
num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947093 [45007960] -> Discovered new port with > GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > Mar 20 14:30:11 947097 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947102 [45007960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:30:11 947106 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947118 [45007960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:30:11 947138 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947143 [45007960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:30:11 947148 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947153 [45007960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:30:11 947157 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947162 [45007960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:30:11 947166 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947170 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 
20 14:30:11 947174 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947179 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:30:11 947183 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947188 [45007960] -> Discovered new port with > GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > Mar 20 14:30:11 947191 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947196 [45007960] -> Discovered new port with > GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > Mar 20 14:30:11 947200 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947205 [45007960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:30:11 947209 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947214 [45007960] -> Discovered new port with > GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > Mar 20 14:30:11 947282 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947288 [45007960] -> Discovered new port with > GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > Mar 20 14:30:11 947291 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947296 [45007960] -> Discovered new port 
with > GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > Mar 20 14:30:11 947300 [45007960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:11 947305 [45007960] -> Discovered new port with > GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > Mar 20 14:30:11 978149 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:12 042474 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 042577 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b38 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x13 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][5] > Return path: [0][9][13][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 042668 [45007960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 042676 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 042682 [45007960] -> PortInfo dump: > port number.............0x13 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > 
master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 042701 [45007960] -> Capabilities Mask: > Mar 20 14:30:12 042714 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b39 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][5] > Return path: [0][9][13][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 042845 [41401960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 042856 [4780B960] 
-> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 042851 [41401960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 042897 [41401960] -> Capabilities Mask: > Mar 20 14:30:12 042907 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b3a > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][5] > Return path: [0][9][13][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 043013 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 043015 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 043038 [43204960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 043090 [43204960] -> Capabilities Mask: > Mar 20 14:30:12 043094 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > 
hop_count...............0x4 > trans_id................0x92b3b > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][5] > Return path: [0][9][13][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 043173 [44606960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 043178 [44606960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 043191 [44606960] -> Capabilities Mask: > Mar 20 14:30:12 
043222 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 043247 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b3c > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12][1][4] > Return path: [0][9][14][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 043318 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 043314 [41E02960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 043357 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b3d > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12][1][4] > Return path: [0][9][14][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 
00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 043367 [41E02960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 043422 [41E02960] -> Capabilities Mask: > Mar 20 14:30:12 043513 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 043518 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:12 043519 [43C05960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > 
local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 043535 [43C05960] -> Capabilities Mask: > Mar 20 14:30:12 043553 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x92b3e > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12][1][4] > Return path: [0][9][14][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:12 043658 [42803960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:12 043663 [42803960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > 
port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:12 043678 [42803960] -> Capabilities Mask: > Mar 20 14:30:12 049088 [43204960] -> SUBNET UP > Mar 20 14:30:12 442903 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:12 497312 [45007960] -> SUBNET UP > Mar 20 14:30:27 571421 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000000 > Mar 20 14:30:27 571674 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:27 782498 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000001 > Mar 20 14:30:27 782616 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 
14:30:27 804302 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000002 > Mar 20 14:30:27 805103 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:27 924983 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000003 > Mar 20 14:30:27 925088 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:27 934314 [43204960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:27 934327 [43204960] -> Removed port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:30:27 969077 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:28 017451 [41E02960] -> SUBNET UP > Mar 20 14:30:28 030947 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000004 > Mar 20 14:30:28 031177 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 120040 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000005 > Mar 20 14:30:28 120190 [42803960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 148805 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000006 > Mar 20 14:30:28 149108 [41E02960] -> osm_report_notice: Reporting Generic > Notice 
type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 170453 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000007 > Mar 20 14:30:28 170971 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 336861 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:28 336884 [43C05960] -> Removed port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:30:28 336910 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:28 336916 [43C05960] -> Removed port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:30:28 336945 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:28 336951 [43C05960] -> Removed port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:30:28 371497 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:28 410709 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000008 > Mar 20 14:30:28 410894 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 415926 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000009 > Mar 20 14:30:28 419624 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > 
GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 426978 [45A08960] -> SUBNET UP > Mar 20 14:30:28 438003 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000a > Mar 20 14:30:28 438182 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0152 > GID:0xfe80000000000000,0x0005ad0000027c84 > Mar 20 14:30:28 470141 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000b > Mar 20 14:30:28 470197 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 11 times consecutively > Mar 20 14:30:28 652535 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000c > Mar 20 14:30:28 652623 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 12 times consecutively > Mar 20 14:30:28 681514 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000d > Mar 20 14:30:28 681636 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 13 times consecutively > Mar 20 14:30:28 703052 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000e > Mar 20 14:30:28 703092 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 14 times consecutively > Mar 20 14:30:28 724753 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000000f > Mar 20 14:30:28 724809 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 15 times consecutively > Mar 20 14:30:28 855519 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
TID:0x0000000000000010 > Mar 20 14:30:28 855671 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 16 times consecutively > Mar 20 14:30:28 877289 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000011 > Mar 20 14:30:28 877354 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 17 times consecutively > Mar 20 14:30:28 899021 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000012 > Mar 20 14:30:28 899062 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 18 times consecutively > Mar 20 14:30:29 006886 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000013 > Mar 20 14:30:29 006950 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 19 times consecutively > Mar 20 14:30:29 099965 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000014 > Mar 20 14:30:29 100020 [44606960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 20 times consecutively > Mar 20 14:30:29 146532 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000015 > Mar 20 14:30:29 146578 [41E02960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 21 times consecutively > Mar 20 14:30:29 356891 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000016 > Mar 20 14:30:29 356938 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 22 times consecutively > Mar 20 14:30:29 383112 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 > TID:0x0000000000000017 > Mar 20 14:30:29 383157 [43204960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 23 times consecutively > Mar 20 14:30:29 383710 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x0000000000000079 > Mar 20 14:30:29 383790 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:29 407890 [42803960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000018 > Mar 20 14:30:29 407935 [42803960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 24 times consecutively > Mar 20 14:30:29 429653 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x0000000000000019 > Mar 20 14:30:29 429700 [45A08960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 25 times consecutively > Mar 20 14:30:29 451352 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001a > Mar 20 14:30:29 451401 [45007960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 26 times consecutively > Mar 20 14:30:29 479843 [4780B960] -> umad_receiver: ERR 5409: send completed > with error (method=0x1 attr=0x11 trans_id=0x124ef00095cbf) -- dropping > Mar 20 14:30:29 479855 [4780B960] -> umad_receiver: ERR 5411: DR SMP > Mar 20 14:30:29 479865 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR > 3113: MAD completed in error (IB_TIMEOUT) > Mar 20 14:30:29 479901 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x6 > 
trans_id................0x95cbf > attr_id.................0x11 (NodeInfo) > resv....................0x0 > attr_mod................0x0 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][5][17][C] > Return path: [0][0][0][0][0][0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:29 480017 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 480030 [44606960] -> Removed port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:30:29 480092 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 480099 [44606960] -> Removed port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:30:29 480121 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 480128 [44606960] -> Removed port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:30:29 480152 [44606960] -> osm_report_notice: Reporting Generic > Notice type:3 num:65 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 480160 [44606960] -> Removed port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:30:29 480325 [44606960] -> osm_drop_mgr_process: ERR 0108: Unknown > remote side for node 0x0005ad0000027c84 port 12. 
Adding to light sweep > sampling list > Mar 20 14:30:29 480343 [44606960] -> Directed Path Dump of 5 hop path: > Path = [0][1][11][1][5][17] > Mar 20 14:30:29 665327 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000007a > Mar 20 14:30:29 665355 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001b > Mar 20 14:30:29 665397 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:29 665404 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 27 times consecutively > Mar 20 14:30:29 680658 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 680668 [45A08960] -> Discovered new port with > GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > Mar 20 14:30:29 680672 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 680678 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > Mar 20 14:30:29 680681 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 680686 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > Mar 20 14:30:29 680690 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:29 680695 [45A08960] -> Discovered new port with > GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > Mar 20 14:30:29 711542 [45A08960] -> 
osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:29 768280 [41401960] -> SUBNET UP > Mar 20 14:30:30 113195 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:30 113206 [45A08960] -> Discovered new port with > GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > Mar 20 14:30:30 113211 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:30 113216 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > Mar 20 14:30:30 113220 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:30 113225 [45A08960] -> Discovered new port with > GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > Mar 20 14:30:30 113228 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:3 num:64 from LID:0x0092 > GID:0xfe80000000000000,0x0005ad0000024bbb > Mar 20 14:30:30 113233 [45A08960] -> Discovered new port with > GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > Mar 20 14:30:30 144149 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:30 195765 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:30 195850 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x96dcd > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x16 > 
m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][2][4] > Return path: [0][9][15][E][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:30 195929 [43C05960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:30 195942 [43C05960] -> PortInfo dump: > port number.............0x16 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:30 195968 [43C05960] -> Capabilities Mask: > Mar 20 14:30:30 196144 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:30 196179 [4780B960] -> SMP dump: > base_ver................0x1 > 
mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x96dce > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x17 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][14][2][4] > Return path: [0][9][15][E][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:30 196248 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:30 196254 [43204960] -> PortInfo dump: > port number.............0x17 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > 
guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:30 196269 [43204960] -> Capabilities Mask: > Mar 20 14:30:30 201633 [45007960] -> SUBNET UP > Mar 20 14:30:30 278051 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001c > Mar 20 14:30:30 278107 [43C05960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 28 times consecutively > Mar 20 14:30:30 278656 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000007b > Mar 20 14:30:30 278871 [41E02960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:30 279653 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008d > Mar 20 14:30:30 279765 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 568539 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008e > Mar 20 14:30:30 568617 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 607916 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > TID:0x000000000000007c > Mar 20 14:30:30 625139 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:30 663838 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x0148 > GID:0xfe80000000000000,0x0005ad00000281b3 > Mar 20 14:30:30 664569 [44606960] -> 
__osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x000000000000008f > Mar 20 14:30:30 664747 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 679262 [45A08960] -> SUBNET UP > Mar 20 14:30:30 784024 [43204960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000090 > Mar 20 14:30:30 784123 [43204960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 804217 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000091 > Mar 20 14:30:30 807807 [41401960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 825500 [45A08960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000092 > Mar 20 14:30:30 825600 [45A08960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:30 988887 [43C05960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000093 > Mar 20 14:30:30 988978 [43C05960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:31 059298 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > configured on all switches > Mar 20 14:30:31 106840 [41E02960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000094 > Mar 20 14:30:31 111335 [41E02960] -> osm_report_notice: Reporting Generic > 
Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:31 112465 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:31 112497 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x98837 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16][1][5] > Return path: [0][9][13][D][2] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > > 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:31 112593 [44606960] -> osm_pi_rcv_process_set: ERR 0F10: > Received error status for SetResp() > Mar 20 14:30:31 112627 [44606960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281a7 > port_guid...............0x0005ad00000281a7 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x2 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............DOWN > state_info2.............0x42 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > 
vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:31 112673 [44606960] -> Capabilities Mask: > Mar 20 14:30:31 113808 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > 3111: Error status = 0x1C00 > Mar 20 14:30:31 113838 [4780B960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x4 > trans_id................0x9883e > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x18 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][11][1][4] > Return path: [0][9][18][D][1] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > > 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > > Mar 20 14:30:31 113925 [43204960] -> osm_pi_rcv_process_set: Received error > status 0x1c for SetResp() during ACTIVE transition > Mar 20 14:30:31 113930 [43204960] -> PortInfo dump: > port number.............0x18 > node_guid...............0x0005ad00000281b3 > port_guid...............0x0005ad00000281b3 > m_key...................0x0000000000000000 > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x1 > 
link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x1 > port_state..............ACTIVE > state_info2.............0x52 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x11 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > client_reregister.......0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Mar 20 14:30:31 113946 [43204960] -> Capabilities Mask: > Mar 20 14:30:31 119007 [43204960] -> SUBNET UP > Mar 20 14:30:31 128758 [45007960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000095 > Mar 20 14:30:31 128851 [45007960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:31 150370 [44606960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > TID:0x0000000000000096 > Mar 20 14:30:31 150468 [44606960] -> osm_report_notice: Reporting Generic > Notice type:1 num:128 from LID:0x001B > GID:0xfe80000000000000,0x0005ad00000281a7 > Mar 20 14:30:31 316422 [41401960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > TID:0x000000000000001c > Mar 20 14:30:31 316498 [41401960] -> __osm_trap_rcv_process_request: ERR > 3804: Received trap 29 times consecutively > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, 
please visit http://openib.org/mailman/listinfo/openib-general From sweitzen at cisco.com Wed Mar 21 11:55:15 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 21 Mar 2007 11:55:15 -0700 Subject: [ofa-general] how do I interactively install the OFED included with RHEL5? Message-ID: Doug, I can't seem to get an interactive RHEL5 install to include all the OFED packages (for example, libib*). I see an OpenIB choice in the Base packages, but that doesn't install everything. RHEL should start using OFED name instead of OpenIB, too. I've only been able to install all OFED packages by using Kickstart and specifying this in my .cfg: %packages @ everything Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From krause at cup.hp.com Wed Mar 21 12:09:24 2007 From: krause at cup.hp.com (Michael Krause) Date: Wed, 21 Mar 2007 12:09:24 -0700 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: <4601708B.6090602@ichips.intel.com> References: <1174486948.6493.80653.camel@hal.voltaire.com> <6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com> <1174498055.17678.2769.camel@hal.voltaire.com> <4601708B.6090602@ichips.intel.com> Message-ID: <6.2.0.14.2.20070321120017.03349450@esmail.cup.hp.com> At 10:51 AM 3/21/2007, Sean Hefty wrote: >>Ok, lets assume Sean would finish his experiments with remote_sa, how >>would that find its way into the commercial sm/sa versions that are >>mostly used, how would we guarantee interoperability between all >>implementations, .. ? >>How would that address future routing, security, QoS, .. enhancements ? >>can it ? > >The 'remote sa' as simply a proprietary UD protocol. Whatever data two >'remote sa' services exchange shouldn't matter, nor should the fact that >each issues local SA path records. There's nothing magical about this. 
>
>If I have an app that can query its local SA, there's nothing that
>prevents that app from sending that data to whatever peer it can connect
>to.  It can even send the data over TCP if it wants.  Keeping the SA
>subnet local doesn't add any real security.
>
>Coming up with a solution that doesn't work with any existing hardware,
>targets, and SAs isn't very useful.

Just to clarify:

- Nothing in the router protocol should have an impact on existing or even
  future hardware if done right.  The basic wire protocol, i.e. the use of
  GRH, etc., should not require any modifications to operate on existing
  hardware.

- Whether an HCA or a TCA, there will be some level of management protocol
  changes.  This impacts the software above but not the hardware itself,
  unless the implementation hard-coded / state machined its behavior, in
  which case it is unlikely to work in any router environment.

- For the SA, I think most will agree that there will be implementation
  changes required to comprehend where a router exists on a subnet and how
  to respond to queries that target a router.

However, much of what I've noted here does not impact existing SA or SM
operations - they continue to work as implemented.  The changes proposed
would be new additional capabilities that would enable router
communication to occur.  If people construct something like a DNS
equivalent service to find the IB router, then this leverages the
practices used with IP today, and the more IB looks like IP when it comes
to its operation, the easier it is to get it actually deployed beyond the
HPC market.

None of these items breaks or changes interoperability among any
components.

Again, I don't see any harm in waiting until Sonoma to discuss this
face-to-face.  Quite true that no major breakthroughs are likely, but the
benefits of plowing ahead with an implementation that may be viewed as an
academic or a niche experiment do not seem worthwhile.  However, people
are free to spend their time as they desire.
My only caution is that such work should not set a precedent, nor should
there be any expectation that it will ever see commercial adoption or
deployment.

Mike

From yaronh at voltaire.com Wed Mar 21 12:44:25 2007
From: yaronh at voltaire.com (Yaron Haviv)
Date: Wed, 21 Mar 2007 21:44:25 +0200
Subject: [ofa-general] [RFC] host stack IB-to-IB router support
In-Reply-To: <4601708B.6090602@ichips.intel.com>
References: <1174486948.6493.80653.camel@hal.voltaire.com><6.2.0.14.2.20070321083240.031da1d8@esmail.cup.hp.com>
 <1174498055.17678.2769.camel@hal.voltaire.com>
 <4601708B.6090602@ichips.intel.com>
Message-ID:

> -----Original Message-----
> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Wednesday, March 21, 2007 1:51 PM
> To: Yaron Haviv
> Cc: Hal Rosenstock; Michael Krause; general at lists.openfabrics.org
> Subject: Re: [ofa-general] [RFC] host stack IB-to-IB router support
>
> > Ok, lets assume Sean would finish his experiments with remote_sa, how
> > would that find its way into the commercial sm/sa versions that are
> > mostly used, how would we guarantee interoperability between all
> > implementations, .. ?
> > How would that address future routing, security, QoS, .. enhancements ?
> > can it ?
>
> The 'remote sa' as simply a proprietary UD protocol.  Whatever data two
> 'remote sa' services exchange shouldn't matter, nor should the fact that
> each issues local SA path records.  There's nothing magical about this.
>
> If I have an app that can query its local SA, there's nothing that
> prevents that app from sending that data to whatever peer it can connect
> to.  It can even send the data over TCP if it wants.  Keeping the SA
> subnet local doesn't add any real security.
>
> Coming up with a solution that doesn't work with any existing hardware,
> targets, and SAs isn't very useful.
>
> - Sean

Sean,
If we are doing experiments, wouldn't it be simpler to do it in the IP way
I suggested:

If it's not my subnet, read the DGID from a table (or even a config for
now)
And conduct SA query on that one
On the remote side, add the reverse lookup rather than use the CM REQ
SLID

It sounds to me like less work and less complexity

Yaron

From halr at voltaire.com Wed Mar 21 13:53:56 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Mar 2007 15:53:56 -0500
Subject: [ofa-general] broken links
In-Reply-To: <200703191501.59020.arkady@netapp.com>
References: <200703191501.59020.arkady@netapp.com>
Message-ID: <1174510435.17678.15779.camel@hal.voltaire.com>

On Mon, 2007-03-19 at 15:01, Arkady Kanevsky wrote:
> The README for diagnostics building and running is located here:
> https://openib.org/svn/gen2/trunk/src/userspace/management/README
>
> and
>
> A more complete description and command syntax of the diagnostic tools can be
> found as:
> https://openib.org/svn/gen2/trunk/src/userspace/management/doc/diagtools.txt
>
> are broken.
> What are the correct links?

Thanks to Michael and Jeff, I've updated the Diagnostics and OpenSM wiki
pages to properly point into my management git repository. Let me know
if you have further issues with this. Thanks.
-- Hal > Thanks, > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sean.hefty at intel.com Wed Mar 21 12:55:26 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Mar 2007 12:55:26 -0700 Subject: [ofa-general] [RFC] host stack IB-to-IB router support In-Reply-To: Message-ID: <000401c76bf2$def60030$76248686@amr.corp.intel.com> >If its not my subnet read the DGID from a table (or even a config for >now) >And conduct SA query on that one >On the remote side, add the reverse lookup rather than use the CM REQ >SLID Trying to perform SA queries inside the CM protocol/state machine on the passive side is actually fairly complex. It's easier to separate that functionality out and push the responsibility over to the active side, which already issues SA queries. That said, speaking with the labs this morning, it sounds like the current functionality of replacing the SLID/DLID/SL in the CM REQ with data from the received LRH (in the work completion) is sufficient for their purposes. - Sean From rdreier at cisco.com Wed Mar 21 14:17:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 21 Mar 2007 14:17:11 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: <20070321134505.GA23221@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 21 Mar 2007 15:45:05 +0200") References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> Message-ID: This definitely looks like a problem, but I'm confused by this: > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -452,7 +452,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > skb->len, tx->mtu); > ++priv->stats.tx_dropped; > ++priv->stats.tx_errors; > - ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); > + ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > return; > } After this change, the code looks like: if (unlikely(skb->len > tx->mtu)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", skb->len, tx->mtu); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); return; } so why is the test against just tx->mtu, while ipoib_cm_skb_too_long() is passed an mtu of tx->mtu - IPOIB_ENCAP_LEN? - R. From rdreier at cisco.com Wed Mar 21 14:25:39 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 21 Mar 2007 14:25:39 -0700 Subject: [ofa-general] Re: IsSMdisabled and user_mad.c In-Reply-To: <1174458246.6493.49979.camel@hal.voltaire.com> (Hal Rosenstock's message of "21 Mar 2007 01:24:07 -0500") References: <1174458246.6493.49979.camel@hal.voltaire.com> Message-ID: > > I guess it's OK, although I would also like to know how you plan to > > handle the interaction between IsSM and IsSMDisabled -- eg what if a > > process opens issm0 and then another process tries to open > > issmdisabled0? Or conversely if issmdisabled0 is open, what happens > > when someone opens issm0? > > I would think those are error cases. Does that make sense ? If so, what > error makes most sense ? EINVAL or something else ? That's not really in keeping with the current interface. 
Right now if one process opens issm0 and then a second process tries to open, the second process blocks until the first one closes the file. Would it make more sense to make the issmdisabled interface work in a similar way, i.e. only one of issm and issmdisabled can be open at any time, and an attempt to open both would block the second attempt? From rdreier at cisco.com Wed Mar 21 14:26:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 21 Mar 2007 14:26:55 -0700 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: <4600E054.90504@dev.mellanox.co.il> (Dotan Barak's message of "Wed, 21 Mar 2007 09:35:48 +0200") References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <45FFE47B.8000408@dev.mellanox.co.il> <4600E054.90504@dev.mellanox.co.il> Message-ID: > I believe that we can avoid problems only if we will have a lock in > the ibv_context that handles the completion channels. Yes, that is the idea I had too. > If you think that adding a lock to the ibv_context is good enough i > will send you a patch that implements this solution. Please do. I don't see anything objectionable to that solution. - R. From halr at voltaire.com Wed Mar 21 16:14:14 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 18:14:14 -0500 Subject: [ofa-general] Re: IsSMdisabled and user_mad.c In-Reply-To: References: <1174458246.6493.49979.camel@hal.voltaire.com> Message-ID: <1174518853.17678.24805.camel@hal.voltaire.com> On Wed, 2007-03-21 at 16:25, Roland Dreier wrote: > > > I guess it's OK, although I would also like to know how you plan to > > > handle the interaction between IsSM and IsSMDisabled -- eg what if a > > > process opens issm0 and then another process tries to open > > > issmdisabled0? Or conversely if issmdisabled0 is open, what happens > > > when someone opens issm0? > > > > I would think those are error cases. Does that make sense ? If so, what > > error makes most sense ? 
EINVAL or something else ? > > That's not really in keeping with the current interface. Right now if > one process opens issm0 and then a second process tries to open, the > second process blocks until the first one closes the file. Would it > make more sense to make the issmdisabled interface work in a similar > way, i.e. only one of issm and issmdisabled can be open at any time, > and an attempt to open both would block the second attempt? Sure; it could work that way too. I'll work on a patch for this over the next couple days. -- Hal From halr at voltaire.com Wed Mar 21 16:22:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Mar 2007 18:22:00 -0500 Subject: [ofa-general] Re: [PATCH] IB/umad: fix GRH handling In-Reply-To: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> References: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> Message-ID: <1174519319.17678.25309.camel@hal.voltaire.com> On Wed, 2007-03-21 at 01:52, Sean Hefty wrote: > >> Unfortunately, at least opensm cannot respond to SA queries issued from a > >> remote subnet. I'm not sure how much work this would take to fix, or if > >> other SAs have this issue. Hal briefly looked at the problems, > > > >FWIW, I'll be looking some more at these again. > > I think the following patch corrects the GRH handling issues in ib_umad. > (Tested loading of ib_umad module only, and not against openSM.) It can't be tested against OpenSM right now. > If this looks right, It looks right to me. I'll need some time to take it out for a test driver as some other issues need some work to exercise this. 
-- Hal > I'll add it to my rdma-dev.git ib_router branch > > Signed-off-by: Sean Hefty > --- > diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c > index c069ebe..7774cf5 100644 > --- a/drivers/infiniband/core/user_mad.c > +++ b/drivers/infiniband/core/user_mad.c > @@ -231,12 +231,17 @@ static void recv_handler(struct ib_mad_agent *agent, > packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits; > packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH); > if (packet->mad.hdr.grh_present) { > - /* XXX parse GRH */ > - packet->mad.hdr.gid_index = 0; > - packet->mad.hdr.hop_limit = 0; > - packet->mad.hdr.traffic_class = 0; > - memset(packet->mad.hdr.gid, 0, 16); > - packet->mad.hdr.flow_label = 0; > + struct ib_ah_attr ah_attr; > + > + ib_init_ah_from_wc(agent->device, agent->port_num, > + mad_recv_wc->wc, mad_recv_wc->recv_buf.grh, > + &ah_attr); > + > + packet->mad.hdr.gid_index = ah_attr.grh.sgid_index; > + packet->mad.hdr.hop_limit = ah_attr.grh.hop_limit; > + packet->mad.hdr.traffic_class = ah_attr.grh.traffic_class; > + memcpy(packet->mad.hdr.gid, &ah_attr.grh.dgid, 16); > + packet->mad.hdr.flow_label = cpu_to_be32(ah_attr.grh.flow_label); > } > > if (queue_packet(file, agent, packet)) > @@ -473,6 +478,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, > if (packet->mad.hdr.grh_present) { > ah_attr.ah_flags = IB_AH_GRH; > memcpy(ah_attr.grh.dgid.raw, packet->mad.hdr.gid, 16); > + ah_attr.grh.sgid_index = packet->mad.hdr.gid_index; > ah_attr.grh.flow_label = be32_to_cpu(packet->mad.hdr.flow_label); > ah_attr.grh.hop_limit = packet->mad.hdr.hop_limit; > ah_attr.grh.traffic_class = packet->mad.hdr.traffic_class; > From sean.hefty at intel.com Wed Mar 21 15:25:39 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Mar 2007 15:25:39 -0700 Subject: [ofa-general] RE: [PATCH] IB/umad: fix GRH handling In-Reply-To: <1174519319.17678.25309.camel@hal.voltaire.com> Message-ID: 
<000801c76c07$daddd350$76248686@amr.corp.intel.com> >It can't be tested against OpenSM right now. I actually meant testing that the changes didn't break opensm. I did verify that I could run opensm with these changes in place. >It looks right to me. I'll need some time to take it out for a test >driver as some other issues need some work to exercise this. I've pushed the changes into my ib_router git tree for now. - Sean From mst at dev.mellanox.co.il Wed Mar 21 22:50:34 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 07:50:34 +0200 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <45FFE47B.8000408@dev.mellanox.co.il> <4600E054.90504@dev.mellanox.co.il> Message-ID: <20070322055012.GB6303@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH V2 - libibverbs] Added reference count to completion event channels > > > I believe that we can avoid problems only if we will have a lock in > > the ibv_context that handles the completion channels. > > Yes, that is the idea I had too. > > > If you think that adding a lock to the ibv_context is good enough i > > will send you a patch that implements this solution. > > Please do. I don't see anything objectionable to that solution. Since this changes structure size, will this break the ABI wrt libibverbs 1.0 compatibility? -- MST From mst at dev.mellanox.co.il Wed Mar 21 23:04:58 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 22 Mar 2007 08:04:58 +0200 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> Message-ID: <20070322060458.GC6303@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: fix thinko in packet length checks > > This definitely looks like a problem, but I'm confused by this: > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > @@ -452,7 +452,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > > skb->len, tx->mtu); > > ++priv->stats.tx_dropped; > > ++priv->stats.tx_errors; > > - ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); > > + ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > > return; > > } > > After this change, the code looks like: > > if (unlikely(skb->len > tx->mtu)) { > ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", > skb->len, tx->mtu); > ++priv->stats.tx_dropped; > ++priv->stats.tx_errors; > ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > return; > } > > so why is the test against just tx->mtu, while ipoib_cm_skb_too_long() > is passed an mtu of tx->mtu - IPOIB_ENCAP_LEN? I actually copied this from datagram code. What ipoib_cm_skb_too_long does is set the dest MTU metric. So look at this logic in datagram mode: if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { return -EINVAL; } Why is the max MTU set to IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN and not to IPOIB_PACKET_SIZE? -- MST From sweitzen at cisco.com Thu Mar 22 00:56:07 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 22 Mar 2007 00:56:07 -0700 Subject: [ofa-general] bugs to fix for OFED 1.2 RC1 Message-ID: Here's a list of bugs I would like fixed for RC1 in a week. 

bug_id  assigned_to                component  short_desc
258     pasha at mellanox.co.il    MVAPICH    OFED: ppc64 GNU mpif90 missing for MVAPICH
261     bugzilla at openib.org     IPoIB      can't configure IPoIB pkey interfaces at boot time
262     ishai at mellanox.co.il    SRP        can't configure SRP mounts from /etc/fstab
404     vlad at mellanox.co.il     IPoIB      IPoIB HA starts with port ib1 up
418     mst at dev.mellanox.co.il  IPoIB      IPoIB CM causes 2020-byte message IPv4 multicast to fail
431     mst at mellanox.co.il      IPoIB      IPoIB CM locks up server on SLES10/RHEL4 ppc64
443     ishai at mellanox.co.il    SRP        add support for SRP HA on RHEL4
445     pasha at mellanox.co.il    MVAPICH    OFED 1.2 MVAPICH won't work on ppc64
455     vlad at mellanox.co.il     IPoIB      new dmesg output every time IPoIB CM HA fails over
459     monis at voltaire.com      IPoIB      support ib-bonding on RHEL4U4/RHEL5, put kernel name in RPM name, and clean up better
464     rolandd at cisco.com       Verbs      release libibverbs-1.1 final before OFED 1.2
465     mst at mellanox.co.il      IPoIB      IPoIB CM HA fails after several hours of failovers
466     mst at mellanox.co.il      SDP        sdpnetstat not getting built on RHEL5
474     ishai at mellanox.co.il    SRP        OFED srp_daemon keeps readding targets with Cisco FC GW

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From ishai at dev.mellanox.co.il Thu Mar 22 01:30:32 2007
From: ishai at dev.mellanox.co.il (Ishai Rabinovitz)
Date: Thu, 22 Mar 2007 10:30:32 +0200
Subject: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes
In-Reply-To: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com>
References: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com>
Message-ID: <46023EA8.5010507@dev.mellanox.co.il>

I think it is very good that you changed the define to a module parameter,
but shouldn't the default value be smaller (21)? (15*32 sec = 8 min) seems
like quite a long time.

Ishai Sean Hefty wrote: > + * Limit CM message timeouts to something reasonable: > + * 32 seconds per message, with up to 15 retries > + */ > +static int max_timeout = 23; > +module_param(max_timeout, int, 0644); > +MODULE_PARM_DESC(max_timeout, "Maximum IB CM per message timeout " > + "(default=23, or ~32 seconds)"); > + From mst at dev.mellanox.co.il Thu Mar 22 01:43:24 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 10:43:24 +0200 Subject: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <46023EA8.5010507@dev.mellanox.co.il> References: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com> <46023EA8.5010507@dev.mellanox.co.il> Message-ID: <20070322084324.GE29341@mellanox.co.il> > Sean Hefty wrote: > > >+ * Limit CM message timeouts to something reasonable: > >+ * 32 seconds per message, with up to 15 retries > >+ */ > >+static int max_timeout = 23; > >+module_param(max_timeout, int, 0644); > >+MODULE_PARM_DESC(max_timeout, "Maximum IB CM per message timeout " > >+ "(default=23, or ~32 seconds)"); > >+ > > Quoting Ishai Rabinovitz : > Subject: Re: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes > > I think it is very good that you changed the define to a module parameter, > but shouldn't the default value be smaller (21). (15*32 sec = 8 Min) seems > like quite a long time. > > Ishai Hmm, indeed. Sean, wouldn't 1 minute be sufficient? 
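
The numbers in this exchange follow from the way InfiniBand encodes timeouts: a field value n stands for 4.096 us * 2^n. The following is only a sketch of that arithmetic, assuming 15 attempts per message to match the 15*32 estimate quoted above. The helper names are hypothetical and are not part of the CM code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ib_cm): convert an IB timeout
 * exponent into nanoseconds, using the 4.096 us * 2^n encoding.
 */
static uint64_t ib_timeout_ns(unsigned int exponent)
{
	assert(exponent < 32);	/* CM timeout fields are 5 bits wide */
	return (uint64_t)4096 << exponent;
}

/*
 * Hypothetical helper: total wait if each of `attempts` sends runs
 * into the full per-message timeout.
 */
static uint64_t worst_case_ns(unsigned int exponent, unsigned int attempts)
{
	return ib_timeout_ns(exponent) * attempts;
}
```

Under these assumptions, an exponent of 23 works out to about 34.4 seconds per message and about 8.6 minutes over 15 attempts. An exponent of 21 gives about 8.6 seconds per message and a little over 2 minutes in total, and 20 comes closest to a one-minute total at about 64 seconds.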
-- MST From vlad at lists.openfabrics.org Thu Mar 22 02:35:20 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Thu, 22 Mar 2007 02:35:20 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070322-0200 daily build status Message-ID: <20070322093520.79711E6080E@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 
Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From monil at voltaire.com Thu Mar 22 06:02:29 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 22 Mar 2007 15:02:29 +0200 Subject: [ofa-general] Re: [ewg] bugs to fix for OFED 1.2 RC1 In-Reply-To: References: Message-ID: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> On 3/22/07, Scott Weitzenkamp (sweitzen) wrote: > > Here's a list of bugs I would like fixed for RC1 in a week. > > bug_id assigned_to component short_desc 258 pasha at mellanox.co.il > MVAPICH OFED: ppc64 GNU mpif90 missing for MVAPICH 261 > bugzilla at openib.org IPoIB can't configure IPoIB pkey interfaces at boot > time 262 ishai at mellanox.co.il SRP can't configure SRP mounts from > /etc/fstab 404 vlad at mellanox.co.il IPoIB IPoIB HA starts with port ib1 up > 418 mst at dev.mellanox.co.il IPoIB IPoIB CM causes 2020-byte message IPv4 > multicast to fail 431 mst at mellanox.co.il IPoIB IPoIB CM locks up server > on SLES10/RHEL4 ppc64 443 ishai at mellanox.co.il SRP add support for SRP HA > on RHEL4 445 pasha at mellanox.co.il MVAPICH OFED 1.2 MVAPICH won't work on > ppc64 455 vlad at mellanox.co.il IPoIB new dmesg output every time IPoIB CM > HA fails over 459 monis at voltaire.com IPoIB support ib-bonding on > RHEL4U4/RHEL5, put kernel name in RPM name, and clean up better 464 > rolandd at cisco.com Verbs release libibverbs-1.1 final before OFED 1.2 465 > mst at mellanox.co.il IPoIB IPoIB CM HA fails after several hours of > failovers 466 mst at mellanox.co.il SDP sdpnetstat not getting built on > RHEL5 474 ishai at mellanox.co.il SRP OFED srp_daemon keeps readding targets > with Cisco FC GW > I would like to add these two to the list: 

413 nor P3 All mst at mellanox.co.il NEW IPoIB passes async events to an unrelated devices.
420 cri P3 All monil at voltaire.com NEW PKey table reordering caused by SM failover stops ipoib t...

-- Moni

Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems

From rdreier at cisco.com Thu Mar 22 07:02:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Mar 2007 07:02:17 -0700
Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
In-Reply-To: <20070322055012.GB6303@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 22 Mar 2007 07:50:34 +0200")
References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com>
 <45FFE47B.8000408@dev.mellanox.co.il>
 <4600E054.90504@dev.mellanox.co.il>
 <20070322055012.GB6303@mellanox.co.il>
Message-ID:

 > Since this changes structure size, will this break the ABI wrt
 > libibverbs 1.0 compatibility?

Why?? We already wrap the context structure in the ABI compatibility
code.

 - R.

From mst at dev.mellanox.co.il Thu Mar 22 07:05:39 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Mar 2007 16:05:39 +0200
Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
In-Reply-To:
References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com>
 <45FFE47B.8000408@dev.mellanox.co.il>
 <4600E054.90504@dev.mellanox.co.il>
 <20070322055012.GB6303@mellanox.co.il>
Message-ID: <20070322140539.GB11177@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
>
> > Since this changes structure size, will this break the ABI wrt
> > libibverbs 1.0 compatibility?
>
> Why??
We already wrap the context structure in the ABI compatibility code. Oh. missed that. -- MST From mst at dev.mellanox.co.il Thu Mar 22 07:07:27 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 16:07:27 +0200 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: <20070322060458.GC6303@mellanox.co.il> References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> Message-ID: <20070322140727.GC11177@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: [PATCH] IB/ipoib: fix thinko in packet length checks > > > Quoting Roland Dreier : > > Subject: Re: [PATCH] IB/ipoib: fix thinko in packet length checks > > > > This definitely looks like a problem, but I'm confused by this: > > > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > > > @@ -452,7 +452,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > > > skb->len, tx->mtu); > > > ++priv->stats.tx_dropped; > > > ++priv->stats.tx_errors; > > > - ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); > > > + ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > > > return; > > > } > > > > After this change, the code looks like: > > > > if (unlikely(skb->len > tx->mtu)) { > > ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", > > skb->len, tx->mtu); > > ++priv->stats.tx_dropped; > > ++priv->stats.tx_errors; > > ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > > return; > > } > > > > so why is the test against just tx->mtu, while ipoib_cm_skb_too_long() > > is passed an mtu of tx->mtu - IPOIB_ENCAP_LEN? > > I actually copied this from datagram code. > > What ipoib_cm_skb_too_long does is set the dest MTU metric. 
> > So look at this logic in datagram mode: > if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { > return -EINVAL; > } > > Why is the max MTU set to IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN and > not to IPOIB_PACKET_SIZE? And by the way, the chunk in the datagram part looks the same. -- MST From kliteyn at dev.mellanox.co.il Thu Mar 22 07:33:25 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 22 Mar 2007 16:33:25 +0200 Subject: [ofa-general] [PATCH] osm: bug in fat-tree routing Message-ID: <460293B5.4060708@dev.mellanox.co.il> Hi Hal, Fixing bug in fat-tree routing with loops in fabric. When the switch connections form loop in the fabric, fat-tree routing could follow this loop and set LFT table value for switches that have been already configured for this LID. Please apply to ofed_1_2 and to master. Thanks. -- Yevgeny Signed-off-by: Yevgeny Kliteynik --- osm/opensm/osm_ucast_ftree.c | 17 +++++++++++++++++ 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c index a4f307d..655a821 100644 --- a/osm/opensm/osm_ucast_ftree.c +++ b/osm/opensm/osm_ucast_ftree.c @@ -1826,6 +1826,23 @@ __osm_ftree_fabric_route_upgoing_by_goin set LFT(target_lid) on the remote switch to the remote port */ p_remote_sw = p_group->remote_hca_or_sw.remote_sw; + if ( osm_switch_get_least_hops(p_remote_sw->p_osm_sw, + cl_ntoh16(target_lid)) != OSM_NO_PATH ) + { + /* Loop in the fabric - we already routed the remote switch + on our way UP, and now we see it again on our way DOWN */ + osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG, + "__osm_ftree_fabric_route_upgoing_by_going_down: " + "Loop of lenght %d in the fabric:\n " + "Switch %s (LID 0x%x) closes loop through switch %s (LID 0x%x)\n", + (p_remote_sw->rank - highest_rank_in_route) * 2, + __osm_ftree_tuple_to_str(p_remote_sw->tuple), + cl_ntoh16(p_group->base_lid), + __osm_ftree_tuple_to_str(p_sw->tuple), + cl_ntoh16(p_group->remote_base_lid)); + continue; + 
} + /* Four possible cases: * * 1. is_real_lid == TRUE && is_main_path == TRUE: -- 1.4.4.1.GIT From halr at voltaire.com Thu Mar 22 09:03:32 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Mar 2007 11:03:32 -0500 Subject: [ofa-general] Re: [PATCH] osm: bug in fat-tree routing In-Reply-To: <460293B5.4060708@dev.mellanox.co.il> References: <460293B5.4060708@dev.mellanox.co.il> Message-ID: <1174579410.24305.63844.camel@hal.voltaire.com> On Thu, 2007-03-22 at 09:33, Yevgeny Kliteynik wrote: > Hi Hal, > > Fixing bug in fat-tree routing with loops in fabric. > > When the switch connections form loop in the fabric, fat-tree routing > could follow this loop and set LFT table value for switches that have > been already configured for this LID. > > Please apply to ofed_1_2 and to master. > > Thanks. > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied (to both master and ofed_1_2). -- Hal From dotanb at dev.mellanox.co.il Thu Mar 22 08:10:32 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 22 Mar 2007 17:10:32 +0200 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <45FFE47B.8000408@dev.mellanox.co.il> <4600E054.90504@dev.mellanox.co.il> Message-ID: <46029C68.1090706@dev.mellanox.co.il> Roland Dreier wrote: > > If you think that adding a lock to the ibv_context is good enough i > > will send you a patch that implements this solution. > > Please do. I don't see anything objectionable to that solution. > > Great. I will send you the patch during the next week. thanks Dotan From mst at dev.mellanox.co.il Thu Mar 22 08:24:22 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 22 Mar 2007 17:24:22 +0200 Subject: [ofa-general] ofed 1.2: please pull ~mst/mstflint.git Message-ID: <20070322152422.GH11177@mellanox.co.il> fixes bug 484 -- MST From swise at opengridcomputing.com Thu Mar 22 08:38:20 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Mar 2007 10:38:20 -0500 Subject: [ofa-general] [PATCH 2.6.21] iw_cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem(). Message-ID: <1174577900.16862.27.camel@stevo-desktop> Handle build_phys_page_list() failure in iwch_reregister_phys_mem(). Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index f2774ae..24e0df0 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -545,11 +545,14 @@ static int iwch_reregister_phys_mem(stru php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) mh.attr.perms = iwch_ib_to_tpt_access(acc); - if (mr_rereg_mask & IB_MR_REREG_TRANS) + if (mr_rereg_mask & IB_MR_REREG_TRANS) { ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, &total_size, &npages, &shift, &page_list); + if (ret) + return ret; + } ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages); kfree(page_list); From fenkes at de.ibm.com Thu Mar 22 08:52:13 2007 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Thu, 22 Mar 2007 16:52:13 +0100 Subject: [ofa-general] [PATCH] eHCA: Make scaling code also work without CPU hotplug Message-ID: <200703221652.13216.fenkes@de.ibm.com> eHCA scaling code must not depend on register_cpu_notifier() if CONFIG_HOTPLUG_CPU is not set, so put all related code into #ifdefs. 
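As a rough illustration of the guard pattern described above (a user-space sketch, not the eHCA code itself): the notifier registration is compiled in only when a hotplug macro is defined, while the pool setup stays unconditional. DEMO_HOTPLUG_CPU, demo_register_notifier() and demo_create_pool() are hypothetical stand-ins for CONFIG_HOTPLUG_CPU, register_cpu_notifier() and ehca_create_comp_pool().

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_HOTPLUG_CPU: build with
 * -DDEMO_HOTPLUG_CPU to compile the notifier path in. */

static int notifier_registered; /* set to 1 once the callback is registered */

#ifdef DEMO_HOTPLUG_CPU
/* Hypothetical stand-in for register_cpu_notifier(). */
static void demo_register_notifier(void)
{
    notifier_registered = 1;
}
#endif

/* Hypothetical stand-in for ehca_create_comp_pool(): only the hotplug
 * notifier registration is guarded, the rest always runs. */
static int demo_create_pool(void)
{
#ifdef DEMO_HOTPLUG_CPU
    demo_register_notifier();
#endif
    printf("scaling code enabled, notifier %s\n",
           notifier_registered ? "registered" : "not built in");
    return 0;
}
```

Built without the macro, demo_create_pool() still succeeds and simply never references the notifier symbols, which is the property the patch is after.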
Signed-off-by: Joachim Fenkes --- ehca_irq.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 20f36bf..f284be1 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -66,7 +66,9 @@ #define ERROR_DATA_TYPE EHCA_BMAS static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; +#ifdef CONFIG_HOTPLUG_CPU static struct notifier_block comp_pool_callback_nb; +#endif static inline void comp_event_callback(struct ehca_cq *cq) { @@ -733,6 +735,7 @@ static void take_over_work(struct ehca_c } +#ifdef CONFIG_HOTPLUG_CPU static int comp_pool_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) @@ -775,6 +778,7 @@ static int comp_pool_callback(struct not return NOTIFY_OK; } +#endif int ehca_create_comp_pool(void) { @@ -805,9 +809,11 @@ int ehca_create_comp_pool(void) } } +#ifdef CONFIG_HOTPLUG_CPU comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); +#endif printk(KERN_INFO "eHCA scaling code enabled\n"); @@ -821,7 +827,9 @@ void ehca_destroy_comp_pool(void) if (!ehca_scaling_code) return; +#ifdef CONFIG_HOTPLUG_CPU unregister_cpu_notifier(&comp_pool_callback_nb); +#endif for (i = 0; i < NR_CPUS; i++) { if (cpu_online(i)) From tziporet at dev.mellanox.co.il Thu Mar 22 09:15:08 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 22 Mar 2007 18:15:08 +0200 Subject: [ofa-general] Reminder: OFED 1.2 meeting on Monday Mar 26 9am PST Message-ID: <4602AB8C.9080906@mellanox.co.il> This is a reminder for the OFED 1.2 release coordination meeting on Monday Mar 26 at 9am PST Agenda: * Release status toward RC1 * Updated bugs list will be sent on Monday Tziporet -------------------------------------------------------------------------------------------------------------------------------------- Bridge info: 
Date/Time: MAR 12, 2007 at 12:00PM America/New_York Length: 60 Frequency: 10 Meeting ID: 2102061 Meeting Password: Global Access Numbers: http://cisco.com/en/US/about/doing_business/conferencing/index.html US/Canada: +1.866.432.9903 United Kingdom: +44.20.8824.0117 India: +91.80.4103.3979 Germany: +49.619.6773.9002 Japan: +81.3.5763.9394 China: +86.10.8515.5666 GLOBAL ACCESS NUMBERS COUNTRY LOCATION LOCAL NUMBER TOLL FREE-FREEFONE AMERICAS United States East +1.919.392.3330 1.866.349.3520 West +1.408.525.6800 1.866.432.9903 Argentina Buenos Aires +54.11.4341.0101 Brazil Brasilia +55.613.424.0220 Rio de Janeiro +55.21.2483.6302 Sao Paulo +55.11.5508.6311 Canada Calgary +1.403.514.2435 Edmonton +1.780.441.3715 Halifax +1.902.474.0214 Kanata +1.613.254.0005 Markham +1.905.470.4810 Montreal +1.514.847.6875 Ottawa +1.613.788.7250 Quebec +1.418.634.5645 Regina +1.306.566.6410 Toronto +1.416.306.7230 Vancouver +1.604.647.2350 Winnipeg +1.204.336.6610 Chile Santiago +56.2.431.4936 Colombia Bogota +57.1.325.6065 Mexico Mexico City +52.55.5267.1800 Peru Lima +51.1.215.5101 Puerto Rico San Juan +1.787.620.1865 Venezuela Caracas +58.212.902.0210 EMEA Austria Vienna +43.12.4030.6022 Belgium Diegem +32.2.704.5072 Bulgaria Sofia +359.2.937.5938 Croatia Zagreb +385.1.462.8908 Denmark Aabyhoj +45.8.939.7131 Copenhagen +45.3.958.5010 Finland Espoo +358.204.70.6227 France Paris +33.15.804.3116 Germany Eschborn +49.619.6773.9002 Hallbergmoos +49.811.554.3016 Greece Athens +30.210.638.1303 Hungary Budapest +36.1.225.4621 Ireland Dublin +353.1.819.2717 Israel Netanya +972.9.892.7026 Italy Rome +39.06.5164.4006 Netherlands Amsterdam +31.20.357.1487 Norway Oslo +47.23.27.3647 Poland Warsaw +48.22.572.2615 Portugal Lisbon +351.21.446.8756 Slovakia Bratislava +421.2.5825.5309 South Africa Johannesburg +27.11.267.1011 Pretoria +27.12.844.7401 Spain Barcelona +34.93.393.4037 Madrid +34.91.201.2149 Sweden Gothenburg +46.31.63.4409 Stockholm +46.8.685.9035 Switzerland Glattzentrum 
+41.44.878.7335 Turkey Istanbul +90.212.335.0208 United Arab Emirates (UAE) Dubai +971.4.390.7840 United Kingdom Bedfont Lakes +44.20.8824.0117 Edinburgh +44.131.561.3643 London City +44.20.7496.3743 ASIA PAC Australia Canberra +61.2.6216.0643 86.16.0643 Melbourne +61.3.9659.4173 North Sydney +61.2.8446.5260 China Beijing +86.10.8515.5666 HongKong HongKong +852.3414.1802 India Bangalore +91.80.4103.3979 Mumbai IL & FS +91.22.4043.4030 New Delhi +91.11.4261.1088 Indonesia Jakarta +62.21.7854.7476 Japan Tokyo Akasaka +81.3.5763.9394 South Korea Seoul Asem +82.2.3429.8102 Malaysia Kuala Lumpur +60.3.7723.8620 Penang +60.4.631.5125 New Zealand Auckland +64.9.355.1968 Wellington +64.4.496.5554 Phillipines Makati (Manila) +63.2.750.5886 Singapore Singapore Capital +65.6317.7088 Taiwan Taipei +886.2.8758.7088 Thailand Bangkok +66.2.263.7008 Vietnam Hanoi +84.4.974.6250 Ho Chi Minh City +84.8.823.3418 (Saigon) From bugzilla-daemon at lists.openfabrics.org Thu Mar 22 09:49:18 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 22 Mar 2007 09:49:18 -0700 (PDT) Subject: [ofa-general] [Bug 485] New: creating & deleting a subinterface with a bad pkey crashs the kernel: NULL pointer reference Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=485 Summary: creating & deleting a subinterface with a bad pkey crashs the kernel: NULL pointer reference Product: OpenFabrics Linux Version: 1.2beta1 Platform: X86 OS/Version: RHEL 4 Status: NEW Severity: normal Priority: P1 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: Philippe.Gregoire at cea.fr Creating and deleting a subinterface with a pkey partition which do not include the node yields into a panic : Mar 22 18:19:07 cors118 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000054 Mar 22 18:19:07 cors118 kernel: printing eip: Mar 22 18:19:07 cors118 kernel: c02d3325 Mar 22 18:19:07 cors118 kernel: *pde = 2cbd1001 Mar 22 18:19:07 
cors118 kernel: Oops: 0000 [#1] Mar 22 18:19:07 cors118 kernel: SMP Mar 22 18:19:07 cors118 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_s a(U) ib_ipoib(U) md5 ipv6 ide_dump cciss_dump scsi_dump diskdump zlib_deflate dm_mirror dm_mod button battery ac ohci_hcd ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) tg3 floppy ext3 jbd lpfc scsi_transport_fc cciss sd_mod scsi_mod Mar 22 18:19:07 cors118 kernel: CPU: 3 Mar 22 18:19:07 cors118 kernel: EIP: 0060:[] Not tainted VLI Mar 22 18:19:07 cors118 kernel: EFLAGS: 00010046 (2.6.9-42.ELsmp) Mar 22 18:19:07 cors118 kernel: EIP is at _spin_lock_irqsave+0x7/0x45 Mar 22 18:19:07 cors118 kernel: eax: 00000050 ebx: 00000246 ecx: e99fd200 edx: 00000000 Mar 22 18:19:07 cors118 kernel: esi: 00000050 edi: 00000050 ebp: 00000000 esp: f368ce18 Mar 22 18:19:07 cors118 kernel: ds: 007b es: 007b ss: 0068 Mar 22 18:19:07 cors118 kernel: Process bash (pid: 4452, threadinfo=f368c000 task=f10f9830) Mar 22 18:19:07 cors118 kernel: Stack: 00000000 e99fd240 f88edb33 e99fd000 e99fd240 e99fd240 00008020 f89dfefe Mar 22 18:19:07 cors118 kernel: e99fd000 e99fd240 e99fd240 f89dc176 00000001 e99fd000 f736aaf0 ea1d5000 Mar 22 18:19:07 cors118 kernel: c016691a 00000001 00000001 f89e8338 f89e8338 00000058 f4be0000 f368ce98 Mar 22 18:19:07 cors118 kernel: Call Trace: Mar 22 18:19:07 cors118 kernel: [] cm_destroy_id+0x12/0x1a5 [ib_cm] Mar 22 18:19:07 cors118 kernel: [] ipoib_cm_dev_stop+0x23/0xae [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] ipoib_ib_dev_stop+0x28/0x33c [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] __link_path_walk+0x133/0xbb5 Mar 22 18:19:07 cors118 kernel: [] flush_cpu_workqueue+0x14b/0x153 Mar 22 18:19:07 cors118 kernel: [] autoremove_wake_function+0x0/0x2d Mar 22 18:19:07 cors118 kernel: [] autoremove_wake_function+0x0/0x2d Mar 22 18:19:07 cors118 kernel: [] 
autoremove_wake_function+0x0/0x2d Mar 22 18:19:07 cors118 kernel: [] ipoib_flush_paths+0x11a/0x122 [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] ipoib_stop+0x58/0xf8 [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] dev_close+0x57/0x77 Mar 22 18:19:07 cors118 kernel: [] unregister_netdevice+0x94/0x1fa Mar 22 18:19:07 cors118 kernel: [] unregister_netdev+0xf/0x15 Mar 22 18:19:07 cors118 kernel: [] ipoib_vlan_delete+0x30/0xfa [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] delete_child+0x39/0x46 [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] delete_child+0x0/0x46 [ib_ipoib] Mar 22 18:19:07 cors118 kernel: [] class_device_attr_store+0x19/0x21 Mar 22 18:19:07 cors118 kernel: [] flush_write_buffer+0x20/0x25 Mar 22 18:19:07 cors118 kernel: [] sysfs_write_file+0x57/0x7c Mar 22 18:19:07 cors118 kernel: [] vfs_write+0xb6/0xe2 Mar 22 18:19:07 cors118 kernel: [] sys_write+0x3c/0x62 Mar 22 18:19:07 cors118 kernel: [] syscall_call+0x7/0xb Mar 22 18:19:07 cors118 kernel: Code: 6c 00 0c 60 2e c0 0f b6 02 84 c0 7e 08 0f 0b 6d 00 0c 60 2e c0 86 0a c3 f0 81 00 00 00 00 01 c3 f0 ff 00 c3 56 89 c6 53 9c 5b fa <81> 78 04 ad 4e ad de 74 18 ff 74 24 08 68 a1 6f 2e c0 e8 62 f5 SYSTEM INFORMATIONS : [root at cors118 ~]# uname -a Linux cors118 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux [root at cors118 ~]# cat /etc/redhat-release Red Hat Enterprise Linux WS release 4 (Nahant Update 4) [root at cors118 ~]# rpm -qa kernel-ib kernel-ib-1.2-2.6.9_42.ELsmp [root at cors118 ~]# lspci | grep Mella 03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) 04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) PARTITIONS CONFIGURED BY THE CISCO SUBNET MANAGER : [root at cors118 ib0]# grep -v 0x0000 /sys/class/infiniband/mthca0/ports/1/pkeys/* /sys/class/infiniband/mthca0/ports/1/pkeys/0:0xffff /sys/class/infiniband/mthca0/ports/1/pkeys/1:0x8001 /sys/class/infiniband/mthca0/ports/1/pkeys/2:0x8002 /sys/class/infiniband/mthca0/ports/1/pkeys/3:0x8010 
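For reference, a minimal sketch of how one might check whether a pkey is present in a port's table like the one listed above. It assumes the usual P_Key layout (bit 15 is the membership bit, the low 15 bits are the key base) and treats 0x0000 entries as empty, as the grep above does. demo_find_pkey() and demo_table are hypothetical names for illustration only, not kernel or OFED APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first table entry whose 15-bit key base matches
 * pkey, or -1 if there is none. Bit 15 (membership) is ignored and
 * 0x0000 entries are skipped as empty. */
static int demo_find_pkey(const uint16_t *table, size_t len, uint16_t pkey)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (table[i] == 0)
            continue;
        if ((table[i] & 0x7fff) == (pkey & 0x7fff))
            return (int)i;
    }
    return -1;
}

/* The non-zero entries from the listing above. */
static const uint16_t demo_table[] = { 0xffff, 0x8001, 0x8002, 0x8010 };
```

With this table, 0x8001 is found at index 1, while 0x8020 is not found at all; that is the "bad pkey" case this report is about.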
PROCEDURE [root at cors118 ib0]# dmesg | grep ib0 divert: not allocating divert_blk for non-ethernet device ib0 root at cors118 ib0]# pwd /sys/class/net/ib0 [root at cors118 ib0]# echo 0x8001 > create_child [root at cors118 ib0]# ls .. eth0 eth1 ib0 ib0.8001 ib1 lo sit0 [root at cors118 ib0]# dmesg | grep ib0 divert: not allocating divert_blk for non-ethernet device ib0 divert: not allocating divert_blk for non-ethernet device ib0.8001 [root at cors118 ib0]# echo 0x8001 > delete_child [root at cors118 ib0]# dmesg | grep ib0 divert: not allocating divert_blk for non-ethernet device ib0 divert: not allocating divert_blk for non-ethernet device ib0.8001 divert: no divert_blk to free, ib0.8001 not ethernet [root at cors118 ib0]# echo 0x8020 > create_child [root at cors118 ib0]# dmesg | grep ib0 divert: not allocating divert_blk for non-ethernet device ib0 divert: not allocating divert_blk for non-ethernet device ib0.8001 divert: no divert_blk to free, ib0.8001 not ethernet divert: not allocating divert_blk for non-ethernet device ib0.8020 [root at cors118 ib0]# ls .. eth0 eth1 ib0 ib0.8020 ib1 lo sit0 [root at cors118 ib0]# echo 0x8020 > delete_child .... you got it ! -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mshefty at ichips.intel.com Thu Mar 22 09:56:58 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Mar 2007 09:56:58 -0700 Subject: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070322084324.GE29341@mellanox.co.il> References: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com> <46023EA8.5010507@dev.mellanox.co.il> <20070322084324.GE29341@mellanox.co.il> Message-ID: <4602B55A.2090402@ichips.intel.com> >>I think it is very good that you changed the define to a module parameter, >>but shouldn't the default value be smaller (21). 
(15*32 sec = 8 Min) seems >>like quite a long time. >> >>Ishai > > > Hmm, indeed. Sean, wouldn't 1 minute be sufficient? This really depends on the app. For scale-up testing, we need to include the time that it takes an app to come down and fetch the connection event, process it, and respond. When new connections form along with data transfers, the app may be slow to handle this. I have no objections to setting the default lower though. But this is a max timeout. If the timeout in the path record/CM messages is smaller, that would be used instead. - Sean From mst at dev.mellanox.co.il Thu Mar 22 10:22:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 19:22:45 +0200 Subject: [ofa-general] Re: bugs to fix for OFED 1.2 RC1 In-Reply-To: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> Message-ID: <20070322172245.GB17532@mellanox.co.il> > I would like to add these two to the list: > > IPoIB passes async events to an > 413 nor P3 All mst at mellanox.co.il NEW unrelated devices. > This is not a problem. > 420 cri P3 All monil at voltaire.com NEW PKey table reordering caused by > SM failover stops ipoib t... Please re-post the latest patch on openib-general. I'd like Roland's feedback. -- MST From rdreier at cisco.com Thu Mar 22 10:27:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 10:27:16 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: <20070322060458.GC6303@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 22 Mar 2007 08:04:58 +0200") References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> Message-ID: > > After this change, the code looks like: > > > > if (unlikely(skb->len > tx->mtu)) { > > ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", > > skb->len, tx->mtu); > > ++priv->stats.tx_dropped; > > ++priv->stats.tx_errors; > > ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > > return; > > } > > > > so why is the test against just tx->mtu, while ipoib_cm_skb_too_long() > > is passed an mtu of tx->mtu - IPOIB_ENCAP_LEN? > > I actually copied this from datagram code. > > What ipoib_cm_skb_too_long does is set the dest MTU metric. Right, but this code sets the dest MTU to tx->mtu - IPOIB_ENCAP_LEN if it sees a packet longer than tx->mtu. Shouldn't the initial test be against tx->mtu - IPOIB_ENCAP_LEN too? > So look at this logic in datagram mode: > if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { > return -EINVAL; > } > > Why is the max MTU set to IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN and > not to IPOIB_PACKET_SIZE? Because the largest MTU we can handle is the biggest message we can receive minus the 4 byte encapsulation overhead. In other words, if we can handle 2048 byte UD messages, then we shouldn't allow a datagram MTU above 2044. The same reasoning applies to ethernet, so an ethernet interface has an MTU of 1500, because ethernet can send 1514 byte packets but space has to be left for the 14 bytes of src/dest/ethertype fields. Or am I missing something here? - R. From rdreier at cisco.com Thu Mar 22 10:27:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 10:27:51 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: <20070322140727.GC11177@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 22 Mar 2007 16:07:27 +0200") References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> <20070322140727.GC11177@mellanox.co.il> Message-ID: > And by the way, the chunk in the datagram part looks the same. Which chunk? - R. From mst at dev.mellanox.co.il Thu Mar 22 10:51:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 19:51:32 +0200 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> Message-ID: <20070322175132.GC17532@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: fix thinko in packet length checks > > > > After this change, the code looks like: > > > > > > if (unlikely(skb->len > tx->mtu)) { > > > ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", > > > skb->len, tx->mtu); > > > ++priv->stats.tx_dropped; > > > ++priv->stats.tx_errors; > > > ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); > > > return; > > > } > > > > > > so why is the test against just tx->mtu, while ipoib_cm_skb_too_long() > > > is passed an mtu of tx->mtu - IPOIB_ENCAP_LEN? > > > > I actually copied this from datagram code. > > > > What ipoib_cm_skb_too_long does is set the dest MTU metric. > > Right, but this code sets the dest MTU to tx->mtu - IPOIB_ENCAP_LEN if > it sees a packet longer than tx->mtu. Shouldn't the initial test be > against tx->mtu - IPOIB_ENCAP_LEN too? Packets that we get here include IPOIB_ENCAP_LEN, right? But the MTU that we set is for length of datagram, so we subtract IPOIB_ENCAP_LEN. 
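
To spell out the arithmetic, here is a small sketch using the 2048/2044 numbers from earlier in this thread. It assumes a 4-byte encapsulation header; DEMO_ENCAP_LEN, demo_max_mtu() and demo_too_long() are hypothetical names, not the ipoib driver's own.

```c
#include <stdbool.h>

#define DEMO_ENCAP_LEN 4 /* encapsulation header size in bytes (assumed) */

/* Largest IP MTU that fits in a message of max_msg bytes. */
static unsigned int demo_max_mtu(unsigned int max_msg)
{
    return max_msg - DEMO_ENCAP_LEN;
}

/* skb_len already includes the encapsulation header, so it is compared
 * against the full message size rather than against the MTU. */
static bool demo_too_long(unsigned int skb_len, unsigned int max_msg)
{
    return skb_len > max_msg;
}
```

So a 2048-byte message limit gives an MTU of 2044, and a 2048-byte packet (header included) still passes the length test; only 2049 bytes and up is dropped.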
> > So look at this logic in datagram mode: > > if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) { > > return -EINVAL; > > } > > > > Why is the max MTU set to IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN and > > not to IPOIB_PACKET_SIZE? > > Because the largest MTU we can handle is the biggest message we can > receive minus the 4 byte encapsulation overhead. In other words, if > we can handle 2048 byte UD messages, then we shouldn't allow a > datagram MTU above 2044. The same reasoning applies to ethernet, so > an ethernet interface has an MTU of 1500, because ethernet can send > 1514 byte packets but space has to be left for the 14 bytes of > src/dest/ethertype fields. > > Or am I missing something here? So same here. -- MST From mst at dev.mellanox.co.il Thu Mar 22 10:54:16 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Mar 2007 19:54:16 +0200 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> <20070322140727.GC11177@mellanox.co.il> Message-ID: <20070322175416.GD17532@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] IB/ipoib: fix thinko in packet length checks > > > And by the way, the chunk in the datagram part looks the same. > > Which chunk? Original ipoib code had this: if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", skb->len, dev->mtu + INFINIBAND_ALEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; dev_kfree_skb_any(skb); return; } so length is compared to mtu + encap length here. -- MST From mst at dev.mellanox.co.il Thu Mar 22 10:55:51 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 22 Mar 2007 19:55:51 +0200 Subject: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <4602B55A.2090402@ichips.intel.com> References: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com> <46023EA8.5010507@dev.mellanox.co.il> <20070322084324.GE29341@mellanox.co.il> <4602B55A.2090402@ichips.intel.com> Message-ID: <20070322175551.GE17532@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes > > >>I think it is very good that you changed the define to a module > >>parameter, but shouldn't the default value be smaller (21). (15*32 sec = > >>8 Min) seems like quite a long time. > >> > >>Ishai > > > > > >Hmm, indeed. Sean, wouldn't 1 minute be sufficient? > > This really depends on the app. For scale-up testing, we need to include > the time that it takes an app to come down and fetch the connection event, > process it, and respond. When new connections form along with data > transfers, the app may be slow to handle this. > > I have no objections to setting the default lower though. But this is a > max timeout. If the timeout in the path record/CM messages is smaller, > that would be used instead. OK, so can you change the default to lower value in your branch? -- MST From swise at opengridcomputing.com Thu Mar 22 10:54:48 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Mar 2007 12:54:48 -0500 Subject: [ofa-general] bugs to fix for OFED 1.2 RC1 In-Reply-To: References: Message-ID: <1174586088.16862.57.camel@stevo-desktop> I think bug 468 needs to be fixed too. It seems to be either a mvapich2 bug or a librdmacm bug since it happens over both IB and IW. 468 nor P2 Othe sean.hefty at intel.com NEW seg fault when running cpi over IB or iWARP On Thu, 2007-03-22 at 00:56 -0700, Scott Weitzenkamp (sweitzen) wrote: > Here's a list of bugs I would like fixed for RC1 in a week. 
>
> bug_id  assigned_to                component  short_desc
> 258     pasha at mellanox.co.il    MVAPICH    OFED: ppc64 GNU mpif90 missing for MVAPICH
> 261     bugzilla at openib.org     IPoIB      can't configure IPoIB pkey interfaces at boot time
> 262     ishai at mellanox.co.il    SRP        can't configure SRP mounts from /etc/fstab
> 404     vlad at mellanox.co.il     IPoIB      IPoIB HA starts with port ib1 up
> 418     mst at dev.mellanox.co.il  IPoIB      IPoIB CM causes 2020-byte message IPv4 multicast to fail
> 431     mst at mellanox.co.il      IPoIB      IPoIB CM locks up server on SLES10/RHEL4 ppc64
> 443     ishai at mellanox.co.il    SRP        add support for SRP HA on RHEL4
> 445     pasha at mellanox.co.il    MVAPICH    OFED 1.2 MVAPICH won't work on ppc64
> 455     vlad at mellanox.co.il     IPoIB      new dmesg output every time IPoIB CM HA fails over
> 459     monis at voltaire.com      IPoIB      support ib-bonding on RHEL4U4/RHEL5, put kernel name in RPM name, and clean up better
> 464     rolandd at cisco.com       Verbs      release libibverbs-1.1 final before OFED 1.2
> 465     mst at mellanox.co.il      IPoIB      IPoIB CM HA fails after several hours of failovers
> 466     mst at mellanox.co.il      SDP        sdpnetstat not getting built on RHEL5
> 474     ishai at mellanox.co.il    SRP        OFED srp_daemon keeps readding targets with Cisco FC GW
>
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From rdreier at cisco.com Thu Mar 22 10:58:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Mar 2007 10:58:25 -0700
Subject: [ofa-general] Re: [PATCH 2.6.21] iw_cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem().
In-Reply-To: <1174577900.16862.27.camel@stevo-desktop> (Steve Wise's message of "Thu, 22 Mar 2007 10:38:20 -0500") References: <1174577900.16862.27.camel@stevo-desktop> Message-ID: thanks, applied. From rdreier at cisco.com Thu Mar 22 11:00:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 11:00:06 -0700 Subject: [ofa-general] Re: [PATCH] eHCA: Make scaling code also work without CPU hotplug In-Reply-To: <200703221652.13216.fenkes@de.ibm.com> (Joachim Fenkes's message of "Thu, 22 Mar 2007 16:52:13 +0100") References: <200703221652.13216.fenkes@de.ibm.com> Message-ID: thanks, applied. From rdreier at cisco.com Thu Mar 22 11:04:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 11:04:51 -0700 Subject: [ofa-general] drivers/infiniband/ulp/ipoib/ipoib_main.c: use-after-free In-Reply-To: <20070319094619.GE8386@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 19 Mar 2007 11:46:19 +0200") References: <20070319092310.GJ752@stusta.de> <20070319094619.GE8386@mellanox.co.il> Message-ID: > Something like this then? Untested. Looks right to me, and seems to work. So I'll apply this. - R. From rdreier at cisco.com Thu Mar 22 11:16:32 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 11:16:32 -0700 Subject: [ofa-general] cxio_hal_init_ctrl_qp() resource leak Message-ID: Hey steve, coverity found this issue: static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) { int err; u64 sge_cmd, ctx0, ctx1; u64 base_addr; struct t3_modify_qp_wr *wqe; struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); if (!skb) { PDBG("%s alloc_skb failed\n", __FUNCTION__); return -ENOMEM; } >>> skb is allocated here err = cxio_hal_init_ctrl_cq(rdev_p); if (err) { PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); return err; >>> but leaked here... 
} rdev_p->ctrl_qp.workq = dma_alloc_coherent( &(rdev_p->rnic_info.pdev->dev), (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr), &(rdev_p->ctrl_qp.dma_addr), GFP_KERNEL); if (!rdev_p->ctrl_qp.workq) { PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); return -ENOMEM; >>> ... and leaked here too } From rdreier at cisco.com Thu Mar 22 11:18:40 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 11:18:40 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: fix thinko in packet length checks In-Reply-To: <20070322175132.GC17532@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 22 Mar 2007 19:51:32 +0200") References: <20070321132015.3FE95E6080F@openfabrics.org> <20070321134505.GA23221@mellanox.co.il> <20070322060458.GC6303@mellanox.co.il> <20070322175132.GC17532@mellanox.co.il> Message-ID: > Packets that we get here include IPOIB_ENCAP_LEN, right? > But the MTU that we set is for length of datagram, so we > subtract IPOIB_ENCAP_LEN. OK, I see finally... Thanks From swise at opengridcomputing.com Thu Mar 22 11:36:16 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Mar 2007 13:36:16 -0500 Subject: [ofa-general] Re: cxio_hal_init_ctrl_qp() resource leak In-Reply-To: References: Message-ID: <1174588576.16862.61.camel@stevo-desktop> yup. I'll address this. Thanks! BTW: How do I run coverity myself? On Thu, 2007-03-22 at 11:16 -0700, Roland Dreier wrote: > Hey steve, coverity found this issue: > > static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) > { > int err; > u64 sge_cmd, ctx0, ctx1; > u64 base_addr; > struct t3_modify_qp_wr *wqe; > struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); > > > if (!skb) { > PDBG("%s alloc_skb failed\n", __FUNCTION__); > return -ENOMEM; > } > > >>> skb is allocated here > > err = cxio_hal_init_ctrl_cq(rdev_p); > if (err) { > PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); > return err; > >>> but leaked here... 
> } > rdev_p->ctrl_qp.workq = dma_alloc_coherent( > &(rdev_p->rnic_info.pdev->dev), > (1 << T3_CTRL_QP_SIZE_LOG2) * > sizeof(union t3_wr), > &(rdev_p->ctrl_qp.dma_addr), > GFP_KERNEL); > if (!rdev_p->ctrl_qp.workq) { > PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); > return -ENOMEM; > >>> ... and leaked here too > } From rdreier at cisco.com Thu Mar 22 12:08:22 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 12:08:22 -0700 Subject: [ofa-general] Re: cxio_hal_init_ctrl_qp() resource leak In-Reply-To: <1174588576.16862.61.camel@stevo-desktop> (Steve Wise's message of "Thu, 22 Mar 2007 13:36:16 -0500") References: <1174588576.16862.61.camel@stevo-desktop> Message-ID: > BTW: How do I run coverity myself? Spend hella money... I just have access to the reports at scan.coverity.com. I think you can ask them for access. - R. From pradeep at us.ibm.com Thu Mar 22 14:35:16 2007 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 22 Mar 2007 14:35:16 -0700 Subject: [ofa-general] IPOIB CM performance issues Message-ID: While working on the non-SRQ support for IPOIB CM I observed that scatter-gather lists adversely impacts performance (as compared to without it). On the whole, CM mode does improve performance -with or without scatter-gather lists, but we lose a lot of throughput (something like >15%) with sg lists. I looked at the profiles and found that ipoib_cm_alloc_rx_skb() (and the associated alloc_page()) show up far more (> 10X) in the profile with sg lists, than without it. To put this in perspective, upon receipt of a packet we call ipoib_cm_alloc_rx_skb() which in turn ends up calling alloc_page() 16 times (every time!). I believe that is where we are taking a big hit with sg lists. This and the associated sg list processing is what causes the throughput drop. I loked at the e1000 driver to see how they handle this issue and here are a few things that I learnt; which we may try and incorporate as we find suitable: 1. 
e1000 driver does not use sg lists in all cases 2. e100 driver uses a max of 3 fragments (to handle jumbo frames) 3. e1000 driver uses "copybreak" as a module parameter. For small packets (less than copybreak) they actually go ahead and unsplit the packet. In fact they specifically call out alloc_page() and put_page() as eating up CPU cycles and try to avoid them when feasible. 4. There is a decision made (rx_ps_pages) whether one should use packet split or not. This decision is based on several factors like mtu, page size and the like. Can we try and incorporate items 1, 3 and 4 into the implementation of IPOIB CM? What are the general opinions about this? Should we look at some other drivers? Pradeep pradeep at us.ibm.com From rdreier at cisco.com Thu Mar 22 14:39:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 22 Mar 2007 14:39:16 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get various small post-rc4 fixes: Bryan O'Sullivan (1): IB/ipath: check return value of lookup_one_len Joachim Fenkes (1): IB/ehca: Make scaling code work without CPU hotplug Michael S. 
Tsirkin (3): IPoIB/cm: Fix reaping of stale connections IPoIB: Fix use-after-free in path_rec_completion() IB/ipoib: Fix thinko in packet length checks Sean Hefty (1): IPoIB: Fix race in detaching from mcast group before attaching Steve Wise (1): RDMA/cxgb3: Handle build_phys_page_list() failure in iwch_reregister_phys_mem() drivers/infiniband/hw/cxgb3/iwch_provider.c | 5 ++++- drivers/infiniband/hw/ehca/ehca_irq.c | 8 ++++++++ drivers/infiniband/hw/ipath/ipath_fs.c | 16 +++++++++++++++- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 6 +++--- 7 files changed, 36 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index f2774ae..24e0df0 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -545,11 +545,14 @@ static int iwch_reregister_phys_mem(struct ib_mr *mr, php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) mh.attr.perms = iwch_ib_to_tpt_access(acc); - if (mr_rereg_mask & IB_MR_REREG_TRANS) + if (mr_rereg_mask & IB_MR_REREG_TRANS) { ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, &total_size, &npages, &shift, &page_list); + if (ret) + return ret; + } ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages); kfree(page_list); diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 20f36bf..f284be1 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -66,7 +66,9 @@ static void queue_comp_task(struct ehca_cq *__cq); static struct ehca_comp_pool* pool; +#ifdef CONFIG_HOTPLUG_CPU static struct notifier_block comp_pool_callback_nb; +#endif static inline void comp_event_callback(struct ehca_cq *cq) { @@ -733,6 +735,7 @@ static void take_over_work(struct 
ehca_comp_pool *pool, } +#ifdef CONFIG_HOTPLUG_CPU static int comp_pool_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) @@ -775,6 +778,7 @@ static int comp_pool_callback(struct notifier_block *nfb, return NOTIFY_OK; } +#endif int ehca_create_comp_pool(void) { @@ -805,9 +809,11 @@ int ehca_create_comp_pool(void) } } +#ifdef CONFIG_HOTPLUG_CPU comp_pool_callback_nb.notifier_call = comp_pool_callback; comp_pool_callback_nb.priority =0; register_cpu_notifier(&comp_pool_callback_nb); +#endif printk(KERN_INFO "eHCA scaling code enabled\n"); @@ -821,7 +827,9 @@ void ehca_destroy_comp_pool(void) if (!ehca_scaling_code) return; +#ifdef CONFIG_HOTPLUG_CPU unregister_cpu_notifier(&comp_pool_callback_nb); +#endif for (i = 0; i < NR_CPUS; i++) { if (cpu_online(i)) diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c index 5b40a84..ed55979 100644 --- a/drivers/infiniband/hw/ipath/ipath_fs.c +++ b/drivers/infiniband/hw/ipath/ipath_fs.c @@ -451,12 +451,18 @@ bail: return ret; } -static void remove_file(struct dentry *parent, char *name) +static int remove_file(struct dentry *parent, char *name) { struct dentry *tmp; + int ret; tmp = lookup_one_len(name, parent, strlen(name)); + if (IS_ERR(tmp)) { + ret = PTR_ERR(tmp); + goto bail; + } + spin_lock(&dcache_lock); spin_lock(&tmp->d_lock); if (!(d_unhashed(tmp) && tmp->d_inode)) { @@ -469,6 +475,14 @@ static void remove_file(struct dentry *parent, char *name) spin_unlock(&tmp->d_lock); spin_unlock(&dcache_lock); } + + ret = 0; +bail: + /* + * We don't expect clients to care about the return value, but + * it's there if they need it. 
+ */ + return ret; } static int remove_device_files(struct super_block *sb, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 3484e8b..e70492d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -452,7 +452,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ skb->len, tx->mtu); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; - ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN); + ipoib_cm_skb_too_long(dev, skb, tx->mtu - IPOIB_ENCAP_LEN); return; } @@ -1095,7 +1095,7 @@ static void ipoib_cm_stale_task(struct work_struct *work) /* List if sorted by LRU, start from tail, * stop when we see a recently used entry */ p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); - if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) + if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) break; list_del_init(&p->list); spin_unlock_irqrestore(&priv->lock, flags); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f2aa923..ba0ee5c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -328,9 +328,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_tx_buf *tx_req; u64 addr; - if (unlikely(skb->len > priv->mcast_mtu + INFINIBAND_ALEN)) { + if (unlikely(skb->len > priv->mcast_mtu + IPOIB_ENCAP_LEN)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", - skb->len, priv->mcast_mtu + INFINIBAND_ALEN); + skb->len, priv->mcast_mtu + IPOIB_ENCAP_LEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index f9dbc6f..0741c6d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -380,7 
+380,7 @@ static void path_rec_completion(int status, struct net_device *dev = path->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_ah *ah = NULL; - struct ipoib_neigh *neigh; + struct ipoib_neigh *neigh, *tn; struct sk_buff_head skqueue; struct sk_buff *skb; unsigned long flags; @@ -418,7 +418,7 @@ static void path_rec_completion(int status, while ((skb = __skb_dequeue(&path->queue))) __skb_queue_tail(&skqueue, skb); - list_for_each_entry(neigh, &path->neigh_list, list) { + list_for_each_entry_safe(neigh, tn, &path->neigh_list, list) { kref_get(&path->ah->ref); neigh->ah = path->ah; memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 56c87a8..54fbead 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -644,6 +644,9 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) struct ipoib_dev_priv *priv = netdev_priv(dev); int ret = 0; + if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) + ib_sa_free_multicast(mcast->mc); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); @@ -655,9 +658,6 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); } - if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) - ib_sa_free_multicast(mcast->mc); - return 0; } From swise at opengridcomputing.com Thu Mar 22 14:39:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 22 Mar 2007 16:39:57 -0500 Subject: [ofa-general] [PATCH 2.6.21] iw_cxgb3: Fix a resource leak in cxio_hal_init_ctrl_qp(). Message-ID: <1174599597.16862.70.camel@stevo-desktop> Fix a resource leak in cxio_hal_init_ctrl_qp(). 
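For illustration only, and not part of the patch: the leak and the single-exit cleanup that addresses it can be sketched in plain userspace C. This is a minimal sketch under stated assumptions: malloc()/free() stand in for alloc_skb()/kfree_skb(), and buf_alloc(), buf_free(), init_cq(), init_ctrl_qp() and the live_bufs counter are hypothetical names invented here, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical leak counter: number of buffers currently allocated. */
static int live_bufs;

/* Hypothetical stand-in for alloc_skb()/dma_alloc_coherent(). */
static void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_bufs++;
	return p;
}

/* Hypothetical stand-in for kfree_skb(); accepts NULL like free(). */
static void buf_free(void *p)
{
	if (!p)
		return;
	assert(live_bufs > 0);
	live_bufs--;
	free(p);
}

/* Hypothetical stand-in for cxio_hal_init_ctrl_cq(); fails on request. */
static int init_cq(int fail)
{
	return fail ? -EIO : 0;
}

/* Same shape as the patched function: every failure after the first
 * allocation funnels through one label that releases the buffer. */
int init_ctrl_qp(int fail_cq, int fail_workq)
{
	int err;
	void *skb, *workq;

	skb = buf_alloc(64);
	if (!skb)
		return -ENOMEM;		/* nothing allocated yet */

	err = init_cq(fail_cq);
	if (err)
		goto err;		/* an early return here would leak skb */

	workq = fail_workq ? NULL : buf_alloc(256);
	if (!workq) {
		err = -ENOMEM;
		goto err;		/* an early return here would leak skb */
	}

	/* The driver hands skb off at this point; the sketch just
	 * releases both buffers so the counter can be checked. */
	buf_free(workq);
	buf_free(skb);
	return 0;
err:
	buf_free(skb);
	return err;
}
```

Calling init_ctrl_qp(1, 0) or init_ctrl_qp(0, 1) returns a negative errno and leaves live_bufs at 0. With plain early returns in place of the two gotos, live_bufs would stay at 1 on both failure paths.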
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 818cf1a..f5e9aee 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -498,9 +498,9 @@ static int cxio_hal_init_ctrl_qp(struct u64 sge_cmd, ctx0, ctx1; u64 base_addr; struct t3_modify_qp_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); - + struct sk_buff *skb; + skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); if (!skb) { PDBG("%s alloc_skb failed\n", __FUNCTION__); return -ENOMEM; @@ -508,7 +508,7 @@ static int cxio_hal_init_ctrl_qp(struct err = cxio_hal_init_ctrl_cq(rdev_p); if (err) { PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); - return err; + goto err; } rdev_p->ctrl_qp.workq = dma_alloc_coherent( &(rdev_p->rnic_info.pdev->dev), @@ -518,7 +518,8 @@ static int cxio_hal_init_ctrl_qp(struct GFP_KERNEL); if (!rdev_p->ctrl_qp.workq) { PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); - return -ENOMEM; + err = -ENOMEM; + goto err; } pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping, rdev_p->ctrl_qp.dma_addr); @@ -556,6 +557,9 @@ static int cxio_hal_init_ctrl_qp(struct rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2); skb->priority = CPL_PRIORITY_CONTROL; return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +err: + kfree_skb(skb); + return err; } static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p) From bogus@does.not.exist.com Thu Mar 22 14:59:40 2007 From: bogus@does.not.exist.com (Frank Powell) Date: Thu, 22 Mar 2007 21:59:40 +0000 (UTC) Subject: [ofa-general] Dear Beloved Message-ID: <20070322215940.EE3511643E6@c1.servage.net> An HTML attachment was scrubbed... 
URL: From bugzilla-daemon at lists.openfabrics.org Thu Mar 22 16:07:07 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Thu, 22 Mar 2007 16:07:07 -0700 (PDT) Subject: [ofa-general] [Bug 485] creating & deleting a subinterface with a bad pkey crashs the kernel: NULL pointer reference In-Reply-To: Message-ID: <20070322230707.3A19CE603BE@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=485 sean.hefty at intel.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sean.hefty at intel.com -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mshefty at ichips.intel.com Thu Mar 22 16:39:42 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Mar 2007 16:39:42 -0700 Subject: [ewg] RE: [ofa-general] Re: [GIT PULL] OFED 1.2: CM scaling fixes In-Reply-To: <20070322175551.GE17532@mellanox.co.il> References: <000001c76be3$68e00ad0$76248686@amr.corp.intel.com> <46023EA8.5010507@dev.mellanox.co.il> <20070322084324.GE29341@mellanox.co.il> <4602B55A.2090402@ichips.intel.com> <20070322175551.GE17532@mellanox.co.il> Message-ID: <460313BE.4020400@ichips.intel.com> > OK, so can you change the default to lower value in your branch? Done - set to 21 (~8 seconds) From mst at dev.mellanox.co.il Fri Mar 23 02:22:34 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 23 Mar 2007 11:22:34 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: Message-ID: <20070323092234.GG17532@mellanox.co.il> > Quoting Roland Dreier : > Subject: [GIT PULL] please pull infiniband.git > > Linus, please pull from > > master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This tree is also available from kernel.org mirrors at: > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This will get various small post-rc4 fixes: What about mthca QP reset issues? -- MST From vlad at lists.openfabrics.org Fri Mar 23 02:34:33 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Fri, 23 Mar 2007 02:34:33 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070323-0200 daily build status Message-ID: <20070323093434.9C079E6080D@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with 
linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From halr at voltaire.com Fri Mar 23 06:55:46 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Mar 2007 08:55:46 -0500 Subject: [ofa-general] Re: [PATCH] IB/umad: fix GRH handling In-Reply-To: <1174519319.17678.25309.camel@hal.voltaire.com> References: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> <1174519319.17678.25309.camel@hal.voltaire.com> Message-ID: <1174658146.24305.148489.camel@hal.voltaire.com> On Wed, 2007-03-21 at 18:22, Hal Rosenstock wrote: > On Wed, 2007-03-21 at 01:52, Sean Hefty wrote: > > >> Unfortunately, at least opensm cannot respond to SA queries issued from a > > >> remote subnet. I'm not sure how much work this would take to fix, or if > > >> other SAs have this issue. Hal briefly looked at the problems, > > > > > >FWIW, I'll be looking some more at these again. > > > > I think the following patch corrects the GRH handling issues in ib_umad. > > (Tested loading of ib_umad module only, and not against openSM.) > > It can't be tested against OpenSM right now. 
> > > If this looks right, > > It looks right to me. I'll need some time to take it out for a test > driver as some other issues need some work to exercise this. I exercised this and it works fine. The received GRH information is now seen on the receive side of user MADs. Can this be pushed for OFED 1.2 as well ? -- Hal > -- Hal > > > I'll add it to my rdma-dev.git ib_router branch > > > > Signed-off-by: Sean Hefty > > --- > > diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c > > index c069ebe..7774cf5 100644 > > --- a/drivers/infiniband/core/user_mad.c > > +++ b/drivers/infiniband/core/user_mad.c > > @@ -231,12 +231,17 @@ static void recv_handler(struct ib_mad_agent *agent, > > packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits; > > packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH); > > if (packet->mad.hdr.grh_present) { > > - /* XXX parse GRH */ > > - packet->mad.hdr.gid_index = 0; > > - packet->mad.hdr.hop_limit = 0; > > - packet->mad.hdr.traffic_class = 0; > > - memset(packet->mad.hdr.gid, 0, 16); > > - packet->mad.hdr.flow_label = 0; > > + struct ib_ah_attr ah_attr; > > + > > + ib_init_ah_from_wc(agent->device, agent->port_num, > > + mad_recv_wc->wc, mad_recv_wc->recv_buf.grh, > > + &ah_attr); > > + > > + packet->mad.hdr.gid_index = ah_attr.grh.sgid_index; > > + packet->mad.hdr.hop_limit = ah_attr.grh.hop_limit; > > + packet->mad.hdr.traffic_class = ah_attr.grh.traffic_class; > > + memcpy(packet->mad.hdr.gid, &ah_attr.grh.dgid, 16); > > + packet->mad.hdr.flow_label = cpu_to_be32(ah_attr.grh.flow_label); > > } > > > > if (queue_packet(file, agent, packet)) > > @@ -473,6 +478,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, > > if (packet->mad.hdr.grh_present) { > > ah_attr.ah_flags = IB_AH_GRH; > > memcpy(ah_attr.grh.dgid.raw, packet->mad.hdr.gid, 16); > > + ah_attr.grh.sgid_index = packet->mad.hdr.gid_index; > > ah_attr.grh.flow_label = 
be32_to_cpu(packet->mad.hdr.flow_label); > > ah_attr.grh.hop_limit = packet->mad.hdr.hop_limit; > > ah_attr.grh.traffic_class = packet->mad.hdr.traffic_class; > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Fri Mar 23 06:42:29 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 23 Mar 2007 15:42:29 +0200 Subject: [ofa-general] Re: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com> Message-ID: <20070323134229.GP20990@sashak.voltaire.com> Hi Tzahi, On 18:56 Wed 21 Feb , Tzachi Dar wrote: > > To be on the practical side, I have read the introduction to pthreads in > the past and from what I saw it was relatively easy to implement that on > Win32. I want to look at the functions that were mentioned before in > this thread and see if that is still the case. > > Let me get back to you on this at the beginning of next week. Any news here? Thanks. Sasha From rdreier at cisco.com Fri Mar 23 07:36:08 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Mar 2007 07:36:08 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070323092234.GG17532@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 23 Mar 2007 11:22:34 +0200") References: <20070323092234.GG17532@mellanox.co.il> Message-ID: > What about mthca QP reset issues? I'm still thinking about synchronizing with the completion EQ's irq. 
From S.Linev at gsi.de Fri Mar 23 08:08:15 2007 From: S.Linev at gsi.de (Linev Sergei) Date: Fri, 23 Mar 2007 16:08:15 +0100 Subject: [ofa-general] compilation problem on ofed_1_2 Message-ID: <60E9D8CA1AC31048A237499BD73FF9AD01BC03@W2K3MAILSV.gsi.de> Hi, I am trying to compile OFED 1.2 beta OFED-1.2-20070322-0837.tgz on SuSE9 SP3 x86_64 with a 2.6.19 kernel and the Real Time PREEMPT patch. I hit two problems. First, in file ofa_user-1.2/src/userspace/sdpnetstat/lib/fdd.c: it has #include "if_fddi.h", and in "linux/if_fddi.h" I got an error message that type __be16 is not defined. Probably this is only a peculiarity of SuSE9. To fix it, one should just add #include "linux/types.h" right before the "netinet/if_fddi.h" include in file fdd.c. Second, in file ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c, compilation fails because SPIN_LOCK_UNLOCKED is not defined. It seems the "spinlock.h" include is missing in this file. Again, this may be a peculiarity of our old SuSE9. If I compile OFED without these two components (and also without openmpi, which has some linking problems), I can run the basic components (verbs, opensm, IPoIB) without any problem. Sergey Linev P.S. Sorry that I did not provide log files. Our cluster is off for some time. From pw at osc.edu Fri Mar 23 08:11:27 2007 From: pw at osc.edu (Pete Wyckoff) Date: Fri, 23 Mar 2007 11:11:27 -0400 Subject: [ofa-general] ib_umem_get always wants write access In-Reply-To: References: <20070321175005.GA8123@osc.edu> Message-ID: <20070323151127.GD11480@osc.edu> rdreier at cisco.com wrote on Wed, 21 Mar 2007 11:23 -0700: > > I've wondered about this for a while. In ib_umem_get, there is a > > call to get_user_pages that does the work of virtual to physical > > translation and increasing the ref counts. It is always invoked > > with write == 1, even if cmd.access_flags == 0 (read only > > registration). > > > > This is fine for anonymous private memory, or writeable shared > > memory. 
But consider pinning a read-only section of memory, such as > > shared read-only data or text segment, or a file mapping of a file > > that was opened O_RDONLY. Having write == 1 there forces a full > > copy of all these pages. > > > > The force argument is explicitly set to 1 only when access_flags > > does not specify write access, giving gup permission to do the > > copy-on-write, essentially. That seems correct, but always setting > > write to 1 has me confused. > > > > Is there some IB semantic reason for forcing the registered pages to > > be writable? > > I'm having a hard time remembering the exact reasoning, but the basic > idea is that we need to allow read-only memory to be registered but we > also need to force allocated but not touched memory to be faulted in. Thanks, I'll try to set up a scenario where read-only memory is not present, then gup with write = 0 and see if it does not do the faulting properly. It's not clear to me now. The performance degradation of COW-ing read-only pages is noticeable. -- Pete From mst at dev.mellanox.co.il Fri Mar 23 08:28:22 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 23 Mar 2007 17:28:22 +0200 Subject: [ofa-general] Re: Re: [PATCH] IB/umad: fix GRH handling In-Reply-To: <1174658146.24305.148489.camel@hal.voltaire.com> References: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> <1174519319.17678.25309.camel@hal.voltaire.com> <1174658146.24305.148489.camel@hal.voltaire.com> Message-ID: <20070323152822.GH17532@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: Re: [PATCH] IB/umad: fix GRH handling > > On Wed, 2007-03-21 at 18:22, Hal Rosenstock wrote: > > On Wed, 2007-03-21 at 01:52, Sean Hefty wrote: > > > >> Unfortunately, at least opensm cannot respond to SA queries issued from a > > > >> remote subnet. I'm not sure how much work this would take to fix, or if > > > >> other SAs have this issue. 
Hal briefly looked at the problems, > > > > > > > >FWIW, I'll be looking some more at these again. > > > > > > I think the following patch corrects the GRH handling issues in ib_umad. > > > (Tested loading of ib_umad module only, and not against openSM.) > > > > It can't be tested against OpenSM right now. > > > > > If this looks right, > > > > It looks right to me. I'll need some time to take it out for a test > > driver as some other issues need some work to exercise this. > > I exercised this and it works fine. The received GRH information is now > seen on the receive side of user MADs. > > Can this be pushed for OFED 1.2 as well ? Overall, looks safe. If you want the fix in OFED 1.2, file a bug in the bugzilla. But - is this patch going into 2.6.21? And if not, why does it have to be in OFED 1.2? -- MST From bugzilla-daemon at lists.openfabrics.org Fri Mar 23 08:48:07 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Fri, 23 Mar 2007 08:48:07 -0700 (PDT) Subject: [ofa-general] [Bug 485] creating & deleting a subinterface with a bad pkey crashs the kernel: NULL pointer reference In-Reply-To: Message-ID: <20070323154807.86BFDE603B1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=485 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sweitzen at cisco.com -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mshefty at ichips.intel.com Fri Mar 23 09:41:47 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 23 Mar 2007 09:41:47 -0700 Subject: [ofa-general] Re: Re: [PATCH] IB/umad: fix GRH handling In-Reply-To: <20070323152822.GH17532@mellanox.co.il> References: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> <1174519319.17678.25309.camel@hal.voltaire.com> <1174658146.24305.148489.camel@hal.voltaire.com> <20070323152822.GH17532@mellanox.co.il> Message-ID: <4604034B.6030507@ichips.intel.com> > Overall, looks safe. > If you want the fix in OFED 1.2, file a bug in the bugzilla. I've made one adjustment to the original patch to set the hop_limit to 0xff on receives. The updated patch is in the ib_router branch of my git tree. > But - is this patch going into 2.6.21? And if not, why does > it have to be in OFED 1.2? My intent was to queue it for 2.6.22. - Sean From troy at scl.ameslab.gov Fri Mar 23 10:30:55 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Fri, 23 Mar 2007 12:30:55 -0500 Subject: [ofa-general] interesting ibv_reg_mr failures Message-ID: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov> We have been getting some interesting failures with ibv_reg_mr.. The max_mr number is on the order of 100K regions, yet we are only able to register under 2500 regions on a mellanox card, and around 4800 regions on an ehca. 
A testcase is available here: http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c raw wget-able form: http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c;style=raw Here's what the output looks like: gcc -ggdb -libverbs -o mr-test mr-test.c /usr/src/ibv-mr-test/mr-test mr-test: bufsize 1048576 device # 0 name="mthca0" guid="00066a0098000464" ibv_open_device() context=0x10012c98 ibv_alloc_pd() pd=0x10013678 alloc: 2482 ibv_reg_mr failed:: Cannot allocate memory fw_ver: 3.3.2 max_mr_size 0xffffffffffffffff max_mr: 131056, could only register 2482 regions sleep 5 sec free: 0 done device # 1 name="ehca0" guid="000255000001c900" ibv_open_device() context=0x10012c98 ibv_alloc_pd() pd=0x10012080 alloc: 3067 free: 0 done with a 10MB buffer: gcc -ggdb -libverbs -o mr-test mr-test.c /usr/src/ibv-mr-test/mr-test mr-test: bufsize 10485760 device # 0 name="mthca0" guid="00066a0098000464" ibv_open_device() context=0x10012c98 ibv_alloc_pd() pd=0x10013678 alloc: 2482 ibv_reg_mr failed:: Cannot allocate memory fw_ver: 3.3.2 max_mr_size 0xffffffffffffffff max_mr: 131056, could only register 2482 regions sleep 5 sec free: 0 done device # 1 name="ehca0" guid="000255000001c900" ibv_open_device() context=0x10012c98 ibv_alloc_pd() pd=0x10012080 alloc: 4119 PID264f ehca0 EHCA_ERR:ehcau_reg_mr ibv_cmd_reg_mr ret=c alloc: 4120 ibv_reg_mr failed:: Cannot allocate memory fw_ver: max_mr_size 0x200000000 max_mr: 61382, could only register 4120 regions sleep 5 sec free: 0 done And, on a PCI-express mellanox hca: /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test mr-test: bufsize 10485760 device # 0 name="mthca0" guid="0002c9020040272c" ibv_open_device() context=0x504c00 ibv_alloc_pd() pd=0x503f30 alloc: 12277 ibv_reg_mr failed:: Cannot allocate memory fw_ver: 5.1.0 max_mr_size 0xffffffffffffffff max_mr: 131056, could only register 12277 regions sleep 5 sec free: 0 done On the pci-express hca, it also looks like the memory usage, as reported by 
"free" goes down by about 300MB once all these regions are allocated.. but the process usage as reported by top is only 20mb total virtual size. What's going on here? From Douglas.Fuller at asu.edu Fri Mar 23 10:41:36 2007 From: Douglas.Fuller at asu.edu (Douglas Fuller) Date: Fri, 23 Mar 2007 10:41:36 -0700 Subject: [ofa-general] osm error messages In-Reply-To: <1174506796.17678.11941.camel@hal.voltaire.com> Message-ID: On /21/07 2:53 PM, "Hal Rosenstock" halr at voltaire.cm> wrote: > On Wed, 200-03-21 at 13:29, Douglas Fulle wrote: >> I'm seeing some sporadic error activity from OpnSM (FED 1.1; osm.log>> below) that ay correlate with some ob failures -- I'mtrying to get to the >> bottom of this. >> >> efore seeing this, I isolatedand disabled with ibortstat what ppeared >> to be a ba intenal port n one of our core switches. That leads me to >> suspectI have a switchmisbehaving somwhere. >> >> ithout any other ntervention, things seem to check out (wth >> ibdiagnet/ibchecknet). An thought? Need any more nformatin? > > Is something bouncingyour subnet or was this just what ibporttte did > ? It could be if this was a coreswitch. Nothing should be. The same thing appears to happen onceevery couple days -- it is very difficult to correlate wth anything. > Also, you may have someSMAs which have gone nonresponsive to SMPs > (IB_TIMEOUs) but the links are up. I can't be surenot knowng what the > exact scenario was. If you do, you will like want to chase these and do > something abot them if you haven't already. Hmm, what could causethat? All my hosts are responsive whenever I check (though it hasn't been during one of these stors of activity). > All the messages reltin to ACTIVE-> ACTIVE transition can be ignored. > > Also, it looks likesomething i removing characters n the log. Yeah, there are characters missing in the whole message. rious. 
Thans again, --Doug > > -- Hal> >> Thanks,>> --Doug Fuller >> >> Ma 19 18:8:50 000354 [AB000160] -> OpenSM ev:openib-2.0.5 OpnIBsvn >> Exported evision >> Mar19 18:28:0 000466 [AB00160] -> OpenSM Rev:openib-2.0.5 OnB svn > Exporte revision >> Mar 19 18:28:50 007666 [AB000160] -> om_vendor_bind: Binding to port >> 0xad0000024bb >> Mr 19 18:28:50 011279 [AB00160 ->osm_vendo_bind: Binding to port >> 05ad0000024bbb >> Mar 19 18:2850 438326 [44606960] -> Entering MASTER stt >> Mar 19 18:28:5 438628 [4606960] > osm_report_notice: Reporting Geneic >> Notice type:3num:66 from LID:0x0000 >> GID:0xfe8000000000000,0x0005ad000024bbb >> Mar 19 1828:50438661 [4460660] -> sm_report_notice: Reorting Geneic >> Notie type:3 num:6 from LID:0x0000 >> GID:0xf8000000000000,0x0005ad0000024bbb >> Mar 1 18:28:50 50476 41401960] -> osm_cat_mgr_process: Min Hop Tabes >> onfigured on all witches >> Mar 19 18:2850 639453 [44606960] -> SUNET UP >> Mar 19 18:28:50 853613 1E02960 -> __osm_traprcv_process_reqest >> Rceived Generic Notice type:0x04 num:144 Poducer:1 from LID:0x0092 >> TID:0x00000000000018 >> Mar 19 1828:5 853813 [4E0960] > osm_report_notice: Reporting Generic > Notice typ:4 num:144 from LID:0x0092 >>GID:0xfe8000000000000,0x0005ad0000024bb >> Mar 1 18:28:51 273470 [4460960] -> osm_ucast_gr_process: MinHopTables >> configured on all switches >> Mar 19 18:28:51 33730 [43C05960] -> SUBNET UP >> Ma 19 18:3033 565682 [4320490] -> __osm_trap_rcv_process_requst: >> Received Generic Notice type:0x1 um:128 Poducer:2 from LID:0x0001 >> TID:0x000000000000019 >> Mar 19 18:30:33 565958 [4320496] -> sm_reprt_notice: Reporting Generic >> Noicetype:1 num:128 from ID:0x0001 >> GID:0xfe80000000000000,0x005d0000027c6 >> Mar 19 18:30:33 963901 [41401960] > osm_rport_notice: Reporting Generic >> Noticetyp:3 num:64 fro LID:x0092 >> GID:0xfe80000000000000,0x05ad0000024bbb >> Mar 19 18:30:33 963914 4140196] -> Discovered nw port with >> GUID0x0005ad00000297b LI range [0x3,0x37] of node:saguro-14-9 HCA-1 
>>Mar 19 18:30:33 994698 [4401960] > om_ucast_mgr_procss: Min Hp Tables >> configured n all switches >> Mar 19 18:30:34 054763 [45A08960]-> UBNET UP >> Ma 1 18:30:34 351397 43C060] -> __osm_tra_rcv_process_request: >> Received Gneri Nice type:0x04 num:144 Producer:1 fomLID:0x0037 >> TID:0x00000000000000 > Mar 19 18:30:4 351615 [4C05960] -> osm_report_notice Reportig Genric >> otice type: num:144 from LID0x0037 >> GID:0xfe80000000000000,0x0005ad000497b >> ar 19 18:30:34 777488 [45A0896 > osm_ucast_mgr_process:Min Hop Tables > configured onall switces >> Ma 1 18:30:34 832664 [4A08960] -> SUBNE UP >> Ma 19 18:32:27 476136 [45A0890] -> _osm_trap_cv_process_reqest: >> Received Generic Notice typ:0x01 um:128Producer:2 from LID:0x018 >> TID:0x00000000000002b >> Mar 19 18:3:27 476340 [43204960] ->__osm_trap_cv_process_request: >> Reeivd Gneric Noti type:0x01 num:128 Poducer:2 from LID:0x001B > TID:0x000000000000037 >> Mar 19 18:32: 476389[45A08960] -> osm_reort_notice: Reporting Generic >> Notice type: num:128 from LID:0x0148 >>GID0xfe800000000000,0x0005ad00000281b3 >> Mar 19 18::27 47485 [4320460] -> osm_report_ntice: Reportig Generic >> Notice tye:1 num:128 from ID0x001B >> GID0xfe8000000000000,0x0005ad0000081a7Mar 9 18:32:27 817617 [42803960] - >> osm_reportnotie: Reporting eneric >> Notice type: nm:65 frm LID:0x002 >> GID:0xfe80000000000000,0x05ad000024bbb >> Mr 19 18:32:27 817637 [4280396] -> Remove port wth >> GUID:0x0005ad0000024e0b LID rane [0xB3,xB3] of nodesaguaro-23-4 C- >> ar19 18:32:27 817655[42803960] -> sm_report_notice: Reporting Generc >> Notice type:3 num:65from LID:0x092 >> GID:0fe800000000000000005ad0000024bbb >> Mar 9 18:32:27 8176 [42803960] -> emove port with >> GUID:0x0005ad000002510b LID range [0xB5,0B5] of node:saguaro-23-6 HCA- >> Mar 1 18::2 817694 [42803960] -> osreport_notice Reporting Generic >> Ntice type: num:65 from LID:0x0092 >> GID:0xfe80000000000000,00005d000024bbb >> Mar 19 18:3:781769 [4280360] -> Rmoved port wth >> 
GUID0x0005ad000002511b LID range 0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:322717716 4280390] -> osm_report_ntice: ReprtingGneric >> Notice type:3 nm:65 from LID:0092 >> GID:0xfe80000000000000,0x0005ad000002bbb >> Mar 1918:32:27 81771 [42803960] - Remved port with > GUID0x0005a0000024b7 LID range [0xAF,0xAF] of node:sguaro-23-0 HCA-1 >> Mar 19 18:3:27 817738 [4280390] - osm_reportnotice: ReportingGeneric >> otic type:3 num:65 from LID:0x0092 >> GD:0xf800000000000000x0005ad0000024bbb >>Mar 19 18:32:27 817743 4203960] -> Removed ort with >> GUID:0x000ad000025043 LI range [0xB4,0xB4] of node:sguaro-35 HA-1 >> Mar 19 18:32:27 817758 [42803960] - osm_report_notce: ReporinGeneric >> Notice type:3 num65 from LID:0x009 >> GID:0xfe8000000000000,00005ad0000024bbb >> Mar 19 1:32:27 817763 [2803960 -> Remoed port wih >> GUID:0x0005ad000024d7 LID rane [0xB6,0xB6] of node:saguar-23-7 HCA-1 > Mar 19 18:32:27 817780 [42803960] -> osmport_notice: Reporting Genric >> Notce type:3 nu:65 fromLID:0x009 >> ID:0xfe80000000000000,0x0005ad0000024bb >> Mar 1 18:32:27 17785 [42803960] -> Remved port with >> UID0x0005ad0000024d6bLID range [0xB8,0xB8] node:saguao-23-9 HCA-1 >> Mar 19 18:32:27 817803 [48036] -> osm_report_notce: Reporting Generic >> Notice tye:3 num65from LD:0x0092 >> GID:0xfe8000000000000,0x0005ad000024bbb >> Mar 19 8:3227 817808 [4283960] -> Removed porwith >> GUID:0x0005ad000004977 LID rane 0xA9,0xA9] of node:saguro-224 HCA1 >> Mar 19 18:32:27 817932 [4803960] -> osm_report_notice: Rporting Generic >> Ntice type:3 num:65 from LID:x009 > GID:0xfe80000000000000,0x000ad0024bbb >> Mar 19 18:32:27 817938 [428090] -> Removed port with >> GD:0x0005d0000027c84 LID range [0x1,0152] of node:Topspin Switch TS20 >> M 19 18:32:27 817970 [4280390] -> osm_report_notice: Reporing Generic>> Notice type:3 num:65 from LID:0x0092 >> GID:fe8000000000000,0x0005ad000024bbb >> Ma 19 18:32:27 817977 [4280360 > Removd port with>> GUID0x0005ad0000024d8b LID range [0xB,0xB7] of nde:aguaro-23-8 
HCA-1Mar 19 >>1:32:27 81792 [42803960] -> osm_report_ntice: eporting Generic >> Notic tye:3 num:65 from LID:0x0092 >> GID:0xfe800000000000000x005ad000004bbb >> Mar 19 8:32:27 81797 [42803960 -> Removed por with >> GID:0x0005ad00000249f ID range[0xA8,0xA8] of node:saguaro-22-3 HCA- >> Ma 19 18:32:27 81811 [42803960] > osm_report_notice Reportin Generic >> Notice type:3 num:6 from LID:0x0092 >> GID:0xe80000000000000,0x0005ad0000024b > Mar 9 18:32:27 818016 [42803960] - emoved port with > UID:x0005ad000004c9b LID range [0xA7,0xA7] of node:saguaro-2-2 HCA-1 > Mar 1 18:32:7 818032 [42803960] -> osm_report_ntice: Reporing Generic >> Notice tpe:3 num:65 from LD:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb>> a 19 18:3:7 818037 [42803960] -> Rmoved port wit >> GUID:0x0005ad00004da7 LD range[0xB0,0xB0]of node:saguaro-23-1 HCA-1 >> Mar 19 1:32:27 818054 [428060] -> osm_repot_notice: ReportingGeneric >> Notice typ:3 num:65rom LID:0x0092 >> GID:0xfe80000000000000,x0005ad0000024bbb >> Mar19 18:3:27 818115 [428060] -> Remoed port wth >> GUID:0x0005ad000024cbb LID range [B2,0xB2] of ode:saguar-23-3 HCA-1 >> Mar 1 18:3:27 81812 [42803960] ->osm_report_otice: Reorting Generic >> Notie tpe:3 num:65 from LID:0x092 >> GID:0xfe8000000000000,0x005ad000002bbb >> Mar1918:32:27 818137 [803960] -> Removedport ith >> GUID:0x0005ad00000249d3 LID range[xB1,0B1] of node:saguaro-23-2 HCA-1>> Mar 19 8:322 818153 [4280360] -> om_report_notice: Reporting Gneric >> Notice typ:3 num:65 from LID:0x0092 >> GID0xfe800000000000,0x0005d0000024bbb >> Mar 19 1832:7 818158 [42803960]->Removedpot with >> GUID:0x0005ad00002feb LID range [0x153,0x153] of noesaguaro-2-5 HCA-1 >> Mar 19 8:32:7818173 [42803960] -> osm_report_notic Reporting Geeric >> Notice typ:3 num:65 from LID:0x092 >> GID:0xfe8000000000000,0x0005ad000024bbb >> ar 19 18:322 818178 [42803960 -> Removed prt wih > GUID:0x0005ad0000024afb LID rng [0xA5,0xA5] of node:saguaro-22-0 HC-1 >> Mar 19 183227 85129[42803960] -> osm_ucast_mgr_procss: 
Min Hop Tbls >> conigured on all sitches >> Mar 19 18:32:2898524 [43204960] -> SBNT UP >> Mar 19 8:32:2828664 [4507960] - osm_ucast_mgr_proessMin Hop Tables > confiured on all switchs >> Mar 19 18:32:28 34169 [4466960] -> UNET UP >> Mar 191:33:21814615 [41E0296] -> __osm_trap_cv_process_request: >> Rceied Gneric Notice type:0x01 num:128 Produe2 from LID:0x0148 >> TID:00000000000002c >> Mar 19 8:33:21 81484 4E02960]->osm_report_otice: Repoting eneri >> Notie type:1 num:128 fom LID:0x0148 >> GID:0xfe800000000000,0x005ad0000281b3 > Mar19 18:3321 820835 [41E02960] -> __osm_tap_rcv_proces_request: >> Recived Genei Notc type:001 num:128 Producer:2 fro LI:0x01B>> TID:0x000000000000008 >> Ma 19 1833:2 82090 [41E02960] -> smreort_notice Repoting Generic >> Notice tye:1 num128 fromLD:0x001B >> GD:xfe80000000000000,0x0005ad0000281a7 >> ar 19 18:33:21 82038 [41E02960] -> __s_rap_rcv_processrequest: >> Received Geeri Notce tpe:0x num:128 Producer: from LID:00148 >> TID0x0000000000002d > Mar 19 18:33:21 820992 [E060] -> osm_report_noticeReporting Gnric >> Notice type:1 num:128fom LID:00148 >> GID:0xfe8000000000000,0x005ad00000281b3 >> Mar 19 18:3:21 826779 [402960] -> __osm_trap_rcv_process_rqest:>> Receivd Gneric Notice type0x0 num:18 Prducr:2 from LID:0x001B >>TID0x00000000000039 >> Mar 19 1833:21 82742 [41E02960] - om_report_notice: Reporting Generc >> Ntice tye:1 num:18 from LID:0x001B >> GID:0xfe800000000000,0x0005a00000281a7Mar 19 18:3:22 164580 [45007960 > >> osm_drop_mgr_pocess: ERR 0108: Ukown >> remote side fo nde0x0005a0000027c84 port 18. 
Addi to liht weep >> samplin list >> Mar 191:3:22 164654 [45007960] -> Direced PathDump of 5 hop path: >> Path= [0][1][11][1][5][17] >> Mar19 18:33:22 164712 [4500796] -> osm_op_mgrprocess: ERR 0108: Unknow >> reote sde for node 0x0005ad00000281b3 port2 Adding to ligt swep >> ampling lit >> Mar 19 18:33:264724 4500760] > irected Path Dump of hop path: >> Path = [0][1[11][1][5] >> Mar 9 18:33:22 13634 4C0960] ->osm_report_notic: Reporing Generic >> Notice type:3 num:64 fromD0x0092 >> GD:0xfe80000000000000,0x005ad0000024bbb > Mar 19 18:33:22 17365[43C0560] - iscovered e ort with >> GUD:0x0005d000027c4 LIDrange [0x152,0x152] of node:Toppin Switch TS120 >> Mar 19 18:3:2217365 [43C05960] -> os_reortnotice:RepotingGeneric > Notice type:3 num64from LID:0x0092 >>GID:0xfe8000000000000,0x05ad0000024bbb >> Mar 19 18:33:22 173662 [43C05960] -> icoerd nepot with >> UID:0x005ad0000024b27 I rage[0xAF,0xAF] o ode:saguaro-23- HCA-1 >> Mar 19 18:33:22 17366 [43C0590] -> osm_report_notice: Reprting Gric >> otice type num:64 from LID:00092 >> GI0xfe8000000000000,0x0005ad0000024bbbMar19 18:33:22 173671[43C05960] -> >> Discovered new por with >> GUID:0x5ad000024da7 LID rage 0xB00xB0] ofnode:auar-23-1 HCA-1 >> Mar 19 18:33:22 173675 [43C0596] -> osm_eport_notice: Rerting Generic >> Notice typ:3num:4 fr LID:0x0092 >> GI:0xfe800000000000,0x005ad0000024bbb >> Mar 9 18:33:22173680 [43C05960] -> Discoveed new port >> withGUD:0x005ad00000249d3 LDrange [B1,0xB1] of node:saguaro-2-2 HC1 >> Mar 918:33:22 173684[43C05960] ->osm_rport_notice Reporting Generic >> Notice type:3 num:6 fo LI:0x002 > GID:0xfe000000000000,0x005ad000002bb > Mar 19 18:33:22 173689 [43C05960] -> Dicovered new port with >> GUI:0x0005ad00024cbb LID range [0xB,xB] f node:saguaro-233 HCA-1 >> Mar 19 1:33:22 173693 [3C0960 ->osm_report_ntice: Rporting Geric >> Notice type:3 num:64 fro LID:0x0092 >> GID:0xfe00000000000,0x005d000004bbb>> Mar 19 1:33:22 173697 43C0596] -> Discoverednew port with >> GUID:0x05ad000024e0b LI rnge 
[0xB3,0xB3] of nod:aguaro23-4 HCA-1 >> Mar 19 1833:22 73701 [43C5960] -> os_reportotice: Repoting Genric >> Notce type:3 nu:4 from LID:0x0092 >> GID:0xfe800000000000000x005d0000024bb >> Mr 19 1833:22 17706[43C05960] -> Dscovered new port with > GUID:0x0005ad00025043 LID ange [0xB4,xB4] ofnode:saguaro-23-5HA-1 >> Mar 1 18:33:2 173710 43C05960] -> osm_reo_notice: Reprting Generic >> Notice tye:3 m:64 from ID:0x0092 >> ID:0xfe8000000000000x0005ad000002bbb >> Ma 1918:3:22 173715 [43C0596] ->Discverednw port with >> UID:0x0005ad00002510b LID range [0xB5,0xB5 of nde:saguaro-23-6 HCA1 >> Mar 9 18:3:2 173719 [3C0590] -> osm_report_ntic Reportin Generic > Notice type:3 nm:64 from LID:0x0092 >>GID0xfe800000000000,0x0005ad0000024bbb > Mar 1918:33:22 1723 [43C05960]-> Disoveed new ort wth >> GUID:0x0005d000002447 LID rane [0xB6,0xB6] of node:saguaro-23-7HCA-1Mar 1 >> 18:33:22 17327 43C05960] ->osm_rert_notie: Reporting Generic >> Notice type:3 num:64 fom LID:0x0092 >> GI:0fe80000000000000,00005ad000004bbb > Ma 9 18:33:22 1733 [43C05960] - Discoverednew port wit >> GUID:0x0005d000024d8bLID range [0B7,0xB7] of node:saguro-23-8 HCA-1 >> Mr 19 18:33:22 173736 [4C05960] -> os_repr_notice:Reporting Generic >> Notic type: num:64 from LID:0x0092 >> GID:0f80000000000000,0x0005d0000024bbb >> Ma 19 18:33:22 173741 43C5960] ->Dscovred ne portwith >> GID:0x0005ad0000024d6b LI range 0xB8,0xB8] ofnode:sguao-23-9 HCA-1 >> Mar19 1833:22 173744 [43C0960] -> osm_report_notice: Reorting Generic > otice typ:3 num:64 from LID:0x0092 > GI:0xfe800000000000,0x0005a0000024bbb >> Mar 19 18:3:2 173749 [430960] -> Discovered new prt with >> GUI:00005d000024afb LIDrnge [0xA5,0xA5] of noe:sagaro-22-0 HCA-1 >> Mar 19 833:22 13753 [43C0960] -> om_repor_notice: Reporing Generic >> Noticetype: num:64 rom LID:0x002 >> GID:0xfe800000000000,0x0005ad00004bbb >> Mar 19 18:3:22 17758 [43C0596] -> Dicovere nw portith >> GUID:00005ad000002511 LID rang [0x6,0A6] f node:saguao-22-1 HCA-1 >> Mar 1 18:33:22173762 
[43C05960] - osm_reort_notice: Reortin Geneic >> Ntice type:3 num:64 from LID:0x0092 >> GI:xfe80000000000000,0x005ad000024bbb >> Mar 19 18:33:22 17376 [C0960] -> Dscovered new port >> wihGUID:x0005ad00024c9b LID range [xA7,0xA7] of node:saguro-222 HCA-1 >> Mar 19 8:33:22 173770 [4C05960] -> osm_report_noice Reprting Gneric >> Notie type:3 nm:64 from LD:0x0092 >> GID:0xfe8000000000000,0x005a000024bbb >> Mar 1918:33:22 173830 43C0590]-> Discovered new port with>> ID:0x0005d000002498f LID range [0xA,08]of node:sguaro-22-3 HCA-1 >> Mar 18:33:22 173834 [43C05960] -> osm_eportnotice: Reportin Geneic >> Notice ype:3num:64 from LID0x0092 >> GD:0xfe8000000000000,0x0005ad000024bb >> Ma 1 18:33:22173839 [43C05960] -> Discovered new port ith >> GUI:0x005ad0000024977LID range [xA,0A9 of node:saguaro-22-4 HA1 >> Mar 19 18:33:22 173843 [3C05960] ->osm_report_notice: Reporting >> GenericNotice ype: num:64 from LID:00092 >> GID:0xfe00000000000,0x0005ad0000024bbb>> Mar 9 :33:22 173848 [43C05960] -> Discovered new port with >> GUD:00005ad0000024feb LID range [0x153,x1of node:sagaro-22-5 HCA-1 >> Mar 19 18:33:22 204620 [43C05960] -> osm_cast_mgr_process: Min Hop >> Tablescnfgued on all switches >> Mar 19 18:33:22 278567 [45A0896] -> SUNET UP >> Mar19 18:33:22 664286 [141960] -> osm_ucast_mgr_process: Min Hop Table >> configured on all switces >> Mr 19 1833:22 734270 [45007960 -> SUBNET UP >> Mar 19 1833:37 650358 [41401960] -> __osm_trap_cv_process_request >> Rceived Geneic Noice type:0x01 num:128 Producer:2 from LID:0x0152 > TID0x0000000000000000 >> Mar 19 18:33:3 65058 [41401960] -> os_report_notice Reporting Generic >> Noticetype:1 num:28 from LID:0x0152GID:0xfe800000000000,0x005ad0000027c84 >> Mar 19 18:33:37 927263 [45A08960] -> __osm_rap_rcv_procs_request: >> Received Generic otice tye:0x01 num:128 Producer:2 fro LID0x0152 >> TID:0x000000000000001 >> Mar 1 18:33:37 927420 [45A090] -> osm_report_notice: eportig Geeric >> Notice type:1 num:128from LID:0x0152 >> 
GD:xfe8000000000000,0x0005ad0000027c84 >> Mar 19 18:3:37 95572 [4A0896] -> __osm_trap_rcv_process_rquest: >> Received Generic Notice type001 num:128Produce:2 from LID:0152 >> TID:0x00000000000002 >> Mar 1918:3:37 955657 [45A08960] -> osreprt_notice: Reporting Generic >> Noticetype:1num:128 from LD:0x012 >> GID:0xfe800000000000,0x0005ad000002c84 >> Mar 1 18:33:37 97718 [44606960] -> _osm_tap_rcvprocess_request: >> Receivd Generic Notice type:0x01 nu:28 Produr2 from LID:0x0152 >> TID:000000000000003 >> Mar 19 18:33:3 97740 [44606960] ->osm_report_notice: poring Geneic >> Notice type:1 num:128 f LID:0x0152 >> GID:0xfe800000000000,0x0005d0000027c84 >> Mr 19 18:3337 999319 [41E02960] -> __osm_trap_rc_process_rquest: >> Receied Gneri Notice type0x01 num:128 Producer:2 rom LID:0x052 >>TID:0x000000000000004 >> Mar 19 18:33:37 99447 [41E02960] > sm_report_notice: Reporting eneric >> otice type1 num:128 from LID:x152GID:xfe800000000000000x000ad000002784 >> Mar 19 18:33:38 045171 [4606960] -> __osm_trap_rcvprcess_request: >> Received Gneric Notce type:0x0 num:128 Producer:2 frm LID:00152 >> TID:0x00000000000005 >> Mar 9 18338 045271 [44606960] -> osm_report_notice: Reporting Generic >> Ntie ype:1 num:18 from LID:0x05 >> GID:0xfe800000000000000x0005ad00002784 >> Mar 19 18:33:38 06305 [432060] -> __osm_trap_rcv_process_request: > Received eneric Notice typ:0x01 nu:128 Producer:2from ID:0x052 >> TID:0x000000000000006 >> Mar 1918:33 063102 [43204960] -> osm_reprt_notice: porting Generic>> Notice type:1 num:128 from LID:0x0152ID:0xfe8000000000000,0x0005a0000027c4 >> ar 9 18:3338 182624 [803960] -> __osm_trap_rcv_process_request: >> Receved Generic Notice typ:0x01num:12 Produr2from LID:0x0152 >> TID:0x000000000000007 >> Mar 19 18:3338 18720 [4280360 -> osm_reprt_notice: Reporting Geeric >> Notice typ:1 num:128 from LID:0x05 >> GID:xfe800000000000,0x0005ad000007c84 >> Mr 19 18:33:38 19435 [44606960] -> __osm_trap_rc_process_request >> Reeived Genric Notice tpe:0x0 num:18 Prducer:2 
from ID:0x012 >> TID:x0000000000000008 >> Mar 1918:33:38 194209[44606960] -> osm_reportnotice Reorting Generc >> Notic type:1 num:28 from LID:0x152 > GID:0xfe000000000000,0x0005ad000007c4 >> Mar 1 18:33:38 379421 [43C05960] -> _om_trap_rcv_processrequest: >> Receive Generi Notice type:x01 num:12Producr:2 fromLID:0x0152 >> TID:0x00000000000009 >> Mar 19 18:33:38 37959 [4305960] -> osm_report_tice: Reporting eneric >> Ntice type:1 num:128 rom LID:0x0152 >> GD:0xfe80000000000,0x005ad0000027c84 >> Mar 19 1833:3 07685 [41401960] -> __osm_trap_cvrocss_request:Received >> GenericNotie type:x01 num:128 Producer:2from LID:0x0152 >> TID:0x0000000000000a >> Mar 19 18:33:38 47758 [4140190] -> os_report_notice: eprting Generic >> Notice typ:1 num:128 rom LID:0x0152 >> GID:0xfe8000000000000,0000ad0000027c84 >> Mar 1 18:33:8 429658 [4A08960] -> __m_trap_rcv_pocess_request: >> Received enric Ntice type:001 num:12 Producer:2 fm LID0x0152 >> TID:0x000000000000000bMa 19 8:33:8 429700 [45A08960] > >> __osm_traprcv_process_reqest: ERR >> 3804: Received trap 11 ties ecutiveyMar 19 18:33:38 544177 [45007960] - >> __osm_trap_rcv_process_reest: >> eceived Generic Notice tpe:0x0 num128 Podcer:2 from LID:0x152 >> ID:0x000000000000000c >> Mar 18:3338 544221 [4507960] -> __osm_trp_rvprocess_requst: ERR >> 304: Received trap12 times consecutiely >> Mar 1918:33:8 545235 [4280960] ->osm_repot_ntic:Rporting Generic >> Notice type:3 num:65 from LI:0x0092 >> GID:0xfe80000000000,0x0005ad000024b >> Mar 9 18:3338 54247 [42803960] -> Removed por with >> GUID:0x0005ad00024b27 ID range 0xAF,0xAF] f node:sauaro-23-0 HC-1 >> Mar 19 18:33:3 545278 [42803960] -> osmeort_notice: Reporing Generc >> Noticetype3 num:65 from LD:0x0092 >> G:xfe8000000000000,0x0005ad0000024bbb >> Mar 19 18:33:38 54586 [428030] > Removd port with >> GUI:0x0005a000024da7 LID range [0xB,0x0] of node:sauao-23-1 HC-1 >> Mar 19 1:3:38 545312 42803960] ->osmreport_noice: eporting Generic >> Notice ype: num:65 from LID:00092 >> 
GID:0xfe800000000000,0x0005ad0000024bb >> Mar 19 8:33:38 54318 [2803960] -> Reoved portwth >> GUID:0x0005ad00000249d3 LD rang [xB10B1] of node:saguaro-23-2 HA-1 >> Mar19 8:3:38 580005 [42803960 -> osm_ucast_mr_process: in Hop Tabes>> configured on all swiches>> Mar 19 18:3:38 66849[43C0590] -> SUBNET UP >> Mar 19 18:33:38 68520[45A08960] -> __om_tra_rcv_process_reques: >>ReceivedGeneri Notice tpe:x01 num:128 Producer:2 from LID:0x015 >> D:x00000000000000 >> Mar 19 18:33:38 48616 [45A08960] -> _osm_trap_cv_process_requet: ERR >> 3804: ceied trap 13 tmes onsecutiely >> Mar 19 183338 676891[41E0260 -> __osm_trap_rcv_rocess_request: >> Recived Genei Notice tpe:0x01 num:128 Producer:2 fo LID:0x152 > TID:0x000000000000e >> Mar 19 18:33:38 67670 [4102960] -> __osm_rap_rcv_proces_requst: ERR >> 3804: Reived trap 14 tes cosecutively >> Mar 19 18:33:38 698797 [446096] ->__osm_trap_rcv_pcessrequest: >> Receved Geneic Notice typ:0x1 num:128 Producer:2frm LD:0x0152 >> TD:0x00000000000000f >> Mr 1 18:33:8 69860 [44606960]-> __osm_trap_rcv_procss_equest: ERR >> 3804: Receved trap 15 times conecutive >> Mar 19 18:3:38 20538 [43C05960] -> __s_trap_rcv_proces_request: >> Received Generic Notce ype:0x01 num:128 Poducer2from ID:0x0152 > TID:0x000000000000010Mr 19 18:33:38 720612 [43C0960] -> >> __osm_trp_rcv_process_reqest: RR >> 3804: ecived trap16 time onsecutively >> a 19 18:33:38 921253 [42803960] > __osm_trap_rv_processequest: >> eceive Generic Notice type:x01 num:128 Producer:2 from LID0x012 >> TIDx000000000000011 >> Ma 19 18:33:8 9213 [42803960] > __osm_trap_cv_procss_reque: ERR >> 3804: Recived trap 17 imes consecutively >> Mar 198:33:38 97418 [43C05960] -> _osm_trap_rcv_proess_reqest: >> Recived Generic Notice ype:0x01 nm:12Prodcer:2 rom LID:0x152 >> TID:0x000000000000012 >> a19 18:33:38 97479 [43C05960] > __os_trap_rcv_prcess_equest: RR >> 3804: Received trap 1 ties onsecutively >> Mar 191833:38 98519 [483960] > _osm_trap_rcv_rocessreques: >> ReceivedGeneric Noice type:0x1 
um:128 Producer:2 from LID:0x015 >> TID:x00000000000013 >> Mar 19 1:33:3 98955[2803960] - __osm_tap_rcv_process_rquest: ERR3804: >> Receivtrap19 times consecutively >>Mar 9 18:33:38 998342 [43204960] -> __os_trap_cv_poces_request: >> ecivd Generic Notice type0x01 num128 Poducer:from ID:0x0152 >> TD:0x0000000000001 >> Mar 19 18:33:38 998380 [4320496] -> _osm_ap_rcv_process_requestRR >> 384:Received trap 20 times conscutiely >> ar 19 18:33:3 03923 3204960] -> _osm_tra_rcv_process_requst: >> Recived Gneric Notice type0x0 num:128 Producr:frm LID0x0152 >> TID:0000000000000015 >> Mar9 :33:39 039334 [4204960] ->__os_traprcvprocess_requs: ER >> 3804:Received tra 21 times consecutiely >> Mar19 18:33:39 06060 [32096] -> __osm_trap_rcv_process_equest: >> Reeid Generic Notice type:01m:128 Producr:2 from >> LID:x0152TID:0x00000000000016 >> Mar 19 18:3:306108 [4320490] -> __sm_trap_cv_prcess_request: ERR >> 304: Reeied tra22 times onsecutivel >>Mr 1 18:33:39 079032[41E02960] -> __osm_tra_cv_process_reques: >> Received eneric Notice tpe:0x01 num:18 Prducer from LID:0x0152 >> TD:0x00000000000017 >> Mr 1 18:33:39 07972 4E0260] -> _osm_trap_rcv_proces_request: ERR >> 380: Receied trp23 time consecutivel > Mar 19 18:3:9 16006 [41E0960] -> osm_eport_notice: Repoting Geric >> Noice ype3 num:5 from LID:0x0092 >> GD:0xfe80000000000000x0005ad0000024bbMar 19 18:33:9 146018 [402960] -> Removed porwith > GUID:0x005ad000002511b LID range [0xA6,x6] of od:saguaro-2-1 HCA-1 >> Mar 19 18:33:39 1404 [41E02960 -> osm_eport_notce: Reportin Generic>> Noticeype:3 num:65 from LID:0x0092 >>GID:0xfe80000000000000,0x005ad0000024bb >>Mar 19 18:33:39 146050 [41E296] -> Rmove ort with >> UID:0x0005d00000db LID range [0xB80xB8] of nod:saguaro-23- HCA-1 >> Mar 19 18:33:39 14082 [41E2960] -> sm_report_notie Reporting Generic >> Notic typ:3 num:6 from LID:0x0092 > GID:0xfe000000000000,0x0005ad000024bb >> Mar 19 8:33:39 146089 [41E02960] -> Removed port wh >> UID:0005ad0000024afb LID range 0xA0xA5] of 
noe:saguaro-22-0 HCA- >> Mr 19 18:33:39 15720 [4140190 -> osm_report_notice: Reporting Gnerc>> Notie type:3 num:64from LID:0x092 >> GID:0xfe80000000000000,0x0005ad0000bbb >> Mar 19 18:33:39150732 [41401960] -> Discoveed new port with >> GI0005ad0000024b27 LI rage [0xAF,0xAF] of nde:saguaro-23-0 HCA-1>> Mar 1 18:33:39 150736 [4140160] -> om_report_notice: Reporting Genec >>Notic ype:3 um:64 from LID:0x009 >> GID:0xfe0000000000000,0x0005d000024bb >> Mar 19 18:33:39 50742 [41401960] -> Discoverd new port with >> GID:0x00ad0000024d LID range [0xB0,xB0] of node:aguro-23-1 HCA-1 > Mar 19 183:39 15074 [4141960] -> osm_report_notice: Reporting Genei >> Notice tpe:3num64 from LID:0x0092 >> ID:0xf800000000000x0005ad000024bbb >> Mar 19 18:3:39 15750 [41401960] -> Discovered new pot with >> UID:0x0005ad00024d3 LID range [0x1,0xB1] of node:saguaro-3-2 HA-1 >>Mar 19 18:33:3 181553 [411960] -> os_ucast_mgr_process: Min Ho Tabes>> configured on al switches >> Mar 19 18:33:9 218130 [43C5960] -> __om_trap_rcv_proess_request: >>Received eneic Ntice type:0x01 num:128 Producer:2 from ID:0152 >> TID:0x000000000000018 > Mar 19 18:33:39 218197 43C05960] -> _os_trap_rcv_process_request:ERR >> 3804: Receivd trap 2 times cosecutivly >> Mar 1918:33:39 375407 [480390] -> __osm_trap_rcv_process_request: > Receive Generc Notice type:001 um:128 Producer:2 from LID:0x0152 >> T:0x0000000000019 >> Mar 19 18:3339 375456 [4803960 -> __osm_trap_rcv_process_request: ERR >> 3804: Rceived tra 25 tis cnsecutvely >> Mar 19 18:3:39 375588 [43C05960]-> __osm_trap_rcv_procsrequest: >> Received Generc Ntic type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000002e>>r 19 18:33:39 375630 [43C05960] -> osmror_notice Reporting Generic >> Notice type:1 num:128 fo LID:0x0148 >>GID:0xfe80000000000000,0x0005ad0000281b >> Mar19 18:3339 637844 [41401960] -> UBNET UP >> Mar 19 18339 664805 [45A0890] -> __osm_trap_rcv_process_request: >> Received Gener Notce tpe:0x01 num128 Producer:2 from LI:0x0148> 
TID:0x000000000000002f >> Mar 19 18:33:9 66490 [4508960] ->osm_report_notice: Reporting Generic >>Notice type:1 num:128 from LID:0x0148 >> GID:xfe8000000000000x0005ad0000281b3 >> ar 19 18:33:39 666276 [45A08960] -> __osm_trap_rcprocess_reuest: >> eceived Generic Notice typ:x01 num:128 Poducer:2 from LID:0x001B >> TID:0x0000000000003a >> Mr 9 18:33:39 666364 [45A08960] - osm_reprt_notice Reprting Generic >> Notice type1 num:128 from LID:0001B >> GID:0xfe8000000000000,0x0005ad00000281a7Mar 1 18:33:9 710546 [41E02960] -> >> __osm_tap_rv_roces_request > Received Generic Notice type:0x01 num:128 Producer2 fom LI:0x014 >> ID:0x00000000000003>> Mar 19 18:33:39 71062 [41E02960] -> osm_eport_notice Reportig Generic >> Noice type:1 num:28 from LID:0x048 >> ID:0xfe80000000000000,0x0005ad000281b>> Mar 19 18:33:39 732425 [41E060]->_sm_trap_rcvprocess_request: >> Received Generic Notice type:0x01 num18Producer:2 from ID:0x048 >>TID:0x0000000000000031 >>Mar 19 18:33:3973214 41E02960] -> osm_rport_ntice: Reporing Generic >> Notice type:1num:128 from LID:0x0148 >> GID:0xfe80000000000,0x0005ad0000281b3 >> Mar 1 83339784151 [43204960] -> __osm_trap_rcv_process_request: >> ReceivedGeeric Notice type:0x01 um128 Producer2 from LID:00148 >> TID:000000000000032 >> Mr 19 18:33:3978469 [43204960] -> osm_reort_notice: Reporting neric >> Noice type:1 nu:128 from LID:x0148 > GID:0fe80000000000000,00005ad0000081b >> Mar 19 18:33:39 824170 [4283960] > __osm_trap_rcv_rocss_request: >> eceived Gneric Notice type:001 num:128 Produer:2 from LID:0x001B >> TID:0x00000000003b >> Mar 19 18:33:39 824443 [4283960] -> osm_repot_notice: Reportin Generic >> Notice tye:1 n:128 frm LI:0x01B >> GID:0xfe8000000000000,0x0005ad00000281a7 >> Mar 19 18:3:40 01502 [44606960] - osm_report_noti: eporting Generic >> Notce type:3num:64 rom ID:0x0092 >> GID:0xfe800000000000,0x0005ad0000024bbb >> Mar 9 18:33:40 01070[44606960] - Discovered new port with >> GID:0x00ad0000024d6b LID rne [0xB80xB8] o node:saguaro-23-9 HCA-1 
>> Mar 19 1833:40 015074 [44606960] -> osm_repot_notice: Reportng Generic >> Ntice type3 num:64 from LI0x0092 >>GID:0fe80000000000000,0x0005ad0000024bbb > Mar 19 18:33:40 0080 [44606960] -> Discovered new port wit >> GUI:0x0005ad00024afb LID rang [0xA5,0xA5] of node:agua-22-0 HCA-1 >> Mar 9 18:3:40 015083 [4406960] -> osm_repor_notic: Reporting Generic >> Notice type:3 nu64 from LID:0x002 > GID:0xfe80000000000000,x0005ad000002bbb> Mar 19 18:33:40 015088 [44606960] -> Discovered new port with >> GUID:00005ad000002511b LID ange [0xA6,0xA6] of noe:sauaro-22-1 HCA->> Mar19 18:33:40 046164 44606960] -> om_ucast_mgr_prcess: Min Hop Tables >> configured on all switchesar 19 18:33:40 106627 [42803960] -> BNET UP > Mar 1918:33:40 145952 [45007960] -> __osm_trap_cv_process_rquest: >> Received Generic Notice type:0x01 um:18 Producer:2 from LID:0x0148 >> TID:0x00000000000033>> ar 19 18:3340 146076 [4507960] -> os_report_notice: Reporting Generic >>Notice type:1 nu:128 from LID:0x018 >> GID:0xfe8000000000000,0x0005ad00000281b3 >> ar 19 18:33:40 14646 [44606960] -> __os_trap_rcv_process_reqest:Received >> Generic Noice ype:0x01 num:128 Producer:2 from LIDx001B >> ID:x000000000000003c >> Mar 9 8:33:40 16611 [44606960] -> osm_report_notice: Reporting Generi >> otice type:1 um:128 from LID:0x001B >> GD:0xfe8000000000000,0x0005ad00000281a7 >> Mar 19 18:3:40 306176 [41401960]->__osm_trap_rcv_process_request: > Receivd Generic Notice type:0x01 um:128Poucer:2 from LID:0x001B >> TID:0x00000000000000d >> Mar 19 18:33:40 306270 [41401960] -os_report_notic: Reporting Generic >> Ntice type:1 num:128 fro ID:0x001B >> GID:0xfe800000000000000x000ad00000281a >> Mar 19 18:33:40 20009 [4C05960] -> __osm_trap_rcv_rocess_rquest:Received >> Generic Ntice yp:0x01 num:128 Producer:2from LID:00152 >> TID:0x0000000000000019 > Mar 91:33:4420071[43C05960] -> __om_trap_rcv_process_request: ERR >> 3804: Receved trap 26 times conecutivly >> r19 1833:40 433566 [4280390] -> __osm_tap_rcv_process_request: >> 
Reeive Geei Noticetype:0x01 num:128 Producer:2 frm LID:0x0152 >> TID:0x0000000000001a >> Mar 19 1:33:40 43596 [403960] -> __osm_traprcv_proess_reuest: E >> 3804: Received trap 2 times consecutively >> Mar 19 1833:40 434996 [45007960] -> _osm_trap_cv_pocess_reqest: >> Received Generic otice type:0x01 num:28 Producer:2from >> LID:x001BTID:0x00000000000003e > Mar 19 18:33:40 435041 [450079] -> os_reportotice: Reporting Generic >> otice ype:1 num:18 fromLID:0x001B >> GID0xfe80000000000000,000ad0000281a7 >> Mar 19 18:33:40 485454 [204960 -> osm_ucast_mgr_procss: Mi Hop Table >> confiured on all swtches>> Mar 19 18:33:40 528816 [43C05960] -> os_trap_cvprocess_requet: >> Received Generic Noice type:0x01 num:128 roduer:2 from LID:0x001B >> TID:00000000000003f >> Mar 19 18:33:40 52890 [43C05960 -> osm_reort_notie: Reprting Generic >> Notice type:1 nu:128 fro LID:0x001B >> GID:0xfe8000000000000,0x005ad0000081a7 >> Mar 19 18:33:40 546019[42803960] -> SUBNT UP >> Mar 19 18:3:40551048 [42803960] -> __osm_trap_rcv_process_request: >> Receive Genric Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:x0000000000000034 >> Mar 19 18:33:40 5519 [42803960] -> osm_report_notice: Reporting Gneric >> Notice typ:1 num:128 from LID:00148 >> GID:0xfe80000000000000x0005ad00000281b3 >> Mar 19 18:33:40 594994 [44606960] -> __osm_trap_rc_pocess_request: >> Received Generic Notice type:0x01 num:128 Producer2 from LID:0x001B >> TID:0x0000000000000040 >> Mar 19 18:33:40 595074 [44606960] -> om_report_notice: Reporting Generic >> Noice type:1 num:128 from LD:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:33:40 83973 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Prodcer:2 from LID:0x001B >> TID:0x0000000000000041 >> Mar 19 18:33:40 840064 [43204960] -> osm_report_notice: Reporting Gneric >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x005ad0000281a7 >> Mar 19 18:33:40 861973 [43204960] -> 
__osm_trap_rcv_process_request: >> Received Genric Notice type:0x01 num:128 Producer: from LID:0x001B >> TID:0x0000000000000042 >> Mar 19 18:33:40 862075 [43204960]-> osm_report_notice: Reporting Generic >> Ntice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x005ad00000281a7 >> Mar 19 18:33:40 83777 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic otice type:0x01 num:128 Producr:2 from LID:0x001B >> TID:0x0000000000000043 >> Mar 19 18:33:40 907658 [4803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:33:40 947974 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:33:40 965203 [45007960] -> SUBNET UP >> Mar 19 18:33:41 350582 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:33:41 417662 [43204960] -> SUBNET UP >> Mar 19 18:33:41 571156 [45A08960] -> __osm_trap_rc_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001b >> Mar 19 18:33:41 571256 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 28 times consecutively >> Mar 19 18:35:50 971684 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000035 >> Mar 19 18:35:50 971926 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:35:50 972301 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000044 >> Mar 19 18:35:50 972378 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:35:51 342826 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from 
LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342845 [43204960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:35:51 342866 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342873 [43204960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:35:51 342895 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342901 [43204960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:35:51 342923 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342930 [43204960] -> Removed port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 19 18:35:51 342968 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342972 [43204960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:35:51 342989 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 342994 [43204960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:35:51 343011 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343016 [43204960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:35:51 343033 
[43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343038 [43204960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 19 18:35:51 343189 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343194 [43204960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 19 18:35:51 343234 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343239 [43204960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:35:51 343253 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343258 [43204960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 19 18:35:51 343273 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343278 [43204960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 19 18:35:51 343293 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343298 [43204960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 19 18:35:51 343314 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343319 [43204960] -> Removed port with >> 
GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:35:51 343334 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343393 [43204960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 19 18:35:51 343410 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343415 [43204960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 19 18:35:51 343430 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:35:51 343435 [43204960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:35:51 376525 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:35:51 433087 [43204960] -> SUBNET UP >> Mar 19 18:35:51 849193 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:35:51 901399 [42803960] -> SUBNET UP >> Mar 19 18:36:44 359407 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000036 >> Mar 19 18:36:44 359652 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:36:44 365352 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000037 >> Mar 19 18:36:44 365427 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:36:44 365432 [43204960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000045 >> Mar 19 18:36:44 365567 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:36:44 371481 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000046 >> Mar 19 18:36:44 371591 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:36:44 711649 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad0000027c84 port 19. Adding to light sweep >> sampling list >> Mar 19 18:36:44 711691 [43204960] -> Directed Path Dump of 5 hop path: >> Path = [0][1][11][1][6][18] >> Mar 19 18:36:44 711738 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad00000281b3 port 24. 
Adding to light sweep >> sampling list >> Mar 19 18:36:44 711748 [43204960] -> Directed Path Dump of 4 hop path: >> Path = [0][1][11][1][6] >> Mar 19 18:36:44 721719 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721730 [43204960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 19 18:36:44 721736 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721744 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 19 18:36:44 721749 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721756 [43204960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 19 18:36:44 721761 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721767 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 19 18:36:44 721772 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721779 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:36:44 721784 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721790 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:36:44 721795 [43204960] 
-> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721802 [43204960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:36:44 721826 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721831 [43204960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:36:44 721845 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721850 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:36:44 721854 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721859 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:36:44 721862 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721867 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:36:44 721871 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721876 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:36:44 721880 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721884 [43204960] -> Discovered new 
port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:36:44 721888 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721893 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 19 18:36:44 721897 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721923 [43204960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 19 18:36:44 721927 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721932 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 19 18:36:44 721936 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:36:44 721941 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 19 18:36:44 752683 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:36:44 820881 [43C05960] -> SUBNET UP >> Mar 19 18:36:45 198990 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:36:45 258878 [44606960] -> SUBNET UP >> Mar 19 18:37:00 446068 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 19 18:37:00 446346 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:00 564122 [41401960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 19 18:37:00 564810 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:00 589920 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 19 18:37:00 590067 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:00 611770 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 19 18:37:00 611916 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:00 800652 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 19 18:37:00 817995 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:00 861575 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:00 983908 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 19 18:37:00 984004 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 012195 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 19 18:37:01 012283 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 
num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 034177 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 19 18:37:01 034272 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 056001 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 19 18:37:01 056068 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 074341 [43204960] -> SUBNET UP >> Mar 19 18:37:01 252871 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 19 18:37:01 253037 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 303407 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 19 18:37:01 303490 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:37:01 325057 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 19 18:37:01 325160 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 19 18:37:01 334059 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 19 18:37:01 334118 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: 
Received trap 12 times consecutively >> Mar 19 18:37:01 474293 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 474317 [45007960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:37:01 474341 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 474348 [45007960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:37:01 474371 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 474378 [45007960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:37:01 509205 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:01 557110 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 19 18:37:01 557172 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 19 18:37:01 565676 [43C05960] -> SUBNET UP >> Mar 19 18:37:01 576199 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 19 18:37:01 576270 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 19 18:37:01 599713 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 19 18:37:01 599779 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 19 18:37:01 707096 [45007960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 19 18:37:01 707150 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 19 18:37:01 921406 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 921423 [45A08960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:37:01 921448 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 921455 [45A08960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:37:01 921495 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 921502 [45A08960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:37:01 925845 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 925855 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:37:01 925859 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 925864 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:37:01 925868 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:01 925873 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d8b 
LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:37:01 956691 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:01 999372 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 19 18:37:01 999470 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 19 18:37:02 012194 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000012 >> Mar 19 18:37:02 012256 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 19 18:37:02 014327 [41401960] -> SUBNET UP >> Mar 19 18:37:02 034202 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 19 18:37:02 034250 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 19 18:37:02 056015 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 19 18:37:02 056060 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 19 18:37:02 270731 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 19 18:37:02 270777 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 19 18:37:02 271169 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000038 >> Mar 19 18:37:02 271347 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 
>> Mar 19 18:37:02 462374 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000039 >> Mar 19 18:37:02 462511 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:02 496247 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003a >> Mar 19 18:37:02 496310 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:02 624890 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:02 624902 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:37:02 624908 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:02 624914 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:37:02 624919 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:02 624926 [45A08960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:37:02 655848 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:02 709115 [42803960] -> SUBNET UP >> Mar 19 18:37:03 082995 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003b >> Mar 19 18:37:03 106373 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on 
all switches >> Mar 19 18:37:03 136757 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:03 178027 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000047 >> Mar 19 18:37:03 178064 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003c >> Mar 19 18:37:03 178139 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 178160 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:03 315226 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 19 18:37:03 315289 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 19 18:37:03 341474 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 19 18:37:03 341549 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 19 18:37:03 341616 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003d >> Mar 19 18:37:03 342446 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:03 343169 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 19 18:37:03 343262 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> 
class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x14d08 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x11 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6][16] >> Return path: [0][9][18][D][3][11] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 >> >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 19 18:37:03 343371 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 19 18:37:03 343364 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 19 18:37:03 343415 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x14d09 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x12 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6][16] >> Return path: [0][9][18][D][3][11] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 >> >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 19 18:37:03 343409 [45007960] -> PortInfo dump: >> port 
number.............0x11 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x11 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............INIT >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 19 18:37:03 343481 [45007960] -> Capabilities Mask: >> Mar 19 18:37:03 343532 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 19 18:37:03 343537 [45007960] -> PortInfo dump: >> port number.............0x12 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x11 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............INIT >> 
state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 19 18:37:03 343555 [45007960] -> Capabilities Mask: >> Mar 19 18:37:03 348684 [45007960] -> SUBNET UP >> Mar 19 18:37:03 461748 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000048 >> Mar 19 18:37:03 461958 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 484827 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003e >> Mar 19 18:37:03 486448 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:03 528040 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000049 >> Mar 19 18:37:03 528154 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 580196 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004a >> Mar 19 18:37:03 580534 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 
num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 599784 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004b >> Mar 19 18:37:03 599879 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 621883 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004c >> Mar 19 18:37:03 621940 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:03 707894 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:03 764678 [43204960] -> SUBNET UP >> Mar 19 18:37:03 783783 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004d >> Mar 19 18:37:03 783844 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:04 000228 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004e >> Mar 19 18:37:04 000628 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:04 022198 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000004f >> Mar 19 18:37:04 022299 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:04 043985 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000050 >> Mar 19 18:37:04 044052 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:04 155809 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:04 210448 [41401960] -> SUBNET UP >> Mar 19 18:37:04 504490 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 19 18:37:04 504569 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 19 18:37:04 570084 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:04 626298 [43C05960] -> SUBNET UP >> Mar 19 18:37:54 424084 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000051 >> Mar 19 18:37:54 424430 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:37:54 424457 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000003f >> Mar 19 18:37:54 424522 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:37:54 722515 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722536 [44606960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:37:54 722558 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722565 
[44606960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:37:54 722587 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722594 [44606960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:37:54 722636 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722641 [44606960] -> Removed port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 19 18:37:54 722658 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722663 [44606960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:37:54 722679 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722684 [44606960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:37:54 722701 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722706 [44606960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:37:54 722723 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722728 [44606960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 19 18:37:54 722875 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from 
LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722880 [44606960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 19 18:37:54 722909 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722915 [44606960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:37:54 722929 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722934 [44606960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 19 18:37:54 722949 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722955 [44606960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 19 18:37:54 722970 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722975 [44606960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 19 18:37:54 722992 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 722997 [44606960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:37:54 723012 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 723073 [44606960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 19 18:37:54 723090 
[44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 723095 [44606960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 19 18:37:54 723111 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:37:54 723116 [44606960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:37:54 756302 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:54 806787 [45A08960] -> SUBNET UP >> Mar 19 18:37:55 149566 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:37:55 198855 [41401960] -> SUBNET UP >> Mar 19 18:38:48 131054 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000040 >> Mar 19 18:38:48 131349 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:38:48 137230 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000052 >> Mar 19 18:38:48 137268 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000041 >> Mar 19 18:38:48 137395 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:38:48 137432 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:38:48 143370 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x001B >> TID:0x0000000000000053 >> Mar 19 18:38:48 144327 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:38:48 529052 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529065 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 19 18:38:48 529071 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529078 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 19 18:38:48 529083 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529090 [41E02960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 19 18:38:48 529095 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529101 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 19 18:38:48 529106 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529113 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:38:48 529118 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529124 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range 
[0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:38:48 529129 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529136 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:38:48 529141 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529147 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 19 18:38:48 529152 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529159 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:38:48 529164 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529170 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 19 18:38:48 529175 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529182 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 19 18:38:48 529186 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529193 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 19 18:38:48 529198 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529204 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 19 18:38:48 529209 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529216 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 19 18:38:48 529271 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529277 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 19 18:38:48 529281 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529286 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 19 18:38:48 529290 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:38:48 529294 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 19 18:38:48 560082 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:38:48 630464 [43204960] -> SUBNET UP >> Mar 19 18:38:49 018498 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:38:49 073355 [45007960] -> SUBNET UP >> Mar 19 18:39:04 189829 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 19 18:39:04 190072 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from 
LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 307827 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 19 18:39:04 307940 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 330104 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 19 18:39:04 330210 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 468676 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 19 18:39:04 468758 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 680305 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 19 18:39:04 680400 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 702144 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 19 18:39:04 702286 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 704346 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:04 704354 [43204960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of 
node:saguaro-23-2 HCA-1 >> Mar 19 18:39:04 739059 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:04 739896 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 19 18:39:04 783807 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 797411 [44606960] -> SUBNET UP >> Mar 19 18:39:04 849970 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 19 18:39:04 850195 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 853735 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 19 18:39:04 853809 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 897727 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 19 18:39:04 897860 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 901577 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 19 18:39:04 901719 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 19 18:39:04 923271 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> 
TID:0x000000000000000b >> Mar 19 18:39:04 923377 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 19 18:39:05 106246 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 19 18:39:05 106314 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 19 18:39:05 178215 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 19 18:39:05 178258 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 19 18:39:05 272913 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 19 18:39:05 272983 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 19 18:39:05 339633 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 19 18:39:05 339679 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 19 18:39:05 469093 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 19 18:39:05 469145 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 19 18:39:05 484587 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 19 18:39:05 484633 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 19 18:39:05 574251 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000012 >> Mar 19 18:39:05 574301 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 19 18:39:05 602665 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 19 18:39:05 602700 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 19 18:39:05 646331 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 19 18:39:05 646369 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 19 18:39:05 834613 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 19 18:39:05 834685 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 19 18:39:05 851128 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 19 18:39:05 851166 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 19 18:39:05 875540 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 19 18:39:05 875592 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 19 18:39:05 897378 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 19 18:39:05 897424 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 19 18:39:05 907232 [4780B960] -> umad_receiver: 
ERR 5409: send completed >> with error (method=0x1 attr=0x15 trans_id=0x124ef0001c2fe) -- dropping >> Mar 19 18:39:05 907249 [4780B960] -> umad_receiver: ERR 5411: DR SMP >> Mar 19 18:39:05 907259 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR >> 3113: MAD completed in error (IB_TIMEOUT) >> Mar 19 18:39:05 907295 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x1 (SubnGet) >> D bit...................0x0 >> status..................0x0 >> hop_ptr.................0x0 >> hop_count...............0x6 >> trans_id................0x1c2fe >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x1 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6][16][8] >> Return path: [0][0][0][0][0][0][0] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> Mar 19 18:39:05 907372 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:05 907384 [41401960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:39:05 907407 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:05 907414 [41401960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:39:05 907480 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:05 907485 [41401960] -> Removed port 
with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:39:05 907577 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:05 907582 [41401960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:39:05 907618 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad0000027c84 port 8. Adding to light sweep >> sampling list >> Mar 19 18:39:05 907657 [41401960] -> Directed Path Dump of 5 hop path: >> Path = [0][1][11][1][6][16] >> Mar 19 18:39:05 911559 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:05 911572 [43204960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 19 18:39:05 927229 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000019 >> Mar 19 18:39:05 927285 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 19 18:39:05 942538 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:06 000027 [41E02960] -> SUBNET UP >> Mar 19 18:39:06 130255 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001a >> Mar 19 18:39:06 130308 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 26 times consecutively >> Mar 19 18:39:06 131922 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000042 >> Mar 19 18:39:06 132063 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> 
GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:06 154579 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001b >> Mar 19 18:39:06 154681 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 27 times consecutively >> Mar 19 18:39:06 176248 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001c >> Mar 19 18:39:06 176304 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 28 times consecutively >> Mar 19 18:39:06 198132 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001d >> Mar 19 18:39:06 198195 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 29 times consecutively >> Mar 19 18:39:06 230022 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001e >> Mar 19 18:39:06 230108 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 30 times consecutively >> Mar 19 18:39:06 230229 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000043 >> Mar 19 18:39:06 230311 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:06 399543 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:06 399556 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 19 18:39:06 399562 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:06 399569 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 19 18:39:06 399574 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:06 399580 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 19 18:39:06 399585 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 19 18:39:06 399592 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 19 18:39:06 430598 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:06 494689 [44606960] -> SUBNET UP >> Mar 19 18:39:06 837303 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001f >> Mar 19 18:39:06 837446 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 31 times consecutively >> Mar 19 18:39:06 838528 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000044 >> Mar 19 18:39:06 838636 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:06 876308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:07 028376 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000020 >> Mar 19 18:39:07 028459 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 32 times consecutively >> Mar 19 18:39:07 028545 
[43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000045 >> Mar 19 18:39:07 028652 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:07 030190 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000054 >> Mar 19 18:39:07 030277 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 096812 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000046 >> Mar 19 18:39:07 096959 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:07 111719 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 19 18:39:07 111759 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x1dfac >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x11 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4][16] >> Return path: [0][9][18][D][1][11] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 >> >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 19 
18:39:07 111810 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 19 18:39:07 111814 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 19 18:39:07 111831 [41E02960] -> PortInfo dump: >> port number.............0x11 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x11 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............INIT >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 19 18:39:07 111868 [41E02960] -> Capabilities Mask: >> Mar 19 18:39:07 111844 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x1dfad >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x12 >> 
m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4][16] >> Return path: [0][9][18][D][1][11] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 >> >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 19 18:39:07 112011 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 19 18:39:07 112018 [41401960] -> PortInfo dump: >> port number.............0x12 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x11 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............INIT >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 19 18:39:07 112034 [41401960] -> Capabilities Mask: >> Mar 19 18:39:07 117211 [45A08960] -> SUBNET UP >> Mar 19 18:39:07 354540 [41E02960] -> __osm_trap_rcv_process_request: 
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000047 >> Mar 19 18:39:07 354686 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 19 18:39:07 383453 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000055 >> Mar 19 18:39:07 383530 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 497601 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:07 548184 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000056 >> Mar 19 18:39:07 548217 [43C05960] -> SUBNET UP >> Mar 19 18:39:07 548427 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 878403 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000057 >> Mar 19 18:39:07 887312 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000058 >> Mar 19 18:39:07 888156 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:07 929819 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 929834 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 931166 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x001B >> TID:0x0000000000000059 >> Mar 19 18:39:07 931288 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 19 18:39:07 946406 [42803960] -> SUBNET UP >> Mar 19 18:39:08 073735 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000020 >> Mar 19 18:39:08 073811 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 33 times consecutively >> Mar 19 18:39:08 400790 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 18:39:08 467925 [45A08960] -> SUBNET UP >> Mar 19 20:24:07 009911 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 >> TID:0x0000000000000020 >> Mar 19 20:24:07 010153 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0020 >> GID:0xfe80000000000000,0x0005ad00000281ad >> Mar 19 20:24:07 010966 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 >> TID:0x000000000000001a >> Mar 19 20:24:07 011064 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0001 >> GID:0xfe80000000000000,0x0005ad0000027c6a >> Mar 19 20:24:07 390927 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 20:24:07 453747 [43204960] -> SUBNET UP >> Mar 19 20:24:07 839927 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 20:24:07 895694 [45A08960] -> SUBNET UP >> Mar 19 20:24:08 049066 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 >> TID:0x000000000000001a >> Mar 19 20:24:08 049322 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0001 >> 
GID:0xfe80000000000000,0x0005ad0000027c6a >> Mar 19 20:24:08 433979 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 20:24:08 487950 [43204960] -> SUBNET UP >> Mar 19 20:26:28 608381 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 >> TID:0x0000000000000021 >> Mar 19 20:26:28 608406 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 >> TID:0x000000000000001b >> Mar 19 20:26:28 608685 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0020 >> GID:0xfe80000000000000,0x0005ad00000281ad >> Mar 19 20:26:28 608693 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0001 >> GID:0xfe80000000000000,0x0005ad0000027c6a >> Mar 19 20:26:28 972140 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 20:26:29 028682 [43C05960] -> SUBNET UP >> Mar 19 20:26:29 399649 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 20:26:29 465737 [45007960] -> SUBNET UP >> Mar 19 21:30:38 775260 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0146 >> TID:0x000000000000002f >> Mar 19 21:30:38 775533 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0146 >> GID:0xfe80000000000000,0x0005ad00000281b6 >> Mar 19 21:30:38 777083 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0143 >> TID:0x0000000000000037 >> Mar 19 21:30:38 777242 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0143 >> GID:0xfe80000000000000,0x0005ad00000281b9 >> Mar 19 21:30:39 144779 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 21:30:39 200635 [43204960] -> 
SUBNET UP >> Mar 19 21:30:39 536003 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 19 21:30:39 591216 [42803960] -> SUBNET UP >> Mar 20 14:06:48 971082 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000021 >> Mar 20 14:06:48 971376 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:06:49 346734 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:06:49 346761 [42803960] -> Removed port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 20 14:06:49 381394 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:06:49 440803 [43204960] -> SUBNET UP >> Mar 20 14:07:09 098449 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000048 >> Mar 20 14:07:09 098708 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:07:09 098733 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005a >> Mar 20 14:07:09 098777 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:07:09 417844 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417862 [42803960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:07:09 417879 [42803960] -> osm_report_notice: Reporting Generic >> 
Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417885 [42803960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:07:09 417902 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417907 [42803960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:07:09 417924 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417929 [42803960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:07:09 417945 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417951 [42803960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:07:09 417967 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417973 [42803960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:07:09 417989 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 417994 [42803960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:07:09 418131 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418137 [42803960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> 
Mar 20 14:07:09 418168 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418173 [42803960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:07:09 418188 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418193 [42803960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:07:09 418207 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418212 [42803960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:07:09 418227 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418232 [42803960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:07:09 418248 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418253 [42803960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:07:09 418285 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418290 [42803960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:07:09 418306 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418362 [42803960] -> Removed port with >> 
GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:07:09 418378 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:07:09 418383 [42803960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:07:09 451317 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:07:09 502755 [41401960] -> SUBNET UP >> Mar 20 14:07:09 902534 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:07:09 955229 [45A08960] -> SUBNET UP >> Mar 20 14:08:03 850926 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000049 >> Mar 20 14:08:03 851134 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:03 856880 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004a >> Mar 20 14:08:03 856955 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:03 866819 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005b >> Mar 20 14:08:03 866977 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:03 963024 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005c >> Mar 20 14:08:03 963130 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> 
GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:04 106856 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005d >> Mar 20 14:08:04 106995 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:04 193747 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193766 [44606960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:08:04 193771 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193777 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:08:04 193781 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193786 [44606960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:08:04 193790 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193795 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:08:04 193799 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193804 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:08:04 193808 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from 
LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193813 [44606960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:08:04 193817 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193822 [44606960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:08:04 193826 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193830 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:08:04 193834 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193839 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:08:04 193843 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193848 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:08:04 193852 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193857 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:08:04 193861 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193866 [44606960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of 
node:saguaro-22-1 HCA-1 >> Mar 20 14:08:04 193870 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193874 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:08:04 193878 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193883 [44606960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:08:04 193938 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193944 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:08:04 193948 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:04 193953 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:08:04 224695 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:04 281046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281106 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61eec >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> 
dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][4] >> Return path: [0][9][14][E][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281154 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281159 [41401960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:04 281172 [41401960] -> Capabilities Mask: >> Mar 20 14:08:04 281187 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281213 [4780B960] -> SMP dump: >> base_ver................0x1 >> 
mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61eed >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][4] >> Return path: [0][9][14][E][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281279 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281316 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61eee >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][4] >> Return path: [0][9][14][E][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281392 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281416 [4780B960] -> SMP dump: >> 
base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61eef >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281515 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281522 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281542 [44606960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> 
init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:04 281553 [44606960] -> Capabilities Mask: >> Mar 20 14:08:04 281561 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61ef0 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281572 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281590 [44606960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> 
link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:04 281600 [44606960] -> Capabilities Mask: >> Mar 20 14:08:04 281623 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281626 [44606960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 
>> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:04 281635 [44606960] -> Capabilities Mask: >> Mar 20 14:08:04 281637 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:04 281652 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281663 [44606960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:04 281673 [44606960] -> Capabilities Mask: >> Mar 20 14:08:04 281675 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> 
hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x61ef1 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:04 281721 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:04 281726 [41E02960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> 
error_threshold.........0x88 >> Mar 20 14:08:04 281736 [41E02960] -> Capabilities Mask: >> Mar 20 14:08:04 287136 [44606960] -> SUBNET UP >> Mar 20 14:08:04 711595 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:04 766488 [45A08960] -> SUBNET UP >> Mar 20 14:08:19 947200 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:08:19 947479 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 086909 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:08:20 087084 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 108865 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:08:20 109210 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 109996 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 20 14:08:20 110407 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 222523 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:08:20 222613 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 404596 [41401960] 
-> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:08:20 404698 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 476804 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:08:20 476897 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 572434 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:08:20 572520 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 621715 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:20 621726 [42803960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:08:20 656232 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:20 698700 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:08:20 698794 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 708598 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:08:20 708698 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> 
GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 713653 [45007960] -> SUBNET UP >> Mar 20 14:08:20 730554 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:08:20 730697 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:08:20 754139 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 20 14:08:20 754251 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:08:20 947339 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:08:20 947426 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:08:20 975965 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:08:20 976024 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:08:20 997569 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:08:20 997648 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:08:21 019465 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 20 14:08:21 019512 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:08:21 064967 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 20 14:08:21 065009 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 20 14:08:21 082838 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 20 14:08:21 082877 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 20 14:08:21 100567 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000012 >> Mar 20 14:08:21 100619 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 20 14:08:21 188128 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:21 188144 [43C05960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:08:21 188166 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:21 188172 [43C05960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:08:21 188194 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:21 188199 [43C05960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:08:21 192421 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:21 192436 [41E02960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:08:21 208455 [41401960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 20 14:08:21 208499 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 20 14:08:21 223240 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:21 394585 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 20 14:08:21 394665 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 20 14:08:21 419333 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 20 14:08:21 419393 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 20 14:08:21 441228 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 20 14:08:21 441276 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 20 14:08:21 462915 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 20 14:08:21 462968 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 20 14:08:21 475440 [45007960] -> SUBNET UP >> Mar 20 14:08:21 674045 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:08:21 674084 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004b >> Mar 20 14:08:21 674137 [44606960] -> __osm_trap_rcv_process_request: ERR >> 
3804: Received trap 24 times consecutively >> Mar 20 14:08:21 674294 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:21 965885 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004c >> Mar 20 14:08:21 965992 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:22 092378 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 092395 [41401960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:08:22 092415 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 092420 [41401960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:08:22 092444 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 092449 [41401960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:08:22 092625 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad00000281b3 port 22. Adding to light sweep >> sampling list >> Mar 20 14:08:22 092655 [41401960] -> Directed Path Dump of 4 hop path: >> Path = [0][1][11][1][4] >> Mar 20 14:08:22 092663 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad00000281b3 port 23. 
Adding to light sweep >> sampling list >> Mar 20 14:08:22 092672 [41401960] -> Directed Path Dump of 4 hop path: >> Path = [0][1][11][1][4] >> Mar 20 14:08:22 096789 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 096801 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:08:22 096805 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 096810 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:08:22 096814 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 096819 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:08:22 127266 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:22 184734 [45007960] -> SUBNET UP >> Mar 20 14:08:22 541974 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 541985 [41401960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:08:22 541989 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 541995 [41401960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:08:22 541998 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:08:22 
542003 [41401960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:08:22 572711 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:22 611570 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004d >> Mar 20 14:08:22 611751 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:22 611770 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005e >> Mar 20 14:08:22 612060 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:22 623766 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:22 623814 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x66134 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][5] >> Return path: [0][9][18][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:22 623876 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for 
SetResp() during ACTIVE transition >> Mar 20 14:08:22 623888 [45007960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:22 623907 [45007960] -> Capabilities Mask: >> Mar 20 14:08:22 623945 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:22 623973 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x66135 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> 
dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][5] >> Return path: [0][9][18][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:22 624051 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:22 624056 [44606960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:22 624069 [44606960] -> Capabilities Mask: >> Mar 20 14:08:22 629289 [45A08960] -> SUBNET UP >> Mar 20 14:08:22 712180 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> 
TID:0x0000000000000018 >> Mar 20 14:08:22 712238 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 20 14:08:22 869303 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000005f >> Mar 20 14:08:22 869527 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:22 892522 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004e >> Mar 20 14:08:22 892707 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:08:22 957086 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000060 >> Mar 20 14:08:22 957189 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 080551 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000061 >> Mar 20 14:08:23 080621 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 102292 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000062 >> Mar 20 14:08:23 102372 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 124176 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000063 >> Mar 20 14:08:23 
124278 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 285320 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000064 >> Mar 20 14:08:23 285393 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 403309 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000065 >> Mar 20 14:08:23 403388 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 425052 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000066 >> Mar 20 14:08:23 425117 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 447189 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000067 >> Mar 20 14:08:23 447266 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:08:23 535175 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:08:23 595127 [41401960] -> SUBNET UP >> Mar 20 14:08:23 750323 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:08:23 750432 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 26 times consecutively >> Mar 20 14:08:23 960490 [42803960] -> osm_ucast_mgr_process: Min Hop 
Tables >> configured on all switches >> Mar 20 14:08:24 014256 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:08:24 014323 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x67b9d >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:08:24 014398 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:08:24 014408 [41401960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> 
vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:08:24 014422 [41401960] -> Capabilities Mask: >> Mar 20 14:08:24 019479 [41401960] -> SUBNET UP >> Mar 20 14:11:00 201308 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F >> TID:0x0000000000000018 >> Mar 20 14:11:00 201580 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001F >> GID:0xfe80000000000000,0x0005ad0000027c56 >> Mar 20 14:11:00 554517 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:11:00 554538 [41E02960] -> Removed port with >> GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:saguaro-24-1 HCA-1 >> Mar 20 14:11:00 589140 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:11:00 641315 [45A08960] -> SUBNET UP >> Mar 20 14:14:16 904140 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000068 >> Mar 20 14:14:16 904369 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:14:16 904462 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000004f >> Mar 20 14:14:16 904600 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> 
Mar 20 14:14:17 210726 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210747 [41401960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:14:17 210796 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210802 [41401960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:14:17 210818 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210836 [41401960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:14:17 210864 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210869 [41401960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:14:17 210885 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210890 [41401960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:14:17 210908 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210913 [41401960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:14:17 210931 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 210936 [41401960] -> Removed port with >> 
GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:14:17 211090 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211096 [41401960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:14:17 211127 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211133 [41401960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:14:17 211147 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211153 [41401960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:14:17 211169 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211174 [41401960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:14:17 211189 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211194 [41401960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:14:17 211212 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211216 [41401960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:14:17 211232 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211237 [41401960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:14:17 211253 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211317 [41401960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:14:17 211333 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:14:17 211338 [41401960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:14:17 244432 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:14:17 292747 [42803960] -> SUBNET UP >> Mar 20 14:14:17 698554 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:14:17 750419 [44606960] -> SUBNET UP >> Mar 20 14:15:11 300343 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000050 >> Mar 20 14:15:11 300577 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:11 306375 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000069 >> Mar 20 14:15:11 306439 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000051 >> Mar 20 14:15:11 306487 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:11 306514 [42803960] -> osm_report_notice: Reporting Generic 
>> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:11 312487 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006a >> Mar 20 14:15:11 312581 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:11 636546 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636559 [45007960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:15:11 636565 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636572 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:15:11 636577 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636584 [45007960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:15:11 636589 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636595 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:15:11 636600 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636606 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:15:11 636612 [45007960] -> osm_report_notice: 
Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636618 [45007960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:15:11 636623 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636629 [45007960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:15:11 636634 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636641 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:15:11 636646 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636652 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:15:11 636657 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636663 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:15:11 636668 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636675 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:15:11 636680 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636686 [45007960] -> Discovered new port with >> 
GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:15:11 636691 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636698 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:15:11 636703 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636709 [45007960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:15:11 636742 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636750 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:15:11 636755 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:11 636761 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:15:11 667436 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:11 731917 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732017 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b507 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> 
m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732102 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732106 [41401960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732128 [41401960] -> Capabilities Mask: >> Mar 20 14:15:11 732160 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 
732185 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b508 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732254 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732258 [44606960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> 
vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732269 [44606960] -> Capabilities Mask: >> Mar 20 14:15:11 732300 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732334 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b509 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732420 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732419 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732451 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b50a >> attr_id.................0x15 (PortInfo) >> 
resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732447 [45007960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732471 [45007960] -> Capabilities Mask: >> Mar 20 14:15:11 732511 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732516 [45007960] -> PortInfo dump: >> port 
number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732529 [45007960] -> Capabilities Mask: >> Mar 20 14:15:11 732556 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732591 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b50b >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: 
[0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732653 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732662 [43204960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732673 [43204960] -> Capabilities Mask: >> Mar 20 14:15:11 732705 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732739 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D 
bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b50c >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732809 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:11 732805 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732839 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6b50d >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:11 732837 [41E02960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> 
port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732856 [41E02960] -> Capabilities Mask: >> Mar 20 14:15:11 732898 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:11 732911 [41E02960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> 
lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:11 732925 [41E02960] -> Capabilities Mask: >> Mar 20 14:15:11 738354 [45A08960] -> SUBNET UP >> Mar 20 14:15:12 115658 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:12 172029 [44606960] -> SUBNET UP >> Mar 20 14:15:27 277617 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:15:27 277863 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 510410 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:15:27 510626 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 532239 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:15:27 532443 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 533517 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> 
Mar 20 14:15:27 533612 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 591171 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:27 591185 [41401960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:15:27 591206 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:27 591211 [41401960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:15:27 625811 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:27 668356 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:15:27 668485 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 682282 [43204960] -> SUBNET UP >> Mar 20 14:15:27 737313 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:15:27 737387 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 809341 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:15:27 809813 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:27 998181 [45007960] -> __osm_trap_rcv_process_request: >> Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:15:27 998331 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:28 012193 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:15:28 012277 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:28 496329 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:15:28 496422 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:28 624912 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:28 624940 [43C05960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:15:28 624965 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:28 624972 [43C05960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:15:28 625001 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:28 625008 [43C05960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:15:28 629507 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:28 629518 
[42803960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:15:28 649776 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:15:28 660297 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:28 699777 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:15:28 716354 [41E02960] -> SUBNET UP >> Mar 20 14:15:28 744686 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 20 14:15:28 744857 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:15:28 811329 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:15:28 811392 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:15:28 999808 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:15:28 999881 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:15:29 029918 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:15:29 029969 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:15:29 031783 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000052 >> Mar 20 14:15:29 031900 [45A08960] -> 
osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:29 037646 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 037662 [44606960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:15:29 037683 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 037690 [44606960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:15:29 037721 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 037726 [44606960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:15:29 037741 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 037746 [44606960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:15:29 037766 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 037771 [44606960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:15:29 361560 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000053 >> Mar 20 14:15:29 361622 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:29 433665 [43204960] -> osm_report_notice: 
Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 433674 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:15:29 433680 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 433687 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:15:29 433692 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 433698 [43204960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:15:29 433703 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:29 433709 [43204960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:15:29 464434 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:29 522011 [42803960] -> SUBNET UP >> Mar 20 14:15:29 699605 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006b >> Mar 20 14:15:29 699782 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:29 701115 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000054 >> Mar 20 14:15:29 701301 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:29 
818974 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000055 >> Mar 20 14:15:29 819054 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:29 992006 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000056 >> Mar 20 14:15:29 992080 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:30 184132 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006c >> Mar 20 14:15:30 184205 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:30 207030 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000057 >> Mar 20 14:15:30 207101 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:15:30 250541 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006d >> Mar 20 14:15:30 250635 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:30 317366 [45A08960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad00000281a7 port 22. 
Adding to light sweep >> sampling list >> Mar 20 14:15:30 317409 [45A08960] -> Directed Path Dump of 4 hop path: >> Path = [0][1][17][1][4] >> Mar 20 14:15:30 494183 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006e >> Mar 20 14:15:30 494247 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:30 521869 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:30 521879 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:15:30 521885 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:30 521891 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:15:30 521896 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:30 521903 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:15:30 521908 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:30 521914 [43C05960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:15:30 521919 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:15:30 521926 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 
14:15:30 552581 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:30 553014 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000006f >> Mar 20 14:15:30 592863 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:15:30 607595 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:30 607666 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6f744 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][1][6] >> Return path: [0][9][15][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:30 607770 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:30 607777 [44606960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> 
local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:30 607794 [44606960] -> Capabilities Mask: >> Mar 20 14:15:30 607914 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:30 607958 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x6f745 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][1][6] >> Return path: [0][9][15][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:30 608014 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE 
transition >> Mar 20 14:15:30 608018 [43204960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:30 608031 [43204960] -> Capabilities Mask: >> Mar 20 14:15:30 613309 [41E02960] -> SUBNET UP >> Mar 20 14:15:30 995108 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:15:31 050102 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:15:31 050180 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x70486 >> attr_id.................0x15 (PortInfo) >> 
resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:15:31 050233 [45A08960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:15:31 050238 [45A08960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:15:31 050251 [45A08960] -> Capabilities Mask: >> Mar 20 14:15:31 055273 [42803960] -> SUBNET UP >> Mar 20 
14:15:31 106129 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:15:31 106193 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:17:18 456260 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000058 >> Mar 20 14:17:18 456512 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:17:18 456649 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000070 >> Mar 20 14:17:18 456761 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:17:18 769730 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769751 [45007960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:17:18 769773 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769780 [45007960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:17:18 769803 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769809 [45007960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:17:18 769832 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb 
>> Mar 20 14:17:18 769838 [45007960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:17:18 769858 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769865 [45007960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:17:18 769888 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769895 [45007960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:17:18 769927 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 769932 [45007960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:17:18 770075 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770081 [45007960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:17:18 770109 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770114 [45007960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:17:18 770130 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770135 [45007960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:17:18 770150 [45007960] -> osm_report_notice: Reporting Generic >> 
Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770155 [45007960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:17:18 770171 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770176 [45007960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:17:18 770193 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770198 [45007960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:17:18 770216 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770221 [45007960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:17:18 770238 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770301 [45007960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:17:18 770318 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:17:18 770323 [45007960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:17:18 803377 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:17:18 855545 [44606960] -> SUBNET UP >> Mar 20 14:17:19 249722 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:17:19 300999 
[45A08960] -> SUBNET UP >> Mar 20 14:18:11 663850 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000059 >> Mar 20 14:18:11 664195 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:11 670836 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000071 >> Mar 20 14:18:11 670964 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005a >> Mar 20 14:18:11 671199 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:11 672933 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:11 677654 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000072 >> Mar 20 14:18:11 677826 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:12 026661 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026675 [44606960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:18:12 026681 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026688 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:18:12 026693 
[44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026700 [44606960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:18:12 026705 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026711 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:18:12 026716 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026723 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:18:12 026727 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026740 [44606960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:18:12 026745 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026751 [44606960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:18:12 026758 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026764 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:18:12 026769 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026776 [44606960] -> 
Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:18:12 026781 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026787 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:18:12 026792 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026798 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:18:12 026803 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026809 [44606960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:18:12 026814 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026821 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:18:12 026826 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026832 [44606960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:18:12 026869 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026877 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:18:12 026882 [44606960] -> osm_report_notice: Reporting Generic >> 
Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:12 026888 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:18:12 057534 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:12 133316 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:12 133419 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x72d97 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][3][6] >> Return path: [0][9][15][F][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:12 133466 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:12 133467 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:12 133478 [43204960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> 
m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:12 133490 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x72d98 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][3][6] >> Return path: [0][9][15][F][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:12 133493 [43204960] -> Capabilities Mask: >> Mar 20 14:18:12 133566 [45A08960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:12 133595 [45A08960] -> PortInfo dump: >> port 
number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:12 133614 [45A08960] -> Capabilities Mask: >> Mar 20 14:18:12 133583 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:12 133671 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x72d99 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][3][6] >> Return path: [0][9][15][F][3] >> Reserved: 
[0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:12 133760 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:12 133788 [43C05960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:12 133807 [43C05960] -> Capabilities Mask: >> Mar 20 14:18:12 139330 [41401960] -> SUBNET UP >> Mar 20 14:18:12 496444 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:12 558965 [41401960] -> SUBNET UP >> Mar 20 14:18:27 748551 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:18:27 748795 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:27 888669 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:18:27 888902 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:27 910605 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:18:27 910710 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:27 911951 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 20 14:18:27 912119 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 012957 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:18:28 013058 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 075266 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:18:28 075397 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 259000 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:18:28 259121 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 308865 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:18:28 309000 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 330606 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:18:28 330714 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 444107 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:18:28 444191 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 466156 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:18:28 466234 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:18:28 478021 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 20 14:18:28 478070 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:18:28 489091 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 489106 [43204960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:18:28 521430 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:18:28 521499 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:18:28 523658 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:28 580295 [43204960] -> SUBNET UP >> Mar 20 14:18:28 611805 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:18:28 611893 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:18:28 661292 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:18:28 661351 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:18:28 871670 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 20 14:18:28 871732 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:18:28 934440 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 20 14:18:28 934505 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 20 14:18:28 941281 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 941303 [45A08960] -> 
Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:18:28 941329 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 941336 [45A08960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:18:28 941356 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 941363 [45A08960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:18:28 941388 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 941395 [45A08960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:18:28 941420 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 941426 [45A08960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:18:28 945507 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:28 945515 [45A08960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:18:28 956576 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 20 14:18:28 956665 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 20 14:18:28 976211 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:29 
033513 [42803960] -> SUBNET UP >> Mar 20 14:18:29 071283 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000012 >> Mar 20 14:18:29 071345 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 20 14:18:29 352103 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 20 14:18:29 352155 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 20 14:18:29 376386 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 20 14:18:29 376461 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 20 14:18:29 420228 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 20 14:18:29 420282 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 20 14:18:29 421294 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 20 14:18:29 421345 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 20 14:18:29 461135 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 20 14:18:29 461179 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 20 14:18:29 633008 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:18:29 633050 [42803960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005b >> Mar 20 14:18:29 633062 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 20 14:18:29 633350 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:29 733039 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005c >> Mar 20 14:18:29 733238 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:29 947440 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:29 947452 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:18:29 947457 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:29 947462 [44606960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:18:29 947465 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:29 947470 [44606960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:18:29 947474 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:29 947479 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:18:29 947482 [44606960] -> 
osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:29 947487 [44606960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:18:29 978182 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:30 027730 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:30 027819 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x762b8 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:30 027897 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:30 027901 [41401960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> 
link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:30 027914 [41401960] -> Capabilities Mask: >> Mar 20 14:18:30 027993 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:30 028043 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x762b9 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:30 028098 [45A08960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:30 028109 [45A08960] -> PortInfo 
dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:30 028124 [45A08960] -> Capabilities Mask: >> Mar 20 14:18:30 033824 [44606960] -> SUBNET UP >> Mar 20 14:18:30 418497 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:30 418522 [43C05960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:18:30 453167 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:30 494719 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005d >> Mar 20 14:18:30 494877 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> 
GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:30 662496 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000073 >> Mar 20 14:18:30 662564 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:30 662645 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005e >> Mar 20 14:18:30 662759 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:30 707085 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000005f >> Mar 20 14:18:30 707179 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:30 728948 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000060 >> Mar 20 14:18:30 729041 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:30 872332 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000061 >> Mar 20 14:18:30 872412 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:30 899764 [45A08960] -> SUBNET UP >> Mar 20 14:18:31 047423 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000062 >> Mar 20 14:18:31 047611 [43C05960] -> osm_report_notice: Reporting 
Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:31 165201 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000063 >> Mar 20 14:18:31 165272 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:18:31 182461 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000074 >> Mar 20 14:18:31 182653 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:31 248834 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000075 >> Mar 20 14:18:31 248893 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:31 499830 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000076 >> Mar 20 14:18:31 499908 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:31 521824 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000077 >> Mar 20 14:18:31 521891 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:31 543713 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000078 >> Mar 20 14:18:31 543784 [44606960] -> osm_report_notice: Reporting 
Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:18:31 589490 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:18:31 589499 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:18:31 620166 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:31 672647 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:31 672739 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x77d11 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:31 672817 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:31 672823 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:31 672833 [43C05960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> 
subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:31 672852 [43C05960] -> Capabilities Mask: >> Mar 20 14:18:31 672861 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x77d12 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:31 672918 
[45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:31 672922 [45007960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:31 672936 [45007960] -> Capabilities Mask: >> Mar 20 14:18:31 678085 [45A08960] -> SUBNET UP >> Mar 20 14:18:31 723715 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:18:31 723815 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 20 14:18:32 061932 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:18:32 113545 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:32 113610 
[4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x78a4d >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][15][4][4] >> Return path: [0][9][18][D][4] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:32 113712 [42803960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:32 113725 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:18:32 113730 [42803960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x4 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> 
vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:18:32 113745 [42803960] -> Capabilities Mask: >> Mar 20 14:18:32 113751 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x78a4e >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][15][4][4] >> Return path: [0][9][18][D][4] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:18:32 113803 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:18:32 113807 [44606960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x4 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> 
link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:18:32 113820 [44606960] -> Capabilities Mask:
>> Mar 20 14:18:32 113845 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:18:32 113907 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x78a4f
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x17
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][4][4]
>> Return path: [0][9][18][D][4]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:18:32 113958 [41E02960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:18:32 113963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:18:32 113969 [41E02960] -> PortInfo dump:
>> port number.............0x17
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x4
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:18:32 113986 [41E02960] -> Capabilities Mask:
>> Mar 20 14:18:32 113992 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x78a50
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x18
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][4][4]
>> Return path: [0][9][18][D][4]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:18:32 114052 [45A08960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:18:32 114066 [45A08960] -> PortInfo dump:
>> port number.............0x18
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x4
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:18:32 114089 [45A08960] -> Capabilities Mask:
>> Mar 20 14:18:32 114052 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:18:32 114171 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x78a51
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x18
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][13][1][6]
>> Return path: [0][9][13][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:18:32 114224 [42803960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:18:32 114228 [42803960] -> PortInfo dump:
>> port number.............0x18
>> node_guid...............0x0005ad00000281b3
>> port_guid...............0x0005ad00000281b3
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:18:32 114242 [42803960] -> Capabilities Mask:
>> Mar 20 14:18:32 119326 [42803960] -> SUBNET UP
>> Mar 20 14:23:02 506774 [41E02960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152
>> TID:0x0000000000000019
>> Mar 20 14:23:02 507064 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x0152
>> GID:0xfe80000000000000,0x0005ad0000027c84
>> Mar 20 14:23:02 861642 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:23:02 861653 [43204960] -> Discovered new port with
>> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC
>> Mar 20 14:23:02 893030 [43204960] -> osm_ucast_mgr_process: Min Hop Tables
>> configured on all switches
>> Mar 20 14:23:02 943693 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:23:02 943765 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x5
>> trans_id................0x79aff
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x1
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][11][1][5][18]
>> Return path: [0][9][18][D][2][13]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 13 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:23:02 943854 [43C05960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:23:02 943870 [43C05960] -> PortInfo dump:
>> port number.............0x1
>> node_guid...............0x0005ad0000027c84
>> port_guid...............0x0005ad0000027c84
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x13
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0x2C
>> vl_enforce..............0x4C
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:23:02 943886 [43C05960] -> Capabilities Mask:
>> Mar 20 14:23:02 948898 [43C05960] -> SUBNET UP
>> Mar 20 14:23:03 237496 [42803960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00AF
>> TID:0x0000000000000000
>> Mar 20 14:23:03 237710 [42803960] -> osm_report_notice: Reporting Generic
>> Notice type:4 num:144 from LID:0x00AF
>> GID:0xfe80000000000000,0x0005ad0000024b27
>> Mar 20 14:23:03 605548 [45007960] -> osm_ucast_mgr_process: Min Hop Tables
>> configured on all switches
>> Mar 20 14:23:03 662757 [41401960] -> SUBNET UP
>> Mar 20 14:24:54 675782 [44606960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B
>> TID:0x0000000000000079
>> Mar 20 14:24:54 676077 [44606960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x001B
>> GID:0xfe80000000000000,0x0005ad00000281a7
>> Mar 20 14:24:54 677026 [43204960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148
>> TID:0x0000000000000064
>> Mar 20 14:24:54 677118 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x0148
>> GID:0xfe80000000000000,0x0005ad00000281b3
>> Mar 20 14:24:55 047478 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047501 [43204960] -> Removed port with
>> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
>> Mar 20 14:24:55 047520 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047525 [43204960] -> Removed port with
>> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
>> Mar 20 14:24:55 047541 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047546 [43204960] -> Removed port with
>> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
>> Mar 20 14:24:55 047563 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047569 [43204960] -> Removed port with
>> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC
>> Mar 20 14:24:55 047586 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047591 [43204960] -> Removed port with
>> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
>> Mar 20 14:24:55 047607 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047612 [43204960] -> Removed port with
>> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
>> Mar 20 14:24:55 047630 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047635 [43204960] -> Removed port with
>> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
>> Mar 20 14:24:55 047652 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047657 [43204960] -> Removed port with
>> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
>> Mar 20 14:24:55 047798 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047803 [43204960] -> Removed port with
>> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
>> Mar 20 14:24:55 047836 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047842 [43204960] -> Removed port with
>> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
>> Mar 20 14:24:55 047857 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047862 [43204960] -> Removed port with
>> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1
>> Mar 20 14:24:55 047877 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047882 [43204960] -> Removed port with
>> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
>> Mar 20 14:24:55 047896 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047902 [43204960] -> Removed port with
>> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
>> Mar 20 14:24:55 047918 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047923 [43204960] -> Removed port with
>> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
>> Mar 20 14:24:55 047939 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 047988 [43204960] -> Removed port with
>> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
>> Mar 20 14:24:55 048005 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 048010 [43204960] -> Removed port with
>> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1
>> Mar 20 14:24:55 048025 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:65 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:24:55 048030 [43204960] -> Removed port with
>> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
>> Mar 20 14:24:55 081006 [43204960] -> osm_ucast_mgr_process: Min Hop Tables
>> configured on all switches
>> Mar 20 14:24:55 130875 [45A08960] -> SUBNET UP
>> Mar 20 14:24:55 484995 [42803960] -> osm_ucast_mgr_process: Min Hop Tables
>> configured on all switches
>> Mar 20 14:24:55 535902 [42803960] -> SUBNET UP
>> Mar 20 14:25:48 653788 [43204960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148
>> TID:0x0000000000000065
>> Mar 20 14:25:48 654009 [43204960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x0148
>> GID:0xfe80000000000000,0x0005ad00000281b3
>> Mar 20 14:25:48 659749 [45A08960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B
>> TID:0x000000000000007a
>> Mar 20 14:25:48 659790 [42803960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148
>> TID:0x0000000000000066
>> Mar 20 14:25:48 659814 [45A08960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x001B
>> GID:0xfe80000000000000,0x0005ad00000281a7
>> Mar 20 14:25:48 659963 [42803960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x0148
>> GID:0xfe80000000000000,0x0005ad00000281b3
>> Mar 20 14:25:48 665972 [41401960] -> __osm_trap_rcv_process_request:
>> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B
>> TID:0x000000000000007b
>> Mar 20 14:25:48 666050 [41401960] -> osm_report_notice: Reporting Generic
>> Notice type:1 num:128 from LID:0x001B
>> GID:0xfe80000000000000,0x0005ad00000281a7
>> Mar 20 14:25:49 025384 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025396 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120
>> Mar 20 14:25:49 025401 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025406 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1
>> Mar 20 14:25:49 025410 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025416 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1
>> Mar 20 14:25:49 025420 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025425 [41E02960] -> Discovered new port with
>> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1
>> Mar 20 14:25:49 025428 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025433 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1
>> Mar 20 14:25:49 025437 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025442 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1
>> Mar 20 14:25:49 025446 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025451 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1
>> Mar 20 14:25:49 025461 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025466 [41E02960] -> Discovered new port with
>> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1
>> Mar 20 14:25:49 025470 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025475 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1
>> Mar 20 14:25:49 025483 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025488 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1
>> Mar 20 14:25:49 025491 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025496 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1
>> Mar 20 14:25:49 025500 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025505 [41E02960] -> Discovered new port with
>> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1
>> Mar 20 14:25:49 025508 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025513 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1
>> Mar 20 14:25:49 025517 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025522 [41E02960] -> Discovered new port with
>> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1
>> Mar 20 14:25:49 025556 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025562 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1
>> Mar 20 14:25:49 025565 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025570 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1
>> Mar 20 14:25:49 025574 [41E02960] -> osm_report_notice: Reporting Generic
>> Notice type:3 num:64 from LID:0x0092
>> GID:0xfe80000000000000,0x0005ad0000024bbb
>> Mar 20 14:25:49 025579 [41E02960] -> Discovered new port with
>> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1
>> Mar 20 14:25:49 056324 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables
>> configured on all switches
>> Mar 20 14:25:49 126247 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 126356 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d165
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x13
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][1][6]
>> Return path: [0][9][18][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 126409 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 126442 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d166
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x16
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][1][6]
>> Return path: [0][9][18][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 126496 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 126489 [42803960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 126535 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d167
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x17
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][1][6]
>> Return path: [0][9][18][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 126526 [42803960] -> PortInfo dump:
>> port number.............0x13
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:25:49 126567 [42803960] -> Capabilities Mask:
>> Mar 20 14:25:49 126613 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 126617 [42803960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 126658 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d168
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x18
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][15][1][6]
>> Return path: [0][9][18][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 126653 [42803960] -> PortInfo dump:
>> port number.............0x16
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:25:49 126687 [42803960] -> Capabilities Mask:
>> Mar 20 14:25:49 126703 [43204960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 126709 [43204960] -> PortInfo dump:
>> port number.............0x18
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:25:49 126744 [43204960] -> Capabilities Mask:
>> Mar 20 14:25:49 126765 [43C05960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 126770 [43C05960] -> PortInfo dump:
>> port number.............0x17
>> node_guid...............0x0005ad00000281a7
>> port_guid...............0x0005ad00000281a7
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:25:49 126874 [43C05960] -> Capabilities Mask:
>> Mar 20 14:25:49 126975 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 127015 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d169
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x16
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][13][1][6]
>> Return path: [0][9][13][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 127066 [45A08960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 127072 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 127084 [45A08960] -> PortInfo dump:
>> port number.............0x16
>> node_guid...............0x0005ad00000281b3
>> port_guid...............0x0005ad00000281b3
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0
>> guid_cap................0x0
>> client_reregister.......0x0
>> subnet_timeout..........0x0
>> resp_time_value.........0x0
>> error_threshold.........0x88
>> Mar 20 14:25:49 127103 [45A08960] -> Capabilities Mask:
>> Mar 20 14:25:49 127121 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d16a
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x17
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][13][1][6]
>> Return path: [0][9][13][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 127188 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR
>> 3111: Error status = 0x1C00
>> Mar 20 14:25:49 127220 [4780B960] -> SMP dump:
>> base_ver................0x1
>> mgmt_class..............0x81
>> class_ver...............0x1
>> method..................0x81 (SubnGetResp)
>> D bit...................0x1
>> status..................0x1C00
>> hop_ptr.................0x0
>> hop_count...............0x4
>> trans_id................0x7d16b
>> attr_id.................0x15 (PortInfo)
>> resv....................0x0
>> attr_mod................0x18
>> m_key...................0x0000000000000000
>> dr_slid.................0xFFFF
>> dr_dlid.................0xFFFF
>>
>> Initial path: [0][1][13][1][6]
>> Return path: [0][9][13][D][3]
>> Reserved: [0][0][0][0][0][0][0]
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02
>>
>> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00
>>
>> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00
>>
>> Mar 20 14:25:49 127326 [44606960] -> osm_pi_rcv_process_set: Received error
>> status 0x1c for SetResp() during ACTIVE transition
>> Mar 20 14:25:49 127339 [44606960] -> PortInfo dump:
>> port number.............0x17
>> node_guid...............0x0005ad00000281b3
>> port_guid...............0x0005ad00000281b3
>> m_key...................0x0000000000000000
>> subnet_prefix...........0x0000000000000000
>> base_lid................0x0
>> master_sm_base_lid......0x0
>> capability_mask.........0x0
>> diag_code...............0x0
>> m_key_lease_period......0x0
>> local_port_num..........0x3
>> link_width_enabled......0x3
>> link_width_supported....0x3
>> link_width_active.......0x2
>> link_speed_supported....0x1
>> port_state..............ACTIVE
>> state_info2.............0x52
>> m_key_protect_bits......0x0
>> lmc.....................0x0
>> link_speed..............0x11
>> mtu_smsl................0x40
>> vl_cap_init_type........0x40
>> vl_high_limit...........0x0
>> vl_arb_high_cap.........0x8
>> vl_arb_low_cap..........0x8
>> init_rep_mtu_cap........0x4
>> vl_stall_life...........0xF2
>> vl_enforce..............0x40
>> m_key_violations........0x0
>> p_key_violations........0x0
>> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:25:49 127357 [44606960] -> Capabilities Mask: >> Mar 20 14:25:49 127378 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:25:49 127397 [45007960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:25:49 127410 [45007960] -> Capabilities Mask: >> Mar 20 14:25:49 132961 [43204960] -> SUBNET UP >> Mar 20 14:25:49 523879 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:25:49 580522 [42803960] -> SUBNET UP >> Mar 20 14:26:04 718574 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:26:04 718819 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:04 836781 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:26:04 836881 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:04 858762 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:26:04 860242 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:04 997451 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 20 14:26:04 997647 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 180722 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:26:05 180855 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 209122 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:26:05 209200 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 347419 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:26:05 347488 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 378670 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:26:05 378739 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 409112 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:05 409121 [41401960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:26:05 443639 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:05 483503 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:26:05 486002 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 499183 [44606960] -> SUBNET UP >> Mar 20 14:26:05 499856 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:26:05 499941 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 521857 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:26:05 521971 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> 
GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:26:05 532569 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 20 14:26:05 532624 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:26:05 633813 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:26:05 633869 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:26:05 655421 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:26:05 655501 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:26:05 702652 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:26:05 702745 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:26:05 817201 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:05 817216 [43204960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:26:05 817235 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:05 817241 [43204960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:26:05 817259 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:05 817264 
[43204960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:26:05 821322 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:05 821330 [41E02960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:26:05 847950 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 20 14:26:05 848031 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:26:05 852036 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:05 893954 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 20 14:26:05 894021 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 20 14:26:05 910489 [44606960] -> SUBNET UP >> Mar 20 14:26:05 999993 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 20 14:26:06 000039 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 20 14:26:06 021880 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000012 >> Mar 20 14:26:06 021970 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 20 14:26:06 043912 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 20 14:26:06 044001 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: 
Received trap 19 times consecutively >> Mar 20 14:26:06 052878 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 20 14:26:06 052975 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 20 14:26:06 147560 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 20 14:26:06 147616 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 20 14:26:06 158945 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 20 14:26:06 158978 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 20 14:26:06 346046 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 20 14:26:06 346106 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 20 14:26:06 405311 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:26:06 405349 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 20 14:26:06 632882 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000019 >> Mar 20 14:26:06 632923 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 20 14:26:06 634031 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000067 >> Mar 20 14:26:06 634110 [43C05960] -> 
osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:06 883831 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001a >> Mar 20 14:26:06 883879 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 26 times consecutively >> Mar 20 14:26:06 885475 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000068 >> Mar 20 14:26:06 885560 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:06 982877 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001b >> Mar 20 14:26:06 982926 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 27 times consecutively >> Mar 20 14:26:06 992809 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000069 >> Mar 20 14:26:06 992871 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:06 992909 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001c >> Mar 20 14:26:06 992943 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 28 times consecutively >> Mar 20 14:26:06 993058 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:06 993065 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:26:06 993069 
[41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:06 993074 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:26:07 023890 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:07 085081 [41E02960] -> SUBNET UP >> Mar 20 14:26:07 348105 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001d >> Mar 20 14:26:07 348218 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 29 times consecutively >> Mar 20 14:26:07 348958 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006a >> Mar 20 14:26:07 349041 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:07 540871 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006b >> Mar 20 14:26:07 540983 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:07 541063 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000007c >> Mar 20 14:26:07 541131 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:07 585394 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006c >> Mar 20 14:26:07 585464 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 
from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:07 607406 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006d >> Mar 20 14:26:07 607486 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:07 850410 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006e >> Mar 20 14:26:07 850483 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:07 956365 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000007d >> Mar 20 14:26:07 956404 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:07 956413 [42803960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:26:07 987136 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:08 018887 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:08 032634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 032679 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x813ce >> attr_id.................0x15 (PortInfo) >> 
resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][4][5] >> Return path: [0][9][14][D][5] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 032749 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 032757 [41E02960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x5 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 032774 [41E02960] -> Capabilities Mask: >> Mar 20 14:26:08 033119 [4780B960] -> 
__osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 033154 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x813cf >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][4][5] >> Return path: [0][9][14][D][5] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 033202 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 033213 [43C05960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x5 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> 
init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 033231 [43C05960] -> Capabilities Mask: >> Mar 20 14:26:08 038497 [45A08960] -> SUBNET UP >> Mar 20 14:26:08 055480 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000007e >> Mar 20 14:26:08 055587 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:08 372288 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000007f >> Mar 20 14:26:08 376158 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:08 418607 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000080 >> Mar 20 14:26:08 420668 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:08 420714 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:08 430046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 430147 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> 
trans_id................0x820fa >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][15][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 430236 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 430236 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 430267 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x820fb >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][1][6] >> Return path: [0][9][14][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 430262 [43C05960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> 
base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 430286 [43C05960] -> Capabilities Mask: >> Mar 20 14:26:08 430350 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 430362 [43C05960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> 
vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 430375 [43C05960] -> Capabilities Mask: >> Mar 20 14:26:08 435317 [43C05960] -> SUBNET UP >> Mar 20 14:26:08 583769 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001e >> Mar 20 14:26:08 583903 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 30 times consecutively >> Mar 20 14:26:08 854841 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:08 913273 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 913349 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x82e32 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][5] >> Return path: [0][9][14][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 913415 [45A08960] -> osm_pi_rcv_process_set: Received error >> 
status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 913432 [45A08960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 913449 [45A08960] -> Capabilities Mask: >> Mar 20 14:26:08 913598 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 913676 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x82e33 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> 
dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][5] >> Return path: [0][9][14][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 913727 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 913732 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:26:08 913734 [43C05960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 913752 [43C05960] -> Capabilities Mask: >> Mar 20 14:26:08 913766 [4780B960] -> SMP dump: >> base_ver................0x1 >> 
mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x82e34 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][2][5] >> Return path: [0][9][14][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:26:08 913828 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:26:08 913833 [41E02960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 
>> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:26:08 913848 [41E02960] -> Capabilities Mask: >> Mar 20 14:26:08 918887 [41E02960] -> SUBNET UP >> Mar 20 14:26:48 657517 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000006f >> Mar 20 14:26:48 657779 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:26:48 658393 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000081 >> Mar 20 14:26:48 658465 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:26:48 979610 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979629 [41401960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:26:48 979652 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979660 [41401960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:26:48 979682 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979688 [41401960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:26:48 979721 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979727 [41401960] -> Removed port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 20 14:26:48 979770 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979782 [41401960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:26:48 979799 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979804 [41401960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:26:48 979822 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979827 [41401960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:26:48 979845 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 979849 [41401960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:26:48 980028 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980033 [41401960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:26:48 980061 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980066 [41401960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:26:48 980081 [41401960] -> 
osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980087 [41401960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:26:48 980102 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980107 [41401960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:26:48 980122 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980127 [41401960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:26:48 980143 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980148 [41401960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:26:48 980163 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980239 [41401960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:26:48 980256 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980261 [41401960] -> Removed port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:26:48 980365 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:26:48 980371 [41401960] -> Removed port with >> GUID:0x0005ad0000024afb LID range 
[0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:26:49 013365 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:49 065887 [43C05960] -> SUBNET UP >> Mar 20 14:26:49 407010 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:26:49 459477 [44606960] -> SUBNET UP >> Mar 20 14:27:42 754098 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000070 >> Mar 20 14:27:42 754349 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:27:42 760115 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000071 >> Mar 20 14:27:42 760178 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000082 >> Mar 20 14:27:42 760236 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:27:42 760406 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:27:42 766931 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000083 >> Mar 20 14:27:42 767049 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:27:43 085327 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085345 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> 
Mar 20 14:27:43 085349 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085355 [43C05960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:27:43 085359 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085364 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:27:43 085368 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085373 [43C05960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:27:43 085377 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085382 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 20 14:27:43 085386 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085390 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:27:43 085394 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085399 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:27:43 085403 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 
085407 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:27:43 085411 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085416 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:27:43 085420 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085425 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:27:43 085428 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085433 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:27:43 085437 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085442 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:27:43 085446 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085450 [43C05960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:27:43 085454 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085459 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:27:43 085511 [43C05960] -> osm_report_notice: 
Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085517 [43C05960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:27:43 085521 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085526 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:27:43 085530 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:43 085534 [43C05960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:27:43 116308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:27:43 179935 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 179980 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x85669 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180019 [41401960] 
-> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180037 [41401960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180050 [41401960] -> Capabilities Mask: >> Mar 20 14:27:43 180092 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 180137 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566a >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> 
m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180185 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180189 [44606960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180199 [44606960] -> Capabilities Mask: >> Mar 20 14:27:43 180239 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 
180263 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566b >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180307 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 180319 [42803960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180332 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566c >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][4] >> Return path: [0][9][13][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 
00 00 00 00 >> >> Mar 20 14:27:43 180336 [42803960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180364 [42803960] -> Capabilities Mask: >> Mar 20 14:27:43 180389 [42803960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180410 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 180392 [42803960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> 
local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180415 [42803960] -> Capabilities Mask: >> Mar 20 14:27:43 180436 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566d >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180490 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180494 [41E02960] -> PortInfo dump: >> port number.............0x16 >> 
node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180504 [41E02960] -> Capabilities Mask: >> Mar 20 14:27:43 180536 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 180560 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566e >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: [0][0][0][0][0][0][0] >> 
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180606 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180615 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:27:43 180634 [45007960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180657 [45007960] -> Capabilities Mask: >> Mar 20 14:27:43 180678 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> 
status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8566f >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][5] >> Return path: [0][9][18][E][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:27:43 180769 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:27:43 180775 [43C05960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> 
resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:27:43 180789 [43C05960] -> Capabilities Mask: >> Mar 20 14:27:43 186228 [43204960] -> SUBNET UP >> Mar 20 14:27:43 557268 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:27:43 611082 [45A08960] -> SUBNET UP >> Mar 20 14:27:58 852744 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:27:58 852982 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:58 970772 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:27:58 970864 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:58 992628 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:27:58 992712 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 132331 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 20 14:27:59 132484 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 314893 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:27:59 315006 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 
20 14:27:59 343241 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:27:59 343320 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 481698 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:27:59 481775 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 512746 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:27:59 512853 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 548851 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:59 548861 [41E02960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:27:59 583414 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:27:59 583817 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:27:59 623971 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 626182 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:27:59 626329 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 
num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 634080 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:27:59 634442 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:27:59 641962 [45A08960] -> SUBNET UP >> Mar 20 14:27:59 656231 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 20 14:27:59 656307 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:27:59 689788 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:27:59 690249 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:27:59 758521 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:27:59 758646 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:27:59 970740 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:27:59 970812 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:27:59 985557 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:59 985577 [41E02960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:27:59 985601 [41E02960] -> osm_report_notice: Reporting Generic >> 
Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:59 985615 [41E02960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:27:59 985649 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:59 985656 [41E02960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:27:59 989767 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:27:59 989787 [42803960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:28:00 014445 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 20 14:28:00 014524 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:28:00 020896 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:00 086824 [43204960] -> SUBNET UP >> Mar 20 14:28:00 124057 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 20 14:28:00 124108 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 20 14:28:00 131596 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 20 14:28:00 131643 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 20 14:28:00 412484 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> 
TID:0x0000000000000012 >> Mar 20 14:28:00 412528 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 20 14:28:00 436877 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 20 14:28:00 436921 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 20 14:28:00 458745 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 20 14:28:00 458816 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 20 14:28:00 480551 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 20 14:28:00 480599 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 20 14:28:00 695340 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 20 14:28:00 695386 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 20 14:28:00 695726 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000072 >> Mar 20 14:28:00 695886 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:28:00 719764 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 20 14:28:00 719825 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 20 14:28:00 743680 [43204960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:28:00 743775 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 20 14:28:00 763599 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000019 >> Mar 20 14:28:00 763654 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 20 14:28:00 813393 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001a >> Mar 20 14:28:00 813473 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 26 times consecutively >> Mar 20 14:28:00 831287 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001b >> Mar 20 14:28:00 831302 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000073 >> Mar 20 14:28:00 831383 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 27 times consecutively >> Mar 20 14:28:00 831424 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:28:00 841593 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001c >> Mar 20 14:28:00 841644 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 28 times consecutively >> Mar 20 14:28:01 050511 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:28:01 050529 [41E02960] -> Discovered new port with >> 
GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:28:01 050535 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:28:01 050542 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:28:01 050547 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:28:01 050554 [41E02960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:28:01 081322 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:01 142873 [43204960] -> SUBNET UP >> Mar 20 14:28:01 460275 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001d >> Mar 20 14:28:01 460358 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 29 times consecutively >> Mar 20 14:28:01 488474 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:01 538634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:01 538712 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x898d1 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: 
[0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:01 538752 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: >> Received error status for SetResp() >> Mar 20 14:28:01 538758 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:01 538767 [42803960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............DOWN >> state_info2.............0x42 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:01 538795 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> 
hop_count...............0x4 >> trans_id................0x898d2 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][6] >> Return path: [0][9][18][D][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:01 538810 [42803960] -> Capabilities Mask: >> Mar 20 14:28:01 538849 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: >> Received error status for SetResp() >> Mar 20 14:28:01 538856 [42803960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............DOWN >> state_info2.............0x42 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> 
error_threshold.........0x88 >> Mar 20 14:28:01 538871 [42803960] -> Capabilities Mask: >> Mar 20 14:28:01 539658 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:01 539696 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x898d3 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x11 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][15][1][4][17] >> Return path: [0][9][18][D][1][16] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 >> >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:01 539778 [45A08960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:01 539784 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:01 539798 [45A08960] -> PortInfo dump: >> port number.............0x11 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x16 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............DOWN >> state_info2.............0x42 >> 
m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:01 539834 [45A08960] -> Capabilities Mask: >> Mar 20 14:28:01 539844 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x898d4 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x12 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][15][1][4][17] >> Return path: [0][9][18][D][1][16] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 >> >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:01 539903 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:01 539908 [45007960] -> PortInfo dump: >> port number.............0x12 >> node_guid...............0x0005ad0000027c84 >> port_guid...............0x0005ad0000027c84 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> 
master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x16 >> link_width_enabled......0x2 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............DOWN >> state_info2.............0x42 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:01 539924 [45007960] -> Capabilities Mask: >> Mar 20 14:28:01 545091 [45007960] -> SUBNET UP >> Mar 20 14:28:01 652647 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000084 >> Mar 20 14:28:01 652864 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:01 879631 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000074 >> Mar 20 14:28:01 880104 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:28:01 962839 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000075 >> Mar 20 14:28:01 965155 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 
>> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:28:02 006432 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000085 >> Mar 20 14:28:02 030610 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:02 068999 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:02 081130 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:02 081198 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8a604 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][4][4] >> Return path: [0][9][18][D][4] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:02 081249 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:02 081257 [43204960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> 
diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x4 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:02 081275 [43204960] -> Capabilities Mask: >> Mar 20 14:28:02 081650 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:02 081713 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8a605 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][4][4] >> Return path: [0][9][18][D][4] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:02 081782 [43C05960] -> 
osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:02 081787 [43C05960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x4 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:02 081802 [43C05960] -> Capabilities Mask: >> Mar 20 14:28:02 087435 [45A08960] -> SUBNET UP >> Mar 20 14:28:02 170696 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000086 >> Mar 20 14:28:02 170819 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:02 459228 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:02 500761 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000087 >> Mar 20 14:28:02 500979 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:02 510190 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:02 510258 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8b330 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][17][1][5] >> Return path: [0][9][14][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:02 510366 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:28:02 510370 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:02 510384 [45007960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> 
link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:02 510394 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x8b331 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][3][6] >> Return path: [0][9][18][F][3] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:28:02 510398 [45007960] -> Capabilities Mask: >> Mar 20 14:28:02 510481 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:28:02 510491 [41401960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> 
m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x3 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:28:02 510509 [41401960] -> Capabilities Mask: >> Mar 20 14:28:02 510511 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000088 >> Mar 20 14:28:02 515576 [41401960] -> SUBNET UP >> Mar 20 14:28:02 515695 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:02 532552 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000089 >> Mar 20 14:28:02 538569 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:28:02 695997 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> 
TID:0x000000000000001e >> Mar 20 14:28:02 696096 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 30 times consecutively >> Mar 20 14:28:02 918226 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:02 975244 [43204960] -> SUBNET UP >> Mar 20 14:28:03 325494 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:28:03 379145 [41401960] -> SUBNET UP >> Mar 20 14:29:12 561841 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F >> TID:0x0000000000000019 >> Mar 20 14:29:12 562033 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001F >> GID:0xfe80000000000000,0x0005ad0000027c56 >> Mar 20 14:29:12 562751 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F >> TID:0x000000000000001a >> Mar 20 14:29:12 562902 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001F >> GID:0xfe80000000000000,0x0005ad0000027c56 >> Mar 20 14:29:12 571346 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0084 >> TID:0x0000000000000018 >> Mar 20 14:29:12 571569 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0084 >> GID:0xfe80000000000000,0x0005ad0000027c70 >> Mar 20 14:29:12 914371 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F >> TID:0x000000000000001b >> Mar 20 14:29:12 916287 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:12 916297 [44606960] -> Discovered new port with >> GUID:0x0005ad000002502f LID range [0x2,0x2] of node:Topspin IB-DC >> Mar 20 14:29:12 946985 [44606960] -> osm_ucast_mgr_process: Min Hop 
Tables >> configured on all switches >> Mar 20 14:29:12 976839 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001F >> GID:0xfe80000000000000,0x0005ad0000027c56 >> Mar 20 14:29:12 987963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:29:12 988004 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x8dbb2 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0xD >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][2][4][D] >> Return path: [0][9][18][E][1][12] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 12 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:29:12 988078 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:29:12 988089 [43C05960] -> PortInfo dump: >> port number.............0xD >> node_guid...............0x0005ad0000027c70 >> port_guid...............0x0005ad0000027c70 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x12 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> 
m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0x2C >> vl_enforce..............0x4C >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:29:12 988105 [43C05960] -> Capabilities Mask: >> Mar 20 14:29:12 993136 [44606960] -> SUBNET UP >> Mar 20 14:29:13 300755 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 >> TID:0x0000000000000000 >> Mar 20 14:29:13 300874 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:4 num:144 from LID:0x0002 >> GID:0xfe80000000000000,0x0005ad000002502f >> Mar 20 14:29:13 338077 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:13 338099 [41E02960] -> Discovered new port with >> GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:Topspin IB-DC >> Mar 20 14:29:13 368879 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:29:13 431763 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:29:13 431806 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x5 >> trans_id................0x8e8e9 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> 
attr_mod................0xA >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][1][6][12] >> Return path: [0][9][15][D][3][11] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 11 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:29:13 432093 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:29:13 432116 [43204960] -> PortInfo dump: >> port number.............0xA >> node_guid...............0x0005ad0000027c56 >> port_guid...............0x0005ad0000027c56 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x11 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0x2C >> vl_enforce..............0x4C >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:29:13 432135 [43204960] -> Capabilities Mask: >> Mar 20 14:29:13 437219 [45007960] -> SUBNET UP >> Mar 20 14:29:13 690992 [42803960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00BA >> TID:0x0000000000000000 >> Mar 20 14:29:13 691128 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:4 num:144 from LID:0x00BA >> GID:0xfe80000000000000,0x0005ad000002516f >> Mar 20 14:29:13 835017 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:29:13 891082 [42803960] -> SUBNET UP >> Mar 20 14:29:14 235714 [42803960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:29:14 289127 [41E02960] -> SUBNET UP >> Mar 20 14:29:17 689267 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008a >> Mar 20 14:29:17 689511 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:29:17 689975 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000076 >> Mar 20 14:29:17 690097 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:29:18 025237 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025255 [44606960] -> Removed port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:29:18 025273 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025279 [44606960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:29:18 025296 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> 
GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025300 [44606960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:29:18 025317 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025323 [44606960] -> Removed port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 20 14:29:18 025340 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025345 [44606960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:29:18 025362 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025367 [44606960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:29:18 025385 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025390 [44606960] -> Removed port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:29:18 025406 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025411 [44606960] -> Removed port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:29:18 025571 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025576 [44606960] -> Removed port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:29:18 025612 [44606960] -> 
osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025619 [44606960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:29:18 025634 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025639 [44606960] -> Removed port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:29:18 025655 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025660 [44606960] -> Removed port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:29:18 025678 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025683 [44606960] -> Removed port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:29:18 025700 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025705 [44606960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:29:18 025721 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025777 [44606960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:29:18 025794 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025800 [44606960] -> Removed port with >> GUID:0x0005ad0000024feb LID range 
[0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:29:18 025816 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:29:18 025821 [44606960] -> Removed port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:29:18 058968 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:29:18 112970 [43C05960] -> SUBNET UP >> Mar 20 14:29:18 511156 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:29:18 573846 [41E02960] -> SUBNET UP >> Mar 20 14:30:11 599965 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000077 >> Mar 20 14:30:11 600182 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:11 606044 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000078 >> Mar 20 14:30:11 606078 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008b >> Mar 20 14:30:11 606178 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:11 606207 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:11 612375 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008c >> Mar 20 14:30:11 612499 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 
14:30:11 947057 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947074 [45007960] -> Discovered new port with >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 >> Mar 20 14:30:11 947079 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947084 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 >> Mar 20 14:30:11 947088 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947093 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 >> Mar 20 14:30:11 947097 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947102 [45007960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:30:11 947106 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947118 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:30:11 947138 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947143 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:30:11 947148 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947153 
[45007960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:30:11 947157 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947162 [45007960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:30:11 947166 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947170 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:30:11 947174 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947179 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:30:11 947183 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947188 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 >> Mar 20 14:30:11 947191 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947196 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 >> Mar 20 14:30:11 947200 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947205 [45007960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:30:11 947209 [45007960] -> osm_report_notice: 
Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947214 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 >> Mar 20 14:30:11 947282 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947288 [45007960] -> Discovered new port with >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 >> Mar 20 14:30:11 947291 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947296 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 >> Mar 20 14:30:11 947300 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:11 947305 [45007960] -> Discovered new port with >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 >> Mar 20 14:30:11 978149 [45007960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:12 042474 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 042577 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b38 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x13 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][5] >> Return path: 
[0][9][13][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 042668 [45007960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 042676 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 042682 [45007960] -> PortInfo dump: >> port number.............0x13 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 042701 [45007960] -> Capabilities Mask: >> Mar 20 14:30:12 042714 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 
(SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b39 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][5] >> Return path: [0][9][13][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 042845 [41401960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 042856 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 042851 [41401960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> 
p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 042897 [41401960] -> Capabilities Mask: >> Mar 20 14:30:12 042907 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b3a >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][5] >> Return path: [0][9][13][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 043013 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 043015 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 043038 [43204960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> 
port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 043090 [43204960] -> Capabilities Mask: >> Mar 20 14:30:12 043094 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b3b >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][5] >> Return path: [0][9][13][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 043173 [44606960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 043178 [44606960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> 
subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 043191 [44606960] -> Capabilities Mask: >> Mar 20 14:30:12 043222 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 043247 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b3c >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][1][4] >> Return path: [0][9][14][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 
04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 043318 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 043314 [41E02960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 043357 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b3d >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][1][4] >> Return path: [0][9][14][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 043367 [41E02960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> 
vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 043422 [41E02960] -> Capabilities Mask: >> Mar 20 14:30:12 043513 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 043518 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:12 043519 [43C05960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> 
error_threshold.........0x88 >> Mar 20 14:30:12 043535 [43C05960] -> Capabilities Mask: >> Mar 20 14:30:12 043553 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x92b3e >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][12][1][4] >> Return path: [0][9][14][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:12 043658 [42803960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:12 043663 [42803960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> 
vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:12 043678 [42803960] -> Capabilities Mask: >> Mar 20 14:30:12 049088 [43204960] -> SUBNET UP >> Mar 20 14:30:12 442903 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:12 497312 [45007960] -> SUBNET UP >> Mar 20 14:30:27 571421 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000000 >> Mar 20 14:30:27 571674 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:27 782498 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000001 >> Mar 20 14:30:27 782616 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:27 804302 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000002 >> Mar 20 14:30:27 805103 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:27 924983 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000003 >> Mar 20 14:30:27 925088 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:27 934314 
[43204960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:27 934327 [43204960] -> Removed port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:30:27 969077 [43204960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:28 017451 [41E02960] -> SUBNET UP >> Mar 20 14:30:28 030947 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000004 >> Mar 20 14:30:28 031177 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 120040 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000005 >> Mar 20 14:30:28 120190 [42803960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 148805 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000006 >> Mar 20 14:30:28 149108 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 170453 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000007 >> Mar 20 14:30:28 170971 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 336861 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:28 336884 [43C05960] -> Removed port with >> 
GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:30:28 336910 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:28 336916 [43C05960] -> Removed port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:30:28 336945 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:28 336951 [43C05960] -> Removed port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:30:28 371497 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:28 410709 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000008 >> Mar 20 14:30:28 410894 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 415926 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000009 >> Mar 20 14:30:28 419624 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 426978 [45A08960] -> SUBNET UP >> Mar 20 14:30:28 438003 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000a >> Mar 20 14:30:28 438182 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0152 >> GID:0xfe80000000000000,0x0005ad0000027c84 >> Mar 20 14:30:28 470141 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000b >> Mar 
20 14:30:28 470197 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 11 times consecutively >> Mar 20 14:30:28 652535 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000c >> Mar 20 14:30:28 652623 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 12 times consecutively >> Mar 20 14:30:28 681514 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000d >> Mar 20 14:30:28 681636 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 13 times consecutively >> Mar 20 14:30:28 703052 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000e >> Mar 20 14:30:28 703092 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 14 times consecutively >> Mar 20 14:30:28 724753 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000000f >> Mar 20 14:30:28 724809 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 15 times consecutively >> Mar 20 14:30:28 855519 [42803960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000010 >> Mar 20 14:30:28 855671 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 16 times consecutively >> Mar 20 14:30:28 877289 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000011 >> Mar 20 14:30:28 877354 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 17 times consecutively >> Mar 20 14:30:28 899021 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 >> TID:0x0000000000000012 >> Mar 20 14:30:28 899062 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 18 times consecutively >> Mar 20 14:30:29 006886 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000013 >> Mar 20 14:30:29 006950 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 19 times consecutively >> Mar 20 14:30:29 099965 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000014 >> Mar 20 14:30:29 100020 [44606960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 20 times consecutively >> Mar 20 14:30:29 146532 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000015 >> Mar 20 14:30:29 146578 [41E02960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 21 times consecutively >> Mar 20 14:30:29 356891 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000016 >> Mar 20 14:30:29 356938 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 22 times consecutively >> Mar 20 14:30:29 383112 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000017 >> Mar 20 14:30:29 383157 [43204960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 23 times consecutively >> Mar 20 14:30:29 383710 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x0000000000000079 >> Mar 20 14:30:29 383790 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:29 407890 [42803960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000018 >> Mar 20 14:30:29 407935 [42803960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 24 times consecutively >> Mar 20 14:30:29 429653 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x0000000000000019 >> Mar 20 14:30:29 429700 [45A08960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 25 times consecutively >> Mar 20 14:30:29 451352 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001a >> Mar 20 14:30:29 451401 [45007960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 26 times consecutively >> Mar 20 14:30:29 479843 [4780B960] -> umad_receiver: ERR 5409: send completed >> with error (method=0x1 attr=0x11 trans_id=0x124ef00095cbf) -- dropping >> Mar 20 14:30:29 479855 [4780B960] -> umad_receiver: ERR 5411: DR SMP >> Mar 20 14:30:29 479865 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR >> 3113: MAD completed in error (IB_TIMEOUT) >> Mar 20 14:30:29 479901 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x1 (SubnGet) >> D bit...................0x0 >> status..................0x0 >> hop_ptr.................0x0 >> hop_count...............0x6 >> trans_id................0x95cbf >> attr_id.................0x11 (NodeInfo) >> resv....................0x0 >> attr_mod................0x0 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][5][17][C] >> Return path: [0][0][0][0][0][0][0] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:29 480017 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 480030 [44606960] -> Removed port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:30:29 480092 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 480099 [44606960] -> Removed port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:30:29 480121 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 480128 [44606960] -> Removed port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:30:29 480152 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:65 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 480160 [44606960] -> Removed port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:30:29 480325 [44606960] -> osm_drop_mgr_process: ERR 0108: Unknown >> remote side for node 0x0005ad0000027c84 port 12. 
Adding to light sweep >> sampling list >> Mar 20 14:30:29 480343 [44606960] -> Directed Path Dump of 5 hop path: >> Path = [0][1][11][1][5][17] >> Mar 20 14:30:29 665327 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000007a >> Mar 20 14:30:29 665355 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001b >> Mar 20 14:30:29 665397 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:29 665404 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 27 times consecutively >> Mar 20 14:30:29 680658 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 680668 [45A08960] -> Discovered new port with >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 >> Mar 20 14:30:29 680672 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 680678 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 >> Mar 20 14:30:29 680681 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 680686 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 >> Mar 20 14:30:29 680690 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:29 680695 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 >> Mar 20 14:30:29 
711542 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:29 768280 [41401960] -> SUBNET UP >> Mar 20 14:30:30 113195 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:30 113206 [45A08960] -> Discovered new port with >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 >> Mar 20 14:30:30 113211 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:30 113216 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 >> Mar 20 14:30:30 113220 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:30 113225 [45A08960] -> Discovered new port with >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 >> Mar 20 14:30:30 113228 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:3 num:64 from LID:0x0092 >> GID:0xfe80000000000000,0x0005ad0000024bbb >> Mar 20 14:30:30 113233 [45A08960] -> Discovered new port with >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 >> Mar 20 14:30:30 144149 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:30 195765 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:30 195850 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x96dcd >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> 
attr_mod................0x16 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][2][4] >> Return path: [0][9][15][E][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:30 195929 [43C05960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:30 195942 [43C05960] -> PortInfo dump: >> port number.............0x16 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:30 195968 [43C05960] -> Capabilities Mask: >> Mar 20 14:30:30 196144 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status 
= 0x1C00 >> Mar 20 14:30:30 196179 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x96dce >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x17 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][14][2][4] >> Return path: [0][9][15][E][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:30 196248 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:30 196254 [43204960] -> PortInfo dump: >> port number.............0x17 >> node_guid...............0x0005ad00000281b3 >> port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 
>> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:30 196269 [43204960] -> Capabilities Mask: >> Mar 20 14:30:30 201633 [45007960] -> SUBNET UP >> Mar 20 14:30:30 278051 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 >> TID:0x000000000000001c >> Mar 20 14:30:30 278107 [43C05960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 28 times consecutively >> Mar 20 14:30:30 278656 [41E02960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000007b >> Mar 20 14:30:30 278871 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:30 279653 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008d >> Mar 20 14:30:30 279765 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 568539 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008e >> Mar 20 14:30:30 568617 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 607916 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 >> TID:0x000000000000007c >> Mar 20 14:30:30 625139 [44606960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:30 663838 [45A08960] -> 
osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x0148 >> GID:0xfe80000000000000,0x0005ad00000281b3 >> Mar 20 14:30:30 664569 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x000000000000008f >> Mar 20 14:30:30 664747 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 679262 [45A08960] -> SUBNET UP >> Mar 20 14:30:30 784024 [43204960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000090 >> Mar 20 14:30:30 784123 [43204960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 804217 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000091 >> Mar 20 14:30:30 807807 [41401960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 825500 [45A08960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000092 >> Mar 20 14:30:30 825600 [45A08960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:30 988887 [43C05960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000093 >> Mar 20 14:30:30 988978 [43C05960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:31 059298 [41401960] -> osm_ucast_mgr_process: Min Hop Tables >> configured on all switches >> Mar 20 14:30:31 106840 [41E02960] -> 
__osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000094 >> Mar 20 14:30:31 111335 [41E02960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:31 112465 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:31 112497 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x98837 >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][16][1][5] >> Return path: [0][9][13][D][2] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 >> >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:31 112593 [44606960] -> osm_pi_rcv_process_set: ERR 0F10: >> Received error status for SetResp() >> Mar 20 14:30:31 112627 [44606960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281a7 >> port_guid...............0x0005ad00000281a7 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x2 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> 
port_state..............DOWN >> state_info2.............0x42 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:31 112673 [44606960] -> Capabilities Mask: >> Mar 20 14:30:31 113808 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR >> 3111: Error status = 0x1C00 >> Mar 20 14:30:31 113838 [4780B960] -> SMP dump: >> base_ver................0x1 >> mgmt_class..............0x81 >> class_ver...............0x1 >> method..................0x81 (SubnGetResp) >> D bit...................0x1 >> status..................0x1C00 >> hop_ptr.................0x0 >> hop_count...............0x4 >> trans_id................0x9883e >> attr_id.................0x15 (PortInfo) >> resv....................0x0 >> attr_mod................0x18 >> m_key...................0x0000000000000000 >> dr_slid.................0xFFFF >> dr_dlid.................0xFFFF >> >> Initial path: [0][1][11][1][4] >> Return path: [0][9][18][D][1] >> Reserved: [0][0][0][0][0][0][0] >> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 >> >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 >> >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 >> >> Mar 20 14:30:31 113925 [43204960] -> osm_pi_rcv_process_set: Received error >> status 0x1c for SetResp() during ACTIVE transition >> Mar 20 14:30:31 113930 [43204960] -> PortInfo dump: >> port number.............0x18 >> node_guid...............0x0005ad00000281b3 >> 
port_guid...............0x0005ad00000281b3 >> m_key...................0x0000000000000000 >> subnet_prefix...........0x0000000000000000 >> base_lid................0x0 >> master_sm_base_lid......0x0 >> capability_mask.........0x0 >> diag_code...............0x0 >> m_key_lease_period......0x0 >> local_port_num..........0x1 >> link_width_enabled......0x3 >> link_width_supported....0x3 >> link_width_active.......0x2 >> link_speed_supported....0x1 >> port_state..............ACTIVE >> state_info2.............0x52 >> m_key_protect_bits......0x0 >> lmc.....................0x0 >> link_speed..............0x11 >> mtu_smsl................0x40 >> vl_cap_init_type........0x40 >> vl_high_limit...........0x0 >> vl_arb_high_cap.........0x8 >> vl_arb_low_cap..........0x8 >> init_rep_mtu_cap........0x4 >> vl_stall_life...........0xF2 >> vl_enforce..............0x40 >> m_key_violations........0x0 >> p_key_violations........0x0 >> q_key_violations........0x0 >> guid_cap................0x0 >> client_reregister.......0x0 >> subnet_timeout..........0x0 >> resp_time_value.........0x0 >> error_threshold.........0x88 >> Mar 20 14:30:31 113946 [43204960] -> Capabilities Mask: >> Mar 20 14:30:31 119007 [43204960] -> SUBNET UP >> Mar 20 14:30:31 128758 [45007960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000095 >> Mar 20 14:30:31 128851 [45007960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:31 150370 [44606960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B >> TID:0x0000000000000096 >> Mar 20 14:30:31 150468 [44606960] -> osm_report_notice: Reporting Generic >> Notice type:1 num:128 from LID:0x001B >> GID:0xfe80000000000000,0x0005ad00000281a7 >> Mar 20 14:30:31 316422 [41401960] -> __osm_trap_rcv_process_request: >> Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0152 >> TID:0x000000000000001c >> Mar 20 14:30:31 316498 [41401960] -> __osm_trap_rcv_process_request: ERR >> 3804: Received trap 29 times consecutively >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri Mar 23 12:27:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Mar 2007 14:27:18 -0500 Subject: [ofa-general] osm error messages In-Reply-To: References: Message-ID: <1174678032.24305.169706.camel@hal.voltaire.com> On Fri, 2007-03-23 at 12:41, Douglas Fuller wrote: > On 3/21/07 2:53 PM, "Hal Rosenstock" <halr at voltaire.com> wrote: > > > On Wed, 2007-03-21 at 13:29, Douglas Fuller wrote: > >> I'm seeing some sporadic error activity from OpenSM (OFED 1.1; osm.log > >> below) that may correlate with some job failures -- I'm trying to get to the > >> bottom of this. > >> > >> Before seeing this, I isolated and disabled with ibportstate what appeared > >> to be a bad internal port on one of our core switches. That leads me to > >> suspect I have a switch misbehaving somewhere. > >> > >> Without any other intervention, things seem to check out (with > >> ibdiagnet/ibchecknet). Any thoughts? Need any more information? > > > > Is something bouncing your subnet or was this just what ibportstate did > > ? It could be if this was a core switch. > > Nothing should be. The same thing appears to happen once every couple days > -- it is very difficult to correlate with anything. And does it just go away ? Is some part of your subnet not accessible ? > > Also, you may have some SMAs which have gone nonresponsive to SMPs > > (IB_TIMEOUTs) but the links are up. I can't be sure not knowing what the > > exact scenario was. If you do, you will likely want to chase these and do > > something about them if you haven't already.
> > Hmm, what could cause that? All my hosts are responsive whenever I check > (though it hasn't been during one of these storms of activity). Are all your switches responsive ? What switches are you using ? -- Hal > > All the messages relating to ACTIVE -> ACTIVE transition can be ignored. > > > > Also, it looks like something is removing characters in the log. > > Yeah, there are characters missing in the whole message. Curious. > > Thanks again, > --Doug > > > > > -- Hal > > > >> Thanks, > >> --Doug Fuller > >> > >> Ma 19 18:8:50 000354 [AB000160] -> OpenSM ev:openib-2.0.5 OpnIBsvn > >> Exported evision > >> Mar19 18:28:0 000466 [AB00160] -> OpenSM Rev:openib-2.0.5 OnB svn > > Exporte revision > >> Mar 19 18:28:50 007666 [AB000160] -> om_vendor_bind: Binding to port > >> 0xad0000024bb > >> Mr 19 18:28:50 011279 [AB00160 ->osm_vendo_bind: Binding to port > >> 05ad0000024bbb > >> Mar 19 18:2850 438326 [44606960] -> Entering MASTER stt > >> Mar 19 18:28:5 438628 [4606960] > osm_report_notice: Reporting Geneic > >> Notice type:3num:66 from LID:0x0000 > >> GID:0xfe8000000000000,0x0005ad000024bbb > >> Mar 19 1828:50438661 [4460660] -> sm_report_notice: Reorting Geneic > >> Notie type:3 num:6 from LID:0x0000 > >> GID:0xf8000000000000,0x0005ad0000024bbb > >> Mar 1 18:28:50 50476 41401960] -> osm_cat_mgr_process: Min Hop Tabes > >> onfigured on all witches > >> Mar 19 18:2850 639453 [44606960] -> SUNET UP > >> Mar 19 18:28:50 853613 1E02960 -> __osm_traprcv_process_reqest > >> Rceived Generic Notice type:0x04 num:144 Poducer:1 from LID:0x0092 > >> TID:0x00000000000018 > >> Mar 19 1828:5 853813 [4E0960] > osm_report_notice: Reporting Generic > > Notice typ:4 num:144 from LID:0x0092 > >>GID:0xfe8000000000000,0x0005ad0000024bb > >> Mar 1 18:28:51 273470 [4460960] -> osm_ucast_gr_process: MinHopTables > >> configured on all switches > >> Mar 19 18:28:51 33730 [43C05960] -> SUBNET UP > >> Ma 19 18:3033 565682 [4320490] -> __osm_trap_rcv_process_requst: > >> Received Generic Notice type:0x1 
um:128 Poducer:2 from LID:0x0001 > >> TID:0x000000000000019 > >> Mar 19 18:30:33 565958 [4320496] -> sm_reprt_notice: Reporting Generic > >> Noicetype:1 num:128 from ID:0x0001 > >> GID:0xfe80000000000000,0x005d0000027c6 > >> Mar 19 18:30:33 963901 [41401960] > osm_rport_notice: Reporting Generic > >> Noticetyp:3 num:64 fro LID:x0092 > >> GID:0xfe80000000000000,0x05ad0000024bbb > >> Mar 19 18:30:33 963914 4140196] -> Discovered nw port with > >> GUID0x0005ad00000297b LI range [0x3,0x37] of node:saguro-14-9 HCA-1 > >>Mar 19 18:30:33 994698 [4401960] > om_ucast_mgr_procss: Min Hp Tables > >> configured n all switches > >> Mar 19 18:30:34 054763 [45A08960]-> UBNET UP > >> Ma 1 18:30:34 351397 43C060] -> __osm_tra_rcv_process_request: > >> Received Gneri Nice type:0x04 num:144 Producer:1 fomLID:0x0037 > >> TID:0x00000000000000 > > Mar 19 18:30:4 351615 [4C05960] -> osm_report_notice Reportig Genric > >> otice type: num:144 from LID0x0037 > >> GID:0xfe80000000000000,0x0005ad000497b > >> ar 19 18:30:34 777488 [45A0896 > osm_ucast_mgr_process:Min Hop Tables > > configured onall switces > >> Ma 1 18:30:34 832664 [4A08960] -> SUBNE UP > >> Ma 19 18:32:27 476136 [45A0890] -> _osm_trap_cv_process_reqest: > >> Received Generic Notice typ:0x01 um:128Producer:2 from LID:0x018 > >> TID:0x00000000000002b > >> Mar 19 18:3:27 476340 [43204960] ->__osm_trap_cv_process_request: > >> Reeivd Gneric Noti type:0x01 num:128 Poducer:2 from LID:0x001B > > TID:0x000000000000037 > >> Mar 19 18:32: 476389[45A08960] -> osm_reort_notice: Reporting Generic > >> Notice type: num:128 from LID:0x0148 > >>GID0xfe800000000000,0x0005ad00000281b3 > >> Mar 19 18::27 47485 [4320460] -> osm_report_ntice: Reportig Generic > >> Notice tye:1 num:128 from ID0x001B > >> GID0xfe8000000000000,0x0005ad0000081a7Mar 9 18:32:27 817617 [42803960] - > >> osm_reportnotie: Reporting eneric > >> Notice type: nm:65 frm LID:0x002 > >> GID:0xfe80000000000000,0x05ad000024bbb > >> Mr 19 18:32:27 817637 [4280396] -> Remove port 
wth > >> GUID:0x0005ad0000024e0b LID rane [0xB3,xB3] of nodesaguaro-23-4 C- > >> ar19 18:32:27 817655[42803960] -> sm_report_notice: Reporting Generc > >> Notice type:3 num:65from LID:0x092 > >> GID:0fe800000000000000005ad0000024bbb > >> Mar 9 18:32:27 8176 [42803960] -> emove port with > >> GUID:0x0005ad000002510b LID range [0xB5,0B5] of node:saguaro-23-6 HCA- > >> Mar 1 18::2 817694 [42803960] -> osreport_notice Reporting Generic > >> Ntice type: num:65 from LID:0x0092 > >> GID:0xfe80000000000000,00005d000024bbb > >> Mar 19 18:3:781769 [4280360] -> Rmoved port wth > >> GUID0x0005ad000002511b LID range 0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:322717716 4280390] -> osm_report_ntice: ReprtingGneric > >> Notice type:3 nm:65 from LID:0092 > >> GID:0xfe80000000000000,0x0005ad000002bbb > >> Mar 1918:32:27 81771 [42803960] - Remved port with > > GUID0x0005a0000024b7 LID range [0xAF,0xAF] of node:sguaro-23-0 HCA-1 > >> Mar 19 18:3:27 817738 [4280390] - osm_reportnotice: ReportingGeneric > >> otic type:3 num:65 from LID:0x0092 > >> GD:0xf800000000000000x0005ad0000024bbb > >>Mar 19 18:32:27 817743 4203960] -> Removed ort with > >> GUID:0x000ad000025043 LI range [0xB4,0xB4] of node:sguaro-35 HA-1 > >> Mar 19 18:32:27 817758 [42803960] - osm_report_notce: ReporinGeneric > >> Notice type:3 num65 from LID:0x009 > >> GID:0xfe8000000000000,00005ad0000024bbb > >> Mar 19 1:32:27 817763 [2803960 -> Remoed port wih > >> GUID:0x0005ad000024d7 LID rane [0xB6,0xB6] of node:saguar-23-7 HCA-1 > > Mar 19 18:32:27 817780 [42803960] -> osmport_notice: Reporting Genric > >> Notce type:3 nu:65 fromLID:0x009 > >> ID:0xfe80000000000000,0x0005ad0000024bb > >> Mar 1 18:32:27 17785 [42803960] -> Remved port with > >> UID0x0005ad0000024d6bLID range [0xB8,0xB8] node:saguao-23-9 HCA-1 > >> Mar 19 18:32:27 817803 [48036] -> osm_report_notce: Reporting Generic > >> Notice tye:3 num65from LD:0x0092 > >> GID:0xfe8000000000000,0x0005ad000024bbb > >> Mar 19 8:3227 817808 [4283960] -> Removed 
porwith > >> GUID:0x0005ad000004977 LID rane 0xA9,0xA9] of node:saguro-224 HCA1 > >> Mar 19 18:32:27 817932 [4803960] -> osm_report_notice: Rporting Generic > >> Ntice type:3 num:65 from LID:x009 > > GID:0xfe80000000000000,0x000ad0024bbb > >> Mar 19 18:32:27 817938 [428090] -> Removed port with > >> GD:0x0005d0000027c84 LID range [0x1,0152] of node:Topspin Switch TS20 > >> M 19 18:32:27 817970 [4280390] -> osm_report_notice: Reporing Generic>> Notice type:3 num:65 from LID:0x0092 > >> GID:fe8000000000000,0x0005ad000024bbb > >> Ma 19 18:32:27 817977 [4280360 > Removd port with>> GUID0x0005ad0000024d8b LID range [0xB,0xB7] of nde:aguaro-23-8 HCA-1Mar 19 > >>1:32:27 81792 [42803960] -> osm_report_ntice: eporting Generic > >> Notic tye:3 num:65 from LID:0x0092 > >> GID:0xfe800000000000000x005ad000004bbb > >> Mar 19 8:32:27 81797 [42803960 -> Removed por with > >> GID:0x0005ad00000249f ID range[0xA8,0xA8] of node:saguaro-22-3 HCA- > >> Ma 19 18:32:27 81811 [42803960] > osm_report_notice Reportin Generic > >> Notice type:3 num:6 from LID:0x0092 > >> GID:0xe80000000000000,0x0005ad0000024b > > Mar 9 18:32:27 818016 [42803960] - emoved port with > > UID:x0005ad000004c9b LID range [0xA7,0xA7] of node:saguaro-2-2 HCA-1 > > Mar 1 18:32:7 818032 [42803960] -> osm_report_ntice: Reporing Generic > >> Notice tpe:3 num:65 from LD:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb>> a 19 18:3:7 818037 [42803960] -> Rmoved port wit > >> GUID:0x0005ad00004da7 LD range[0xB0,0xB0]of node:saguaro-23-1 HCA-1 > >> Mar 19 1:32:27 818054 [428060] -> osm_repot_notice: ReportingGeneric > >> Notice typ:3 num:65rom LID:0x0092 > >> GID:0xfe80000000000000,x0005ad0000024bbb > >> Mar19 18:3:27 818115 [428060] -> Remoed port wth > >> GUID:0x0005ad000024cbb LID range [B2,0xB2] of ode:saguar-23-3 HCA-1 > >> Mar 1 18:3:27 81812 [42803960] ->osm_report_otice: Reorting Generic > >> Notie tpe:3 num:65 from LID:0x092 > >> GID:0xfe8000000000000,0x005ad000002bbb > >> Mar1918:32:27 818137 [803960] -> 
Removedport ith > >> GUID:0x0005ad00000249d3 LID range[xB1,0B1] of node:saguaro-23-2 HCA-1>> Mar 19 8:322 818153 [4280360] -> om_report_notice: Reporting Gneric > >> Notice typ:3 num:65 from LID:0x0092 > >> GID0xfe800000000000,0x0005d0000024bbb > >> Mar 19 1832:7 818158 [42803960]->Removedpot with > >> GUID:0x0005ad00002feb LID range [0x153,0x153] of noesaguaro-2-5 HCA-1 > >> Mar 19 8:32:7818173 [42803960] -> osm_report_notic Reporting Geeric > >> Notice typ:3 num:65 from LID:0x092 > >> GID:0xfe8000000000000,0x0005ad000024bbb > >> ar 19 18:322 818178 [42803960 -> Removed prt wih > > GUID:0x0005ad0000024afb LID rng [0xA5,0xA5] of node:saguaro-22-0 HC-1 > >> Mar 19 183227 85129[42803960] -> osm_ucast_mgr_procss: Min Hop Tbls > >> conigured on all sitches > >> Mar 19 18:32:2898524 [43204960] -> SBNT UP > >> Mar 19 8:32:2828664 [4507960] - osm_ucast_mgr_proessMin Hop Tables > > confiured on all switchs > >> Mar 19 18:32:28 34169 [4466960] -> UNET UP > >> Mar 191:33:21814615 [41E0296] -> __osm_trap_cv_process_request: > >> Rceied Gneric Notice type:0x01 num:128 Produe2 from LID:0x0148 > >> TID:00000000000002c > >> Mar 19 8:33:21 81484 4E02960]->osm_report_otice: Repoting eneri > >> Notie type:1 num:128 fom LID:0x0148 > >> GID:0xfe800000000000,0x005ad0000281b3 > > Mar19 18:3321 820835 [41E02960] -> __osm_tap_rcv_proces_request: > >> Recived Genei Notc type:001 num:128 Producer:2 fro LI:0x01B>> TID:0x000000000000008 > >> Ma 19 1833:2 82090 [41E02960] -> smreort_notice Repoting Generic > >> Notice tye:1 num128 fromLD:0x001B > >> GD:xfe80000000000000,0x0005ad0000281a7 > >> ar 19 18:33:21 82038 [41E02960] -> __s_rap_rcv_processrequest: > >> Received Geeri Notce tpe:0x num:128 Producer: from LID:00148 > >> TID0x0000000000002d > > Mar 19 18:33:21 820992 [E060] -> osm_report_noticeReporting Gnric > >> Notice type:1 num:128fom LID:00148 > >> GID:0xfe8000000000000,0x005ad00000281b3 > >> Mar 19 18:3:21 826779 [402960] -> __osm_trap_rcv_process_rqest:>> Receivd Gneric Notice 
type0x0 num:18 Prducr:2 from LID:0x001B > >>TID0x00000000000039 > >> Mar 19 1833:21 82742 [41E02960] - om_report_notice: Reporting Generc > >> Ntice tye:1 num:18 from LID:0x001B > >> GID:0xfe800000000000,0x0005a00000281a7Mar 19 18:3:22 164580 [45007960 > > >> osm_drop_mgr_pocess: ERR 0108: Ukown > >> remote side fo nde0x0005a0000027c84 port 18. Addi to liht weep > >> samplin list > >> Mar 191:3:22 164654 [45007960] -> Direced PathDump of 5 hop path: > >> Path= [0][1][11][1][5][17] > >> Mar19 18:33:22 164712 [4500796] -> osm_op_mgrprocess: ERR 0108: Unknow > >> reote sde for node 0x0005ad00000281b3 port2 Adding to ligt swep > >> ampling lit > >> Mar 19 18:33:264724 4500760] > irected Path Dump of hop path: > >> Path = [0][1[11][1][5] > >> Mar 9 18:33:22 13634 4C0960] ->osm_report_notic: Reporing Generic > >> Notice type:3 num:64 fromD0x0092 > >> GD:0xfe80000000000000,0x005ad0000024bbb > > Mar 19 18:33:22 17365[43C0560] - iscovered e ort with > >> GUD:0x0005d000027c4 LIDrange [0x152,0x152] of node:Toppin Switch TS120 > >> Mar 19 18:3:2217365 [43C05960] -> os_reortnotice:RepotingGeneric > > Notice type:3 num64from LID:0x0092 > >>GID:0xfe8000000000000,0x05ad0000024bbb > >> Mar 19 18:33:22 173662 [43C05960] -> icoerd nepot with > >> UID:0x005ad0000024b27 I rage[0xAF,0xAF] o ode:saguaro-23- HCA-1 > >> Mar 19 18:33:22 17366 [43C0590] -> osm_report_notice: Reprting Gric > >> otice type num:64 from LID:00092 > >> GI0xfe8000000000000,0x0005ad0000024bbbMar19 18:33:22 173671[43C05960] -> > >> Discovered new por with > >> GUID:0x5ad000024da7 LID rage 0xB00xB0] ofnode:auar-23-1 HCA-1 > >> Mar 19 18:33:22 173675 [43C0596] -> osm_eport_notice: Rerting Generic > >> Notice typ:3num:4 fr LID:0x0092 > >> GI:0xfe800000000000,0x005ad0000024bbb > >> Mar 9 18:33:22173680 [43C05960] -> Discoveed new port > >> withGUD:0x005ad00000249d3 LDrange [B1,0xB1] of node:saguaro-2-2 HC1 > >> Mar 918:33:22 173684[43C05960] ->osm_rport_notice Reporting Generic > >> Notice type:3 num:6 fo LI:0x002 > > 
GID:0xfe000000000000,0x005ad000002bb > > Mar 19 18:33:22 173689 [43C05960] -> Dicovered new port with > >> GUI:0x0005ad00024cbb LID range [0xB,xB] f node:saguaro-233 HCA-1 > >> Mar 19 1:33:22 173693 [3C0960 ->osm_report_ntice: Rporting Geric > >> Notice type:3 num:64 fro LID:0x0092 > >> GID:0xfe00000000000,0x005d000004bbb>> Mar 19 1:33:22 173697 43C0596] -> Discoverednew port with > >> GUID:0x05ad000024e0b LI rnge [0xB3,0xB3] of nod:aguaro23-4 HCA-1 > >> Mar 19 1833:22 73701 [43C5960] -> os_reportotice: Repoting Genric > >> Notce type:3 nu:4 from LID:0x0092 > >> GID:0xfe800000000000000x005d0000024bb > >> Mr 19 1833:22 17706[43C05960] -> Dscovered new port with > > GUID:0x0005ad00025043 LID ange [0xB4,xB4] ofnode:saguaro-23-5HA-1 > >> Mar 1 18:33:2 173710 43C05960] -> osm_reo_notice: Reprting Generic > >> Notice tye:3 m:64 from ID:0x0092 > >> ID:0xfe8000000000000x0005ad000002bbb > >> Ma 1918:3:22 173715 [43C0596] ->Discverednw port with > >> UID:0x0005ad00002510b LID range [0xB5,0xB5 of nde:saguaro-23-6 HCA1 > >> Mar 9 18:3:2 173719 [3C0590] -> osm_report_ntic Reportin Generic > > Notice type:3 nm:64 from LID:0x0092 > >>GID0xfe800000000000,0x0005ad0000024bbb > > Mar 1918:33:22 1723 [43C05960]-> Disoveed new ort wth > >> GUID:0x0005d000002447 LID rane [0xB6,0xB6] of node:saguaro-23-7HCA-1Mar 1 > >> 18:33:22 17327 43C05960] ->osm_rert_notie: Reporting Generic > >> Notice type:3 num:64 fom LID:0x0092 > >> GI:0fe80000000000000,00005ad000004bbb > > Ma 9 18:33:22 1733 [43C05960] - Discoverednew port wit > >> GUID:0x0005d000024d8bLID range [0B7,0xB7] of node:saguro-23-8 HCA-1 > >> Mr 19 18:33:22 173736 [4C05960] -> os_repr_notice:Reporting Generic > >> Notic type: num:64 from LID:0x0092 > >> GID:0f80000000000000,0x0005d0000024bbb > >> Ma 19 18:33:22 173741 43C5960] ->Dscovred ne portwith > >> GID:0x0005ad0000024d6b LI range 0xB8,0xB8] ofnode:sguao-23-9 HCA-1 > >> Mar19 1833:22 173744 [43C0960] -> osm_report_notice: Reorting Generic > > otice typ:3 num:64 from LID:0x0092 > 
> GI:0xfe800000000000,0x0005a0000024bbb > >> Mar 19 18:3:2 173749 [430960] -> Discovered new prt with > >> GUI:00005d000024afb LIDrnge [0xA5,0xA5] of noe:sagaro-22-0 HCA-1 > >> Mar 19 833:22 13753 [43C0960] -> om_repor_notice: Reporing Generic > >> Noticetype: num:64 rom LID:0x002 > >> GID:0xfe800000000000,0x0005ad00004bbb > >> Mar 19 18:3:22 17758 [43C0596] -> Dicovere nw portith > >> GUID:00005ad000002511 LID rang [0x6,0A6] f node:saguao-22-1 HCA-1 > >> Mar 1 18:33:22173762 [43C05960] - osm_reort_notice: Reortin Geneic > >> Ntice type:3 num:64 from LID:0x0092 > >> GI:xfe80000000000000,0x005ad000024bbb > >> Mar 19 18:33:22 17376 [C0960] -> Dscovered new port > >> wihGUID:x0005ad00024c9b LID range [xA7,0xA7] of node:saguro-222 HCA-1 > >> Mar 19 8:33:22 173770 [4C05960] -> osm_report_noice Reprting Gneric > >> Notie type:3 nm:64 from LD:0x0092 > >> GID:0xfe8000000000000,0x005a000024bbb > >> Mar 1918:33:22 173830 43C0590]-> Discovered new port with>> ID:0x0005d000002498f LID range [0xA,08]of node:sguaro-22-3 HCA-1 > >> Mar 18:33:22 173834 [43C05960] -> osm_eportnotice: Reportin Geneic > >> Notice ype:3num:64 from LID0x0092 > >> GD:0xfe8000000000000,0x0005ad000024bb > >> Ma 1 18:33:22173839 [43C05960] -> Discovered new port ith > >> GUI:0x005ad0000024977LID range [xA,0A9 of node:saguaro-22-4 HA1 > >> Mar 19 18:33:22 173843 [3C05960] ->osm_report_notice: Reporting > >> GenericNotice ype: num:64 from LID:00092 > >> GID:0xfe00000000000,0x0005ad0000024bbb>> Mar 9 :33:22 173848 [43C05960] -> Discovered new port with > >> GUD:00005ad0000024feb LID range [0x153,x1of node:sagaro-22-5 HCA-1 > >> Mar 19 18:33:22 204620 [43C05960] -> osm_cast_mgr_process: Min Hop > >> Tablescnfgued on all switches > >> Mar 19 18:33:22 278567 [45A0896] -> SUNET UP > >> Mar19 18:33:22 664286 [141960] -> osm_ucast_mgr_process: Min Hop Table > >> configured on all switces > >> Mr 19 1833:22 734270 [45007960 -> SUBNET UP > >> Mar 19 1833:37 650358 [41401960] -> __osm_trap_cv_process_request > >> 
Rceived Geneic Noice type:0x01 num:128 Producer:2 from LID:0x0152 > > TID0x0000000000000000 > >> Mar 19 18:33:3 65058 [41401960] -> os_report_notice Reporting Generic > >> Noticetype:1 num:28 from LID:0x0152GID:0xfe800000000000,0x005ad0000027c84 > >> Mar 19 18:33:37 927263 [45A08960] -> __osm_rap_rcv_procs_request: > >> Received Generic otice tye:0x01 num:128 Producer:2 fro LID0x0152 > >> TID:0x000000000000001 > >> Mar 1 18:33:37 927420 [45A090] -> osm_report_notice: eportig Geeric > >> Notice type:1 num:128from LID:0x0152 > >> GD:xfe8000000000000,0x0005ad0000027c84 > >> Mar 19 18:3:37 95572 [4A0896] -> __osm_trap_rcv_process_rquest: > >> Received Generic Notice type001 num:128Produce:2 from LID:0152 > >> TID:0x00000000000002 > >> Mar 1918:3:37 955657 [45A08960] -> osreprt_notice: Reporting Generic > >> Noticetype:1num:128 from LD:0x012 > >> GID:0xfe800000000000,0x0005ad000002c84 > >> Mar 1 18:33:37 97718 [44606960] -> _osm_tap_rcvprocess_request: > >> Receivd Generic Notice type:0x01 nu:28 Produr2 from LID:0x0152 > >> TID:000000000000003 > >> Mar 19 18:33:3 97740 [44606960] ->osm_report_notice: poring Geneic > >> Notice type:1 num:128 f LID:0x0152 > >> GID:0xfe800000000000,0x0005d0000027c84 > >> Mr 19 18:3337 999319 [41E02960] -> __osm_trap_rc_process_rquest: > >> Receied Gneri Notice type0x01 num:128 Producer:2 rom LID:0x052 > >>TID:0x000000000000004 > >> Mar 19 18:33:37 99447 [41E02960] > sm_report_notice: Reporting eneric > >> otice type1 num:128 from LID:x152GID:xfe800000000000000x000ad000002784 > >> Mar 19 18:33:38 045171 [4606960] -> __osm_trap_rcvprcess_request: > >> Received Gneric Notce type:0x0 num:128 Producer:2 frm LID:00152 > >> TID:0x00000000000005 > >> Mar 9 18338 045271 [44606960] -> osm_report_notice: Reporting Generic > >> Ntie ype:1 num:18 from LID:0x05 > >> GID:0xfe800000000000000x0005ad00002784 > >> Mar 19 18:33:38 06305 [432060] -> __osm_trap_rcv_process_request: > > Received eneric Notice typ:0x01 nu:128 Producer:2from ID:0x052 > >> 
TID:0x000000000000006 > >> Mar 1918:33 063102 [43204960] -> osm_reprt_notice: porting Generic>> Notice type:1 num:128 from LID:0x0152ID:0xfe8000000000000,0x0005a0000027c4 > >> ar 9 18:3338 182624 [803960] -> __osm_trap_rcv_process_request: > >> Receved Generic Notice typ:0x01num:12 Produr2from LID:0x0152 > >> TID:0x000000000000007 > >> Mar 19 18:3338 18720 [4280360 -> osm_reprt_notice: Reporting Geeric > >> Notice typ:1 num:128 from LID:0x05 > >> GID:xfe800000000000,0x0005ad000007c84 > >> Mr 19 18:33:38 19435 [44606960] -> __osm_trap_rc_process_request > >> Reeived Genric Notice tpe:0x0 num:18 Prducer:2 from ID:0x012 > >> TID:x0000000000000008 > >> Mar 1918:33:38 194209[44606960] -> osm_reportnotice Reorting Generc > >> Notic type:1 num:28 from LID:0x152 > > GID:0xfe000000000000,0x0005ad000007c4 > >> Mar 1 18:33:38 379421 [43C05960] -> _om_trap_rcv_processrequest: > >> Receive Generi Notice type:x01 num:12Producr:2 fromLID:0x0152 > >> TID:0x00000000000009 > >> Mar 19 18:33:38 37959 [4305960] -> osm_report_tice: Reporting eneric > >> Ntice type:1 num:128 rom LID:0x0152 > >> GD:0xfe80000000000,0x005ad0000027c84 > >> Mar 19 1833:3 07685 [41401960] -> __osm_trap_cvrocss_request:Received > >> GenericNotie type:x01 num:128 Producer:2from LID:0x0152 > >> TID:0x0000000000000a > >> Mar 19 18:33:38 47758 [4140190] -> os_report_notice: eprting Generic > >> Notice typ:1 num:128 rom LID:0x0152 > >> GID:0xfe8000000000000,0000ad0000027c84 > >> Mar 1 18:33:8 429658 [4A08960] -> __m_trap_rcv_pocess_request: > >> Received enric Ntice type:001 num:12 Producer:2 fm LID0x0152 > >> TID:0x000000000000000bMa 19 8:33:8 429700 [45A08960] > > >> __osm_traprcv_process_reqest: ERR > >> 3804: Received trap 11 ties ecutiveyMar 19 18:33:38 544177 [45007960] - > >> __osm_trap_rcv_process_reest: > >> eceived Generic Notice tpe:0x0 num128 Podcer:2 from LID:0x152 > >> ID:0x000000000000000c > >> Mar 18:3338 544221 [4507960] -> __osm_trp_rvprocess_requst: ERR > >> 304: Received trap12 times 
consecutiely > >> Mar 1918:33:8 545235 [4280960] ->osm_repot_ntic:Rporting Generic > >> Notice type:3 num:65 from LI:0x0092 > >> GID:0xfe80000000000,0x0005ad000024b > >> Mar 9 18:3338 54247 [42803960] -> Removed por with > >> GUID:0x0005ad00024b27 ID range 0xAF,0xAF] f node:sauaro-23-0 HC-1 > >> Mar 19 18:33:3 545278 [42803960] -> osmeort_notice: Reporing Generc > >> Noticetype3 num:65 from LD:0x0092 > >> G:xfe8000000000000,0x0005ad0000024bbb > >> Mar 19 18:33:38 54586 [428030] > Removd port with > >> GUI:0x0005a000024da7 LID range [0xB,0x0] of node:sauao-23-1 HC-1 > >> Mar 19 1:3:38 545312 42803960] ->osmreport_noice: eporting Generic > >> Notice ype: num:65 from LID:00092 > >> GID:0xfe800000000000,0x0005ad0000024bb > >> Mar 19 8:33:38 54318 [2803960] -> Reoved portwth > >> GUID:0x0005ad00000249d3 LD rang [xB10B1] of node:saguaro-23-2 HA-1 > >> Mar19 8:3:38 580005 [42803960 -> osm_ucast_mr_process: in Hop Tabes>> configured on all swiches>> Mar 19 18:3:38 66849[43C0590] -> SUBNET UP > >> Mar 19 18:33:38 68520[45A08960] -> __om_tra_rcv_process_reques: > >>ReceivedGeneri Notice tpe:x01 num:128 Producer:2 from LID:0x015 > >> D:x00000000000000 > >> Mar 19 18:33:38 48616 [45A08960] -> _osm_trap_cv_process_requet: ERR > >> 3804: ceied trap 13 tmes onsecutiely > >> Mar 19 183338 676891[41E0260 -> __osm_trap_rcv_rocess_request: > >> Recived Genei Notice tpe:0x01 num:128 Producer:2 fo LID:0x152 > > TID:0x000000000000e > >> Mar 19 18:33:38 67670 [4102960] -> __osm_rap_rcv_proces_requst: ERR > >> 3804: Reived trap 14 tes cosecutively > >> Mar 19 18:33:38 698797 [446096] ->__osm_trap_rcv_pcessrequest: > >> Receved Geneic Notice typ:0x1 num:128 Producer:2frm LD:0x0152 > >> TD:0x00000000000000f > >> Mr 1 18:33:8 69860 [44606960]-> __osm_trap_rcv_procss_equest: ERR > >> 3804: Receved trap 15 times conecutive > >> Mar 19 18:3:38 20538 [43C05960] -> __s_trap_rcv_proces_request: > >> Received Generic Notce ype:0x01 num:128 Poducer2from ID:0x0152 > > TID:0x000000000000010Mr 19 
18:33:38 720612 [43C0960] -> > >> __osm_trp_rcv_process_reqest: RR > >> 3804: ecived trap16 time onsecutively > >> a 19 18:33:38 921253 [42803960] > __osm_trap_rv_processequest: > >> eceive Generic Notice type:x01 num:128 Producer:2 from LID0x012 > >> TIDx000000000000011 > >> Ma 19 18:33:8 9213 [42803960] > __osm_trap_cv_procss_reque: ERR > >> 3804: Recived trap 17 imes consecutively > >> Mar 198:33:38 97418 [43C05960] -> _osm_trap_rcv_proess_reqest: > >> Recived Generic Notice ype:0x01 nm:12Prodcer:2 rom LID:0x152 > >> TID:0x000000000000012 > >> a19 18:33:38 97479 [43C05960] > __os_trap_rcv_prcess_equest: RR > >> 3804: Received trap 1 ties onsecutively > >> Mar 191833:38 98519 [483960] > _osm_trap_rcv_rocessreques: > >> ReceivedGeneric Noice type:0x1 um:128 Producer:2 from LID:0x015 > >> TID:x00000000000013 > >> Mar 19 1:33:3 98955[2803960] - __osm_tap_rcv_process_rquest: ERR3804: > >> Receivtrap19 times consecutively > >>Mar 9 18:33:38 998342 [43204960] -> __os_trap_cv_poces_request: > >> ecivd Generic Notice type0x01 num128 Poducer:from ID:0x0152 > >> TD:0x0000000000001 > >> Mar 19 18:33:38 998380 [4320496] -> _osm_ap_rcv_process_requestRR > >> 384:Received trap 20 times conscutiely > >> ar 19 18:33:3 03923 3204960] -> _osm_tra_rcv_process_requst: > >> Recived Gneric Notice type0x0 num:128 Producr:frm LID0x0152 > >> TID:0000000000000015 > >> Mar9 :33:39 039334 [4204960] ->__os_traprcvprocess_requs: ER > >> 3804:Received tra 21 times consecutiely > >> Mar19 18:33:39 06060 [32096] -> __osm_trap_rcv_process_equest: > >> Reeid Generic Notice type:01m:128 Producr:2 from > >> LID:x0152TID:0x00000000000016 > >> Mar 19 18:3:306108 [4320490] -> __sm_trap_cv_prcess_request: ERR > >> 304: Reeied tra22 times onsecutivel > >>Mr 1 18:33:39 079032[41E02960] -> __osm_tra_cv_process_reques: > >> Received eneric Notice tpe:0x01 num:18 Prducer from LID:0x0152 > >> TD:0x00000000000017 > >> Mr 1 18:33:39 07972 4E0260] -> _osm_trap_rcv_proces_request: ERR > >> 380: Receied trp23 time 
consecutivel > > Mar 19 18:3:9 16006 [41E0960] -> osm_eport_notice: Repoting Geric > >> Noice ype3 num:5 from LID:0x0092 > >> GD:0xfe80000000000000x0005ad0000024bbMar 19 18:33:9 146018 [402960] -> Removed porwith > > GUID:0x005ad000002511b LID range [0xA6,x6] of od:saguaro-2-1 HCA-1 > >> Mar 19 18:33:39 1404 [41E02960 -> osm_eport_notce: Reportin Generic>> Noticeype:3 num:65 from LID:0x0092 > >>GID:0xfe80000000000000,0x005ad0000024bb > >>Mar 19 18:33:39 146050 [41E296] -> Rmove ort with > >> UID:0x0005d00000db LID range [0xB80xB8] of nod:saguaro-23- HCA-1 > >> Mar 19 18:33:39 14082 [41E2960] -> sm_report_notie Reporting Generic > >> Notic typ:3 num:6 from LID:0x0092 > > GID:0xfe000000000000,0x0005ad000024bb > >> Mar 19 8:33:39 146089 [41E02960] -> Removed port wh > >> UID:0005ad0000024afb LID range 0xA0xA5] of noe:saguaro-22-0 HCA- > >> Mr 19 18:33:39 15720 [4140190 -> osm_report_notice: Reporting Gnerc>> Notie type:3 num:64from LID:0x092 > >> GID:0xfe80000000000000,0x0005ad0000bbb > >> Mar 19 18:33:39150732 [41401960] -> Discoveed new port with > >> GI0005ad0000024b27 LI rage [0xAF,0xAF] of nde:saguaro-23-0 HCA-1>> Mar 1 18:33:39 150736 [4140160] -> om_report_notice: Reporting Genec > >>Notic ype:3 um:64 from LID:0x009 > >> GID:0xfe0000000000000,0x0005d000024bb > >> Mar 19 18:33:39 50742 [41401960] -> Discoverd new port with > >> GID:0x00ad0000024d LID range [0xB0,xB0] of node:aguro-23-1 HCA-1 > > Mar 19 183:39 15074 [4141960] -> osm_report_notice: Reporting Genei > >> Notice tpe:3num64 from LID:0x0092 > >> ID:0xf800000000000x0005ad000024bbb > >> Mar 19 18:3:39 15750 [41401960] -> Discovered new pot with > >> UID:0x0005ad00024d3 LID range [0x1,0xB1] of node:saguaro-3-2 HA-1 > >>Mar 19 18:33:3 181553 [411960] -> os_ucast_mgr_process: Min Ho Tabes>> configured on al switches > >> Mar 19 18:33:9 218130 [43C5960] -> __om_trap_rcv_proess_request: > >>Received eneic Ntice type:0x01 num:128 Producer:2 from ID:0152 > >> TID:0x000000000000018 > > Mar 19 18:33:39 218197 
[43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 19 18:33:39 375407 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 19 18:33:39 375456 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 19 18:33:39 375588 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000002e > >> Mar 19 18:33:39 375630 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:39 637844 [41401960] -> SUBNET UP > >> Mar 19 18:33:39 664805 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000002f > >> Mar 19 18:33:39 66490 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:39 666276 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000003a > >> Mar 19 18:33:39 666364 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:39 710546 [41E02960] -> > >> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000030 > >> Mar 19 18:33:39 71062 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:39 732425 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000031 > >> Mar 19 18:33:39 73214 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >>
GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:39 784151 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000032 > >> Mar 19 18:33:39 78469 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:39 824170 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000003b > >> Mar 19 18:33:39 824443 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 01502 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:33:40 01070 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 19 18:33:40 015074 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:33:40 0080 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:33:40 015083 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:33:40 015088 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:33:40 046164 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:33:40 106627 [42803960] -> SUBNET UP > >> Mar 19 18:33:40 145952 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000033 > >> Mar 19 18:33:40 146076 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >>
GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:40 14646 [44606960] -> __osm_trap_rcv_process_request: Received > >> Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000003c > >> Mar 19 18:33:40 16611 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 306176 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000003d > >> Mar 19 18:33:40 306270 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 20009 [43C05960] -> __osm_trap_rcv_process_request: Received > >> Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 19 18:33:40 420071 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 19 18:33:40 433566 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001a > >> Mar 19 18:33:40 43596 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 27 times consecutively > >> Mar 19 18:33:40 434996 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from > >> LID:0x001B TID:0x000000000000003e > >> Mar 19 18:33:40 435041 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 485454 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:33:40 528816 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000003f > >> Mar 19 18:33:40 52890 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >>
Mar 19 18:33:40 546019 [42803960] -> SUBNET UP > >> Mar 19 18:33:40 551048 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000034 > >> Mar 19 18:33:40 5519 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:33:40 594994 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000040 > >> Mar 19 18:33:40 595074 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 83973 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000041 > >> Mar 19 18:33:40 840064 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 861973 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000042 > >> Mar 19 18:33:40 862075 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 83777 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000043 > >> Mar 19 18:33:40 907658 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:33:40 947974 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:33:40 965203 [45007960] -> SUBNET UP > >> Mar 19 18:33:41 350582 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured
on all switches > >> Mar 19 18:33:41 417662 [43204960] -> SUBNET UP > >> Mar 19 18:33:41 571156 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001b > >> Mar 19 18:33:41 571256 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 28 times consecutively > >> Mar 19 18:35:50 971684 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000035 > >> Mar 19 18:35:50 971926 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:35:50 972301 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000044 > >> Mar 19 18:35:50 972378 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:35:51 342826 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342845 [43204960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:35:51 342866 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342873 [43204960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:35:51 342895 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342901 [43204960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:35:51
342923 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342930 [43204960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 19 18:35:51 342968 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342972 [43204960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:35:51 342989 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 342994 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:35:51 343011 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343016 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 19 18:35:51 343033 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343038 [43204960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 19 18:35:51 343189 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343194 [43204960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 19 18:35:51 343234 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 
18:35:51 343239 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:35:51 343253 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343258 [43204960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 19 18:35:51 343273 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343278 [43204960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 19 18:35:51 343293 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343298 [43204960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 19 18:35:51 343314 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343319 [43204960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:35:51 343334 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343393 [43204960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:35:51 343410 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343415 [43204960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 19 18:35:51 343430 
[43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:35:51 343435 [43204960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:35:51 376525 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:35:51 433087 [43204960] -> SUBNET UP > >> Mar 19 18:35:51 849193 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:35:51 901399 [42803960] -> SUBNET UP > >> Mar 19 18:36:44 359407 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000036 > >> Mar 19 18:36:44 359652 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:36:44 365352 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000037 > >> Mar 19 18:36:44 365427 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:36:44 365432 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000045 > >> Mar 19 18:36:44 365567 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:36:44 371481 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000046 > >> Mar 19 18:36:44 371591 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 
18:36:44 711649 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad0000027c84 port 19. Adding to light sweep > >> sampling list > >> Mar 19 18:36:44 711691 [43204960] -> Directed Path Dump of 5 hop path: > >> Path = [0][1][11][1][6][18] > >> Mar 19 18:36:44 711738 [43204960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad00000281b3 port 24. Adding to light sweep > >> sampling list > >> Mar 19 18:36:44 711748 [43204960] -> Directed Path Dump of 4 hop path: > >> Path = [0][1][11][1][6] > >> Mar 19 18:36:44 721719 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721730 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 19 18:36:44 721736 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721744 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 19 18:36:44 721749 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721756 [43204960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:36:44 721761 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721767 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 19 18:36:44 721772 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 
18:36:44 721779 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:36:44 721784 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721790 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:36:44 721795 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721802 [43204960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:36:44 721826 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721831 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:36:44 721845 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721850 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:36:44 721854 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721859 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:36:44 721862 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721867 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of 
node:saguaro-23-9 HCA-1 > >> Mar 19 18:36:44 721871 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721876 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:36:44 721880 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721884 [43204960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:36:44 721888 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721893 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 19 18:36:44 721897 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721923 [43204960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 19 18:36:44 721927 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721932 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 19 18:36:44 721936 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:36:44 721941 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 19 18:36:44 752683 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured 
on all switches > >> Mar 19 18:36:44 820881 [43C05960] -> SUBNET UP > >> Mar 19 18:36:45 198990 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:36:45 258878 [44606960] -> SUBNET UP > >> Mar 19 18:37:00 446068 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 19 18:37:00 446346 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:00 564122 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 19 18:37:00 564810 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:00 589920 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 19 18:37:00 590067 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:00 611770 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 19 18:37:00 611916 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:00 800652 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 19 18:37:00 817995 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:00 861575 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 
num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:00 983908 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 19 18:37:00 984004 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 012195 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 19 18:37:01 012283 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 034177 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 19 18:37:01 034272 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 056001 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 19 18:37:01 056068 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 074341 [43204960] -> SUBNET UP > >> Mar 19 18:37:01 252871 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 19 18:37:01 253037 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 303407 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
>> TID:0x000000000000000a > >> Mar 19 18:37:01 303490 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:37:01 325057 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 19 18:37:01 325160 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 19 18:37:01 334059 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 19 18:37:01 334118 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 19 18:37:01 474293 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 474317 [45007960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:37:01 474341 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 474348 [45007960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:37:01 474371 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 474378 [45007960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:37:01 509205 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:01 557110 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> 
TID:0x000000000000000d > >> Mar 19 18:37:01 557172 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 19 18:37:01 565676 [43C05960] -> SUBNET UP > >> Mar 19 18:37:01 576199 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 19 18:37:01 576270 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 19 18:37:01 599713 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 19 18:37:01 599779 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 19 18:37:01 707096 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 19 18:37:01 707150 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 19 18:37:01 921406 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 921423 [45A08960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:37:01 921448 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 921455 [45A08960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 19 18:37:01 921495 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 921502 [45A08960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] 
of node:saguaro-22-0 HCA-1 > >> Mar 19 18:37:01 925845 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 925855 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:37:01 925859 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 925864 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:37:01 925868 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:01 925873 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:37:01 956691 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:01 999372 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 19 18:37:01 999470 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 19 18:37:02 012194 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 19 18:37:02 012256 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 19 18:37:02 014327 [41401960] -> SUBNET UP > >> Mar 19 18:37:02 034202 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 19 18:37:02 034250 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: 
Received trap 19 times consecutively > >> Mar 19 18:37:02 056015 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 19 18:37:02 056060 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 19 18:37:02 270731 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 19 18:37:02 270777 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 19 18:37:02 271169 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000038 > >> Mar 19 18:37:02 271347 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:02 462374 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000039 > >> Mar 19 18:37:02 462511 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:02 496247 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003a > >> Mar 19 18:37:02 496310 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:02 624890 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:02 624902 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 
19 18:37:02 624908 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:02 624914 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:37:02 624919 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:02 624926 [45A08960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:37:02 655848 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:02 709115 [42803960] -> SUBNET UP > >> Mar 19 18:37:03 082995 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003b > >> Mar 19 18:37:03 106373 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:03 136757 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:03 178027 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000047 > >> Mar 19 18:37:03 178064 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003c > >> Mar 19 18:37:03 178139 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 178160 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:03 315226 [41E02960] -> 
__osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 19 18:37:03 315289 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 19 18:37:03 341474 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 19 18:37:03 341549 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 19 18:37:03 341616 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003d > >> Mar 19 18:37:03 342446 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:03 343169 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 19 18:37:03 343262 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x14d08 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x11 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6][16] > >> Return path: [0][9][18][D][3][11] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > >> > >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 19 
18:37:03 343371 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 19 18:37:03 343364 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 19 18:37:03 343415 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x14d09 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x12 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6][16] > >> Return path: [0][9][18][D][3][11] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > >> > >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 19 18:37:03 343409 [45007960] -> PortInfo dump: > >> port number.............0x11 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x11 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............INIT > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> 
mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 19 18:37:03 343481 [45007960] -> Capabilities Mask: > >> Mar 19 18:37:03 343532 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 19 18:37:03 343537 [45007960] -> PortInfo dump: > >> port number.............0x12 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x11 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............INIT > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> 
resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 19 18:37:03 343555 [45007960] -> Capabilities Mask: > >> Mar 19 18:37:03 348684 [45007960] -> SUBNET UP > >> Mar 19 18:37:03 461748 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000048 > >> Mar 19 18:37:03 461958 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 484827 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003e > >> Mar 19 18:37:03 486448 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:03 528040 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000049 > >> Mar 19 18:37:03 528154 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 580196 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004a > >> Mar 19 18:37:03 580534 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 599784 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004b > >> Mar 19 18:37:03 599879 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 621883 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic 
Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004c > >> Mar 19 18:37:03 621940 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:03 707894 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:03 764678 [43204960] -> SUBNET UP > >> Mar 19 18:37:03 783783 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004d > >> Mar 19 18:37:03 783844 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:04 000228 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004e > >> Mar 19 18:37:04 000628 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:04 022198 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000004f > >> Mar 19 18:37:04 022299 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:04 043985 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000050 > >> Mar 19 18:37:04 044052 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:04 155809 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:04 210448 [41401960] -> SUBNET UP > >> Mar 19 18:37:04 504490 [43204960] -> 
__osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 19 18:37:04 504569 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 19 18:37:04 570084 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:04 626298 [43C05960] -> SUBNET UP > >> Mar 19 18:37:54 424084 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000051 > >> Mar 19 18:37:54 424430 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:37:54 424457 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000003f > >> Mar 19 18:37:54 424522 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:37:54 722515 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722536 [44606960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:37:54 722558 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722565 [44606960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:37:54 722587 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722594 [44606960] -> Removed port with > >> GUID:0x0005ad000002511b LID range 
[0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:37:54 722636 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722641 [44606960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 19 18:37:54 722658 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722663 [44606960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:37:54 722679 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722684 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:37:54 722701 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722706 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 19 18:37:54 722723 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722728 [44606960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 19 18:37:54 722875 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722880 [44606960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 19 18:37:54 722909 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 
> >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722915 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:37:54 722929 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722934 [44606960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 19 18:37:54 722949 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722955 [44606960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 19 18:37:54 722970 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722975 [44606960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 19 18:37:54 722992 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 722997 [44606960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:37:54 723012 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 723073 [44606960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:37:54 723090 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 723095 [44606960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of 
node:saguaro-22-5 HCA-1 > >> Mar 19 18:37:54 723111 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:37:54 723116 [44606960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:37:54 756302 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:54 806787 [45A08960] -> SUBNET UP > >> Mar 19 18:37:55 149566 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:37:55 198855 [41401960] -> SUBNET UP > >> Mar 19 18:38:48 131054 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000040 > >> Mar 19 18:38:48 131349 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:38:48 137230 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000052 > >> Mar 19 18:38:48 137268 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000041 > >> Mar 19 18:38:48 137395 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:38:48 137432 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:38:48 143370 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000053 > >> Mar 19 18:38:48 144327 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> 
GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:38:48 529052 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529065 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 19 18:38:48 529071 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529078 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 19 18:38:48 529083 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529090 [41E02960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:38:48 529095 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529101 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 19 18:38:48 529106 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529113 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:38:48 529118 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529124 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:38:48 529129 [41E02960] -> osm_report_notice: Reporting 
Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529136 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:38:48 529141 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529147 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 19 18:38:48 529152 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529159 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:38:48 529164 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529170 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 19 18:38:48 529175 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529182 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 19 18:38:48 529186 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529193 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 19 18:38:48 529198 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 
529204 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 19 18:38:48 529209 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529216 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 19 18:38:48 529271 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529277 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 19 18:38:48 529281 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529286 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 19 18:38:48 529290 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:38:48 529294 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 19 18:38:48 560082 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:38:48 630464 [43204960] -> SUBNET UP > >> Mar 19 18:38:49 018498 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:38:49 073355 [45007960] -> SUBNET UP > >> Mar 19 18:39:04 189829 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 19 18:39:04 190072 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from 
LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 307827 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 19 18:39:04 307940 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 330104 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 19 18:39:04 330210 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 468676 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 19 18:39:04 468758 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 680305 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 19 18:39:04 680400 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 702144 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 19 18:39:04 702286 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 704346 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:04 704354 [43204960] -> 
Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:39:04 739059 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:04 739896 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 19 18:39:04 783807 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 797411 [44606960] -> SUBNET UP > >> Mar 19 18:39:04 849970 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 19 18:39:04 850195 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 853735 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 19 18:39:04 853809 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 897727 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 19 18:39:04 897860 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 18:39:04 901577 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 19 18:39:04 901719 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 19 
18:39:04 923271 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 19 18:39:04 923377 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 19 18:39:05 106246 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 19 18:39:05 106314 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 19 18:39:05 178215 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 19 18:39:05 178258 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 19 18:39:05 272913 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 19 18:39:05 272983 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 19 18:39:05 339633 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 19 18:39:05 339679 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 19 18:39:05 469093 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 19 18:39:05 469145 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 19 18:39:05 484587 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 19 18:39:05 
484633 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 19 18:39:05 574251 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 19 18:39:05 574301 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 19 18:39:05 602665 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 19 18:39:05 602700 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 19 18:39:05 646331 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 19 18:39:05 646369 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 19 18:39:05 834613 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 19 18:39:05 834685 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 19 18:39:05 851128 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 19 18:39:05 851166 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 19 18:39:05 875540 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 19 18:39:05 875592 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 19 18:39:05 897378 [42803960] -> __osm_trap_rcv_process_request: > >> 
Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 19 18:39:05 897424 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 19 18:39:05 907232 [4780B960] -> umad_receiver: ERR 5409: send completed > >> with error (method=0x1 attr=0x15 trans_id=0x124ef0001c2fe) -- dropping > >> Mar 19 18:39:05 907249 [4780B960] -> umad_receiver: ERR 5411: DR SMP > >> Mar 19 18:39:05 907259 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR > >> 3113: MAD completed in error (IB_TIMEOUT) > >> Mar 19 18:39:05 907295 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x1 (SubnGet) > >> D bit...................0x0 > >> status..................0x0 > >> hop_ptr.................0x0 > >> hop_count...............0x6 > >> trans_id................0x1c2fe > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x1 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6][16][8] > >> Return path: [0][0][0][0][0][0][0] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 19 18:39:05 907372 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:05 907384 [41401960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:39:05 907407 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:05 907414 [41401960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:39:05 907480 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:05 907485 [41401960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:39:05 907577 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:05 907582 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:39:05 907618 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad0000027c84 port 8. Adding to light sweep > >> sampling list > >> Mar 19 18:39:05 907657 [41401960] -> Directed Path Dump of 5 hop path: > >> Path = [0][1][11][1][6][16] > >> Mar 19 18:39:05 911559 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:05 911572 [43204960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 19 18:39:05 927229 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 19 18:39:05 927285 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 19 18:39:05 942538 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:06 000027 [41E02960] -> SUBNET UP > >> Mar 19 18:39:06 130255 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 > >> TID:0x000000000000001a > >> Mar 19 18:39:06 130308 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 19 18:39:06 131922 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000042 > >> Mar 19 18:39:06 132063 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:06 154579 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001b > >> Mar 19 18:39:06 154681 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 27 times consecutively > >> Mar 19 18:39:06 176248 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001c > >> Mar 19 18:39:06 176304 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 28 times consecutively > >> Mar 19 18:39:06 198132 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001d > >> Mar 19 18:39:06 198195 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 29 times consecutively > >> Mar 19 18:39:06 230022 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001e > >> Mar 19 18:39:06 230108 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 30 times consecutively > >> Mar 19 18:39:06 230229 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000043 > >> Mar 19 18:39:06 230311 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 
from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:06 399543 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:06 399556 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 19 18:39:06 399562 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:06 399569 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 19 18:39:06 399574 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:06 399580 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 19 18:39:06 399585 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 19 18:39:06 399592 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 19 18:39:06 430598 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:06 494689 [44606960] -> SUBNET UP > >> Mar 19 18:39:06 837303 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001f > >> Mar 19 18:39:06 837446 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 31 times consecutively > >> Mar 19 18:39:06 838528 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000044 > >> Mar 19 18:39:06 838636 [43C05960] -> 
osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:06 876308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:07 028376 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000020 > >> Mar 19 18:39:07 028459 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 32 times consecutively > >> Mar 19 18:39:07 028545 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000045 > >> Mar 19 18:39:07 028652 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:07 030190 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000054 > >> Mar 19 18:39:07 030277 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 096812 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000046 > >> Mar 19 18:39:07 096959 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:07 111719 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 19 18:39:07 111759 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> 
hop_count...............0x5 > >> trans_id................0x1dfac > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x11 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4][16] > >> Return path: [0][9][18][D][1][11] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > >> > >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 19 18:39:07 111810 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 19 18:39:07 111814 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 19 18:39:07 111831 [41E02960] -> PortInfo dump: > >> port number.............0x11 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x11 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............INIT > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> 
p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 19 18:39:07 111868 [41E02960] -> Capabilities Mask: > >> Mar 19 18:39:07 111844 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x1dfad > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x12 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4][16] > >> Return path: [0][9][18][D][1][11] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 11 02 03 02 > >> > >> 12 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 19 18:39:07 112011 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 19 18:39:07 112018 [41401960] -> PortInfo dump: > >> port number.............0x12 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x11 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> 
link_speed_supported....0x1 > >> port_state..............INIT > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 19 18:39:07 112034 [41401960] -> Capabilities Mask: > >> Mar 19 18:39:07 117211 [45A08960] -> SUBNET UP > >> Mar 19 18:39:07 354540 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000047 > >> Mar 19 18:39:07 354686 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 19 18:39:07 383453 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000055 > >> Mar 19 18:39:07 383530 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 497601 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:07 548184 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000056 > >> Mar 19 18:39:07 548217 [43C05960] -> SUBNET UP > >> Mar 19 18:39:07 548427 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> 
GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 878403 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000057 > >> Mar 19 18:39:07 887312 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000058 > >> Mar 19 18:39:07 888156 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:07 929819 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 929834 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 931166 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000059 > >> Mar 19 18:39:07 931288 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 19 18:39:07 946406 [42803960] -> SUBNET UP > >> Mar 19 18:39:08 073735 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000020 > >> Mar 19 18:39:08 073811 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 33 times consecutively > >> Mar 19 18:39:08 400790 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 18:39:08 467925 [45A08960] -> SUBNET UP > >> Mar 19 20:24:07 009911 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 > >> TID:0x0000000000000020 > >> Mar 19 20:24:07 010153 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from 
LID:0x0020 > >> GID:0xfe80000000000000,0x0005ad00000281ad > >> Mar 19 20:24:07 010966 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > >> TID:0x000000000000001a > >> Mar 19 20:24:07 011064 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0001 > >> GID:0xfe80000000000000,0x0005ad0000027c6a > >> Mar 19 20:24:07 390927 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 20:24:07 453747 [43204960] -> SUBNET UP > >> Mar 19 20:24:07 839927 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 20:24:07 895694 [45A08960] -> SUBNET UP > >> Mar 19 20:24:08 049066 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > >> TID:0x000000000000001a > >> Mar 19 20:24:08 049322 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0001 > >> GID:0xfe80000000000000,0x0005ad0000027c6a > >> Mar 19 20:24:08 433979 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 20:24:08 487950 [43204960] -> SUBNET UP > >> Mar 19 20:26:28 608381 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0020 > >> TID:0x0000000000000021 > >> Mar 19 20:26:28 608406 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0001 > >> TID:0x000000000000001b > >> Mar 19 20:26:28 608685 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0020 > >> GID:0xfe80000000000000,0x0005ad00000281ad > >> Mar 19 20:26:28 608693 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0001 > >> GID:0xfe80000000000000,0x0005ad0000027c6a > >> Mar 19 20:26:28 972140 [44606960] -> osm_ucast_mgr_process: Min 
Hop Tables > >> configured on all switches > >> Mar 19 20:26:29 028682 [43C05960] -> SUBNET UP > >> Mar 19 20:26:29 399649 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 20:26:29 465737 [45007960] -> SUBNET UP > >> Mar 19 21:30:38 775260 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0146 > >> TID:0x000000000000002f > >> Mar 19 21:30:38 775533 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0146 > >> GID:0xfe80000000000000,0x0005ad00000281b6 > >> Mar 19 21:30:38 777083 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0143 > >> TID:0x0000000000000037 > >> Mar 19 21:30:38 777242 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0143 > >> GID:0xfe80000000000000,0x0005ad00000281b9 > >> Mar 19 21:30:39 144779 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 21:30:39 200635 [43204960] -> SUBNET UP > >> Mar 19 21:30:39 536003 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 19 21:30:39 591216 [42803960] -> SUBNET UP > >> Mar 20 14:06:48 971082 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000021 > >> Mar 20 14:06:48 971376 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:06:49 346734 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:06:49 346761 [42803960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:06:49 381394 [42803960] -> osm_ucast_mgr_process: Min 
Hop Tables > >> configured on all switches > >> Mar 20 14:06:49 440803 [43204960] -> SUBNET UP > >> Mar 20 14:07:09 098449 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000048 > >> Mar 20 14:07:09 098708 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:07:09 098733 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005a > >> Mar 20 14:07:09 098777 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:07:09 417844 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417862 [42803960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:07:09 417879 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417885 [42803960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:07:09 417902 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417907 [42803960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:07:09 417924 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417929 [42803960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of 
node:saguaro-23-5 HCA-1 > >> Mar 20 14:07:09 417945 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417951 [42803960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:07:09 417967 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417973 [42803960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:07:09 417989 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 417994 [42803960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:07:09 418131 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418137 [42803960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:07:09 418168 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418173 [42803960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:07:09 418188 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418193 [42803960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:07:09 418207 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418212 [42803960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:07:09 418227 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418232 [42803960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:07:09 418248 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418253 [42803960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:07:09 418285 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418290 [42803960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:07:09 418306 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418362 [42803960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:07:09 418378 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:07:09 418383 [42803960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:07:09 451317 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:07:09 502755 [41401960] -> SUBNET UP > >> Mar 20 14:07:09 902534 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 
14:07:09 955229 [45A08960] -> SUBNET UP > >> Mar 20 14:08:03 850926 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000049 > >> Mar 20 14:08:03 851134 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:03 856880 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004a > >> Mar 20 14:08:03 856955 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:03 866819 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005b > >> Mar 20 14:08:03 866977 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:03 963024 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005c > >> Mar 20 14:08:03 963130 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:04 106856 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005d > >> Mar 20 14:08:04 106995 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:04 193747 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193766 [44606960] -> Discovered new port with 
> >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:08:04 193771 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193777 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:08:04 193781 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193786 [44606960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:08:04 193790 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193795 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:08:04 193799 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193804 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:08:04 193808 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193813 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:08:04 193817 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193822 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:08:04 193826 [44606960] 
-> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193830 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:08:04 193834 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193839 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:08:04 193843 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193848 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:08:04 193852 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193857 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:08:04 193861 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193866 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:08:04 193870 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193874 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:08:04 193878 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193883 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:08:04 193938 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193944 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:08:04 193948 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:04 193953 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:08:04 224695 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:04 281046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281106 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61eec > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][4] > >> Return path: [0][9][14][E][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> 
Mar 20 14:08:04 281154 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281159 [41401960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281172 [41401960] -> Capabilities Mask: > >> Mar 20 14:08:04 281187 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281213 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61eed > >> 
attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][4] > >> Return path: [0][9][14][E][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:04 281279 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281316 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61eee > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][4] > >> Return path: [0][9][14][E][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:04 281392 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281416 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> 
status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61eef > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:04 281515 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281522 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281542 [44606960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> 
vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281553 [44606960] -> Capabilities Mask: > >> Mar 20 14:08:04 281561 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61ef0 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:04 281572 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281590 [44606960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> 
link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281600 [44606960] -> Capabilities Mask: > >> Mar 20 14:08:04 281623 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281626 [44606960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> 
vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281635 [44606960] -> Capabilities Mask: > >> Mar 20 14:08:04 281637 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:04 281652 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281663 [44606960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281673 [44606960] -> Capabilities Mask: > >> Mar 20 
14:08:04 281675 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x61ef1 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:04 281721 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:04 281726 [41E02960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > 
>> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:04 281736 [41E02960] -> Capabilities Mask: > >> Mar 20 14:08:04 287136 [44606960] -> SUBNET UP > >> Mar 20 14:08:04 711595 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:04 766488 [45A08960] -> SUBNET UP > >> Mar 20 14:08:19 947200 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 20 14:08:19 947479 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 086909 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:08:20 087084 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 108865 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:08:20 109210 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 109996 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:08:20 110407 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from 
LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 222523 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:08:20 222613 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 404596 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:08:20 404698 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 476804 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:08:20 476897 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 572434 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:08:20 572520 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 621715 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:20 621726 [42803960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:08:20 656232 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:20 698700 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:08:20 698794 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 708598 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:08:20 708698 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 713653 [45007960] -> SUBNET UP > >> Mar 20 14:08:20 730554 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:08:20 730697 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:08:20 754139 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:08:20 754251 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:08:20 947339 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 20 14:08:20 947426 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:08:20 975965 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:08:20 976024 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:08:20 997569 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:08:20 997648 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:08:21 019465 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 20 14:08:21 019512 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:08:21 064967 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 20 14:08:21 065009 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 20 14:08:21 082838 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 20 14:08:21 082877 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 20 14:08:21 100567 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 20 14:08:21 100619 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 20 14:08:21 188128 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:21 188144 [43C05960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:08:21 188166 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:21 188172 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 
14:08:21 188194 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:21 188199 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:08:21 192421 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:21 192436 [41E02960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:08:21 208455 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 20 14:08:21 208499 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 20 14:08:21 223240 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:21 394585 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 20 14:08:21 394665 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 20 14:08:21 419333 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 20 14:08:21 419393 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 20 14:08:21 441228 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 20 14:08:21 441276 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 20 14:08:21 462915 [44606960] -> 
__osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 20 14:08:21 462968 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 20 14:08:21 475440 [45007960] -> SUBNET UP > >> Mar 20 14:08:21 674045 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:08:21 674084 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004b > >> Mar 20 14:08:21 674137 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 20 14:08:21 674294 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:21 965885 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004c > >> Mar 20 14:08:21 965992 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:22 092378 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 092395 [41401960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:08:22 092415 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 092420 [41401960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:08:22 092444 [41401960] -> osm_report_notice: Reporting Generic > >> 
Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 092449 [41401960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:08:22 092625 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad00000281b3 port 22. Adding to light sweep > >> sampling list > >> Mar 20 14:08:22 092655 [41401960] -> Directed Path Dump of 4 hop path: > >> Path = [0][1][11][1][4] > >> Mar 20 14:08:22 092663 [41401960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad00000281b3 port 23. Adding to light sweep > >> sampling list > >> Mar 20 14:08:22 092672 [41401960] -> Directed Path Dump of 4 hop path: > >> Path = [0][1][11][1][4] > >> Mar 20 14:08:22 096789 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 096801 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:08:22 096805 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 096810 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:08:22 096814 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 096819 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:08:22 127266 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:22 184734 [45007960] -> SUBNET UP > >> Mar 20 14:08:22 541974 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 
from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 541985 [41401960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:08:22 541989 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 541995 [41401960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:08:22 541998 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:08:22 542003 [41401960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:08:22 572711 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:22 611570 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004d > >> Mar 20 14:08:22 611751 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:22 611770 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005e > >> Mar 20 14:08:22 612060 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:22 623766 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:22 623814 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> 
status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x66134 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][5] > >> Return path: [0][9][18][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:22 623876 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:22 623888 [45007960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> 
q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:22 623907 [45007960] -> Capabilities Mask: > >> Mar 20 14:08:22 623945 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:22 623973 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x66135 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][5] > >> Return path: [0][9][18][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:22 624051 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:22 624056 [44606960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> 
link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:22 624069 [44606960] -> Capabilities Mask: > >> Mar 20 14:08:22 629289 [45A08960] -> SUBNET UP > >> Mar 20 14:08:22 712180 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:08:22 712238 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 20 14:08:22 869303 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000005f > >> Mar 20 14:08:22 869527 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:22 892522 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004e > >> Mar 20 14:08:22 892707 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:08:22 957086 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic 
Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000060 > >> Mar 20 14:08:22 957189 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 080551 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000061 > >> Mar 20 14:08:23 080621 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 102292 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000062 > >> Mar 20 14:08:23 102372 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 124176 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000063 > >> Mar 20 14:08:23 124278 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 285320 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000064 > >> Mar 20 14:08:23 285393 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 403309 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000065 > >> Mar 20 14:08:23 403388 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 
425052 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000066 > >> Mar 20 14:08:23 425117 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 447189 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000067 > >> Mar 20 14:08:23 447266 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:08:23 535175 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:23 595127 [41401960] -> SUBNET UP > >> Mar 20 14:08:23 750323 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:08:23 750432 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 20 14:08:23 960490 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:08:24 014256 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:08:24 014323 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x67b9d > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: 
[0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:08:24 014398 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:08:24 014408 [41401960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:08:24 014422 [41401960] -> Capabilities Mask: > >> Mar 20 14:08:24 019479 [41401960] -> SUBNET UP > >> Mar 20 14:11:00 201308 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x001F > >> TID:0x0000000000000018 > >> Mar 20 14:11:00 201580 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001F > >> GID:0xfe80000000000000,0x0005ad0000027c56 > >> Mar 20 14:11:00 554517 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:11:00 554538 [41E02960] -> Removed port with > >> GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:saguaro-24-1 HCA-1 > >> Mar 20 14:11:00 589140 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:11:00 641315 [45A08960] -> SUBNET UP > >> Mar 20 14:14:16 904140 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000068 > >> Mar 20 14:14:16 904369 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:14:16 904462 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000004f > >> Mar 20 14:14:16 904600 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:14:17 210726 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210747 [41401960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:14:17 210796 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210802 [41401960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of 
node:saguaro-23-6 HCA-1 > >> Mar 20 14:14:17 210818 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210836 [41401960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:14:17 210864 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210869 [41401960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:14:17 210885 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210890 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:14:17 210908 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210913 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:14:17 210931 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 210936 [41401960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:14:17 211090 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211096 [41401960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:14:17 211127 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211133 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:14:17 211147 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211153 [41401960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:14:17 211169 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211174 [41401960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:14:17 211189 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211194 [41401960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:14:17 211212 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211216 [41401960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:14:17 211232 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211237 [41401960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:14:17 211253 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211317 [41401960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of 
node:saguaro-22-5 HCA-1 > >> Mar 20 14:14:17 211333 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:14:17 211338 [41401960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:14:17 244432 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:14:17 292747 [42803960] -> SUBNET UP > >> Mar 20 14:14:17 698554 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:14:17 750419 [44606960] -> SUBNET UP > >> Mar 20 14:15:11 300343 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000050 > >> Mar 20 14:15:11 300577 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:11 306375 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000069 > >> Mar 20 14:15:11 306439 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000051 > >> Mar 20 14:15:11 306487 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:11 306514 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:11 312487 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006a > >> Mar 20 14:15:11 312581 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> 
GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:11 636546 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636559 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:15:11 636565 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636572 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:15:11 636577 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636584 [45007960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:15:11 636589 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636595 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:15:11 636600 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636606 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:15:11 636612 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636618 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:15:11 636623 [45007960] -> osm_report_notice: Reporting 
Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636629 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:15:11 636634 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636641 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:15:11 636646 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636652 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:15:11 636657 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636663 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:15:11 636668 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636675 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:15:11 636680 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636686 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:15:11 636691 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 
636698 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:15:11 636703 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636709 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:15:11 636742 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636750 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:15:11 636755 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:11 636761 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:15:11 667436 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:11 731917 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732017 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b507 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: 
[0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732102 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732106 [41401960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732128 [41401960] -> Capabilities Mask: > >> Mar 20 14:15:11 732160 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732185 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> 
mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b508 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732254 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732258 [44606960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> 
vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732269 [44606960] -> Capabilities Mask: > >> Mar 20 14:15:11 732300 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732334 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b509 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732420 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732419 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732451 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> 
hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b50a > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732447 [45007960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > 
>> Mar 20 14:15:11 732471 [45007960] -> Capabilities Mask: > >> Mar 20 14:15:11 732511 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732516 [45007960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732529 [45007960] -> Capabilities Mask: > >> Mar 20 14:15:11 732556 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732591 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> 
hop_count...............0x4 > >> trans_id................0x6b50b > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732653 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732662 [43204960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 
> >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732673 [43204960] -> Capabilities Mask: > >> Mar 20 14:15:11 732705 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732739 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b50c > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732809 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:11 732805 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 732839 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6b50d > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> 
m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:11 732837 [41E02960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732856 [41E02960] -> Capabilities Mask: > >> Mar 20 14:15:11 732898 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:11 
732911 [41E02960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:11 732925 [41E02960] -> Capabilities Mask: > >> Mar 20 14:15:11 738354 [45A08960] -> SUBNET UP > >> Mar 20 14:15:12 115658 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:12 172029 [44606960] -> SUBNET UP > >> Mar 20 14:15:27 277617 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 20 14:15:27 277863 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 510410 [43C05960] -> __osm_trap_rcv_process_request: > >> 
Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:15:27 510626 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 532239 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:15:27 532443 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 533517 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:15:27 533612 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 591171 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:27 591185 [41401960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:15:27 591206 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:27 591211 [41401960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:15:27 625811 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:27 668356 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:15:27 668485 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> 
GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 682282 [43204960] -> SUBNET UP > >> Mar 20 14:15:27 737313 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:15:27 737387 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 809341 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:15:27 809813 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:27 998181 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:15:27 998331 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:28 012193 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:15:28 012277 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:28 496329 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:15:28 496422 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:28 624912 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 
14:15:28 624940 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:15:28 624965 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:28 624972 [43C05960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:15:28 625001 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:28 625008 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:15:28 629507 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:28 629518 [42803960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:15:28 649776 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:15:28 660297 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:28 699777 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:15:28 716354 [41E02960] -> SUBNET UP > >> Mar 20 14:15:28 744686 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:15:28 744857 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:15:28 811329 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from 
LID:0x0152 > >> TID:0x000000000000000c > >> Mar 20 14:15:28 811392 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:15:28 999808 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:15:28 999881 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:15:29 029918 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:15:29 029969 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:15:29 031783 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000052 > >> Mar 20 14:15:29 031900 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:29 037646 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 037662 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:15:29 037683 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 037690 [44606960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:15:29 037721 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 037726 [44606960] -> Removed port with > >> GUID:0x0005ad000002498f LID range 
[0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:15:29 037741 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 037746 [44606960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:15:29 037766 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 037771 [44606960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:15:29 361560 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000053 > >> Mar 20 14:15:29 361622 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:29 433665 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 433674 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:15:29 433680 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 433687 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:15:29 433692 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 433698 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:15:29 433703 [43204960] -> osm_report_notice: 
Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:29 433709 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:15:29 464434 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:29 522011 [42803960] -> SUBNET UP > >> Mar 20 14:15:29 699605 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006b > >> Mar 20 14:15:29 699782 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:29 701115 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000054 > >> Mar 20 14:15:29 701301 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:29 818974 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000055 > >> Mar 20 14:15:29 819054 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:29 992006 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000056 > >> Mar 20 14:15:29 992080 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:30 184132 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006c > >> Mar 20 14:15:30 184205 
[44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:30 207030 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000057 > >> Mar 20 14:15:30 207101 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:15:30 250541 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006d > >> Mar 20 14:15:30 250635 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:30 317366 [45A08960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad00000281a7 port 22. Adding to light sweep > >> sampling list > >> Mar 20 14:15:30 317409 [45A08960] -> Directed Path Dump of 4 hop path: > >> Path = [0][1][17][1][4] > >> Mar 20 14:15:30 494183 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006e > >> Mar 20 14:15:30 494247 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:30 521869 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:30 521879 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:15:30 521885 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:30 521891 [43C05960] -> Discovered new 
port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:15:30 521896 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:30 521903 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:15:30 521908 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:30 521914 [43C05960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:15:30 521919 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:15:30 521926 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:15:30 552581 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:30 553014 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000006f > >> Mar 20 14:15:30 592863 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:15:30 607595 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:30 607666 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6f744 > >> 
attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][1][6] > >> Return path: [0][9][15][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:30 607770 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:30 607777 [44606960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > 
>> error_threshold.........0x88 > >> Mar 20 14:15:30 607794 [44606960] -> Capabilities Mask: > >> Mar 20 14:15:30 607914 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:30 607958 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x6f745 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][1][6] > >> Return path: [0][9][15][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:30 608014 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:30 608018 [43204960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> 
m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:30 608031 [43204960] -> Capabilities Mask: > >> Mar 20 14:15:30 613309 [41E02960] -> SUBNET UP > >> Mar 20 14:15:30 995108 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:15:31 050102 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:15:31 050180 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x70486 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:15:31 050233 [45A08960] -> osm_pi_rcv_process_set: Received error 
> >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:15:31 050238 [45A08960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:15:31 050251 [45A08960] -> Capabilities Mask: > >> Mar 20 14:15:31 055273 [42803960] -> SUBNET UP > >> Mar 20 14:15:31 106129 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:15:31 106193 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:17:18 456260 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000058 > >> Mar 20 14:17:18 456512 
[43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:17:18 456649 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000070 > >> Mar 20 14:17:18 456761 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:17:18 769730 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769751 [45007960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:17:18 769773 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769780 [45007960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:17:18 769803 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769809 [45007960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:17:18 769832 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769838 [45007960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:17:18 769858 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769865 [45007960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of 
node:saguaro-23-7 HCA-1 > >> Mar 20 14:17:18 769888 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769895 [45007960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:17:18 769927 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 769932 [45007960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:17:18 770075 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770081 [45007960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:17:18 770109 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770114 [45007960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:17:18 770130 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770135 [45007960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:17:18 770150 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770155 [45007960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:17:18 770171 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770176 [45007960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:17:18 770193 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770198 [45007960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:17:18 770216 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770221 [45007960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:17:18 770238 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770301 [45007960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:17:18 770318 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:17:18 770323 [45007960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:17:18 803377 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:17:18 855545 [44606960] -> SUBNET UP > >> Mar 20 14:17:19 249722 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:17:19 300999 [45A08960] -> SUBNET UP > >> Mar 20 14:18:11 663850 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000059 > >> Mar 20 14:18:11 664195 [43C05960] -> osm_report_notice: Reporting Generic > >> 
Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:11 670836 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000071 > >> Mar 20 14:18:11 670964 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000005a > >> Mar 20 14:18:11 671199 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:11 672933 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:11 677654 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000072 > >> Mar 20 14:18:11 677826 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:12 026661 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026675 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:18:12 026681 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026688 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:18:12 026693 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026700 [44606960] -> Discovered new port with > >> 
GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:18:12 026705 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026711 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:18:12 026716 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026723 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:18:12 026727 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026740 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:18:12 026745 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026751 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:18:12 026758 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026764 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:18:12 026769 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026776 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:18:12 026781 [44606960] -> 
osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026787 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:18:12 026792 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026798 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:18:12 026803 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026809 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:18:12 026814 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026821 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:18:12 026826 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026832 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:18:12 026869 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026877 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:18:12 026882 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:12 026888 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:18:12 057534 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:12 133316 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:12 133419 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x72d97 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][3][6] > >> Return path: [0][9][15][F][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:12 133466 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:12 133467 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:12 133478 [43204960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > 
>> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:12 133490 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x72d98 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][3][6] > >> Return path: [0][9][15][F][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:12 133493 [43204960] -> Capabilities Mask: > >> Mar 20 14:18:12 133566 [45A08960] -> 
osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:12 133595 [45A08960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:12 133614 [45A08960] -> Capabilities Mask: > >> Mar 20 14:18:12 133583 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:12 133671 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x72d99 > >> attr_id.................0x15 (PortInfo) 
> >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][3][6] > >> Return path: [0][9][15][F][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:12 133760 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:12 133788 [43C05960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 
20 14:18:12 133807 [43C05960] -> Capabilities Mask: > >> Mar 20 14:18:12 139330 [41401960] -> SUBNET UP > >> Mar 20 14:18:12 496444 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:12 558965 [41401960] -> SUBNET UP > >> Mar 20 14:18:27 748551 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 20 14:18:27 748795 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:27 888669 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:18:27 888902 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:27 910605 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:18:27 910710 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:27 911951 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:18:27 912119 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 012957 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:18:28 013058 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> 
Mar 20 14:18:28 075266 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:18:28 075397 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 259000 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:18:28 259121 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 308865 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:18:28 309000 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 330606 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:18:28 330714 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 444107 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:18:28 444191 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 466156 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:18:28 466234 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 
num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:18:28 478021 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:18:28 478070 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:18:28 489091 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 489106 [43204960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:18:28 521430 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 20 14:18:28 521499 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:18:28 523658 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:28 580295 [43204960] -> SUBNET UP > >> Mar 20 14:18:28 611805 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:18:28 611893 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:18:28 661292 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:18:28 661351 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:18:28 871670 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 20 14:18:28 871732 [44606960] -> __osm_trap_rcv_process_request: 
ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:18:28 934440 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 20 14:18:28 934505 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 20 14:18:28 941281 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 941303 [45A08960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:18:28 941329 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 941336 [45A08960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:18:28 941356 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 941363 [45A08960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:18:28 941388 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 941395 [45A08960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:18:28 941420 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 941426 [45A08960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:18:28 945507 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from 
LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:28 945515 [45A08960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:18:28 956576 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 20 14:18:28 956665 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 20 14:18:28 976211 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:29 033513 [42803960] -> SUBNET UP > >> Mar 20 14:18:29 071283 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 20 14:18:29 071345 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 20 14:18:29 352103 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 20 14:18:29 352155 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 20 14:18:29 376386 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 20 14:18:29 376461 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 20 14:18:29 420228 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 20 14:18:29 420282 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 20 14:18:29 421294 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 
Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 20 14:18:29 421345 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 20 14:18:29 461135 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 20 14:18:29 461179 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 20 14:18:29 633008 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:18:29 633050 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000005b > >> Mar 20 14:18:29 633062 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 20 14:18:29 633350 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:29 733039 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000005c > >> Mar 20 14:18:29 733238 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:29 947440 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:29 947452 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:18:29 947457 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:29 947462 [44606960] -> 
Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:18:29 947465 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:29 947470 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:18:29 947474 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:29 947479 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:18:29 947482 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:29 947487 [44606960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:18:29 978182 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:30 027730 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:30 027819 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x762b8 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:30 027897 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:30 027901 [41401960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:30 027914 [41401960] -> Capabilities Mask: > >> Mar 20 14:18:30 027993 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:30 028043 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > 
>> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x762b9 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:30 028098 [45A08960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:30 028109 [45A08960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> 
m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:30 028124 [45A08960] -> Capabilities Mask: > >> Mar 20 14:18:30 033824 [44606960] -> SUBNET UP > >> Mar 20 14:18:30 418497 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:30 418522 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:18:30 453167 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:30 494719 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000005d > >> Mar 20 14:18:30 494877 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:30 662496 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000073 > >> Mar 20 14:18:30 662564 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:30 662645 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000005e > >> Mar 20 14:18:30 662759 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:30 707085 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> 
TID:0x000000000000005f > >> Mar 20 14:18:30 707179 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:30 728948 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000060 > >> Mar 20 14:18:30 729041 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:30 872332 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000061 > >> Mar 20 14:18:30 872412 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:30 899764 [45A08960] -> SUBNET UP > >> Mar 20 14:18:31 047423 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000062 > >> Mar 20 14:18:31 047611 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:31 165201 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000063 > >> Mar 20 14:18:31 165272 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:18:31 182461 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000074 > >> Mar 20 14:18:31 182653 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:31 248834 
[44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000075 > >> Mar 20 14:18:31 248893 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:31 499830 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000076 > >> Mar 20 14:18:31 499908 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:31 521824 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000077 > >> Mar 20 14:18:31 521891 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:31 543713 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000078 > >> Mar 20 14:18:31 543784 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:18:31 589490 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:18:31 589499 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:18:31 620166 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:31 672647 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:31 672739 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> 
mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x77d11 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:31 672817 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:31 672823 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:31 672833 [43C05960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > 
>> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:31 672852 [43C05960] -> Capabilities Mask: > >> Mar 20 14:18:31 672861 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x77d12 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:31 672918 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:31 672922 [45007960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > 
>> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:31 672936 [45007960] -> Capabilities Mask: > >> Mar 20 14:18:31 678085 [45A08960] -> SUBNET UP > >> Mar 20 14:18:31 723715 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:18:31 723815 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 20 14:18:32 061932 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:18:32 113545 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:32 113610 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x78a4d > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> 
attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:32 113712 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:32 113725 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:32 113730 [42803960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > 
>> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:32 113745 [42803960] -> Capabilities Mask: > >> Mar 20 14:18:32 113751 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x78a4e > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:32 113803 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:32 113807 [44606960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 
> >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:32 113820 [44606960] -> Capabilities Mask: > >> Mar 20 14:18:32 113845 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:32 113907 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x78a4f > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:32 113958 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:32 113963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:32 113969 [41E02960] -> PortInfo dump: > >> port number.............0x17 > 
>> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:32 113986 [41E02960] -> Capabilities Mask: > >> Mar 20 14:18:32 113992 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x78a50 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: 
[0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:32 114052 [45A08960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:32 114066 [45A08960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:32 114089 [45A08960] -> Capabilities Mask: > >> Mar 20 14:18:32 114052 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:18:32 114171 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> 
mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x78a51 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][13][1][6] > >> Return path: [0][9][13][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:18:32 114224 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:18:32 114228 [42803960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> 
vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:18:32 114242 [42803960] -> Capabilities Mask: > >> Mar 20 14:18:32 119326 [42803960] -> SUBNET UP > >> Mar 20 14:23:02 506774 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 20 14:23:02 507064 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:23:02 861642 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:23:02 861653 [43204960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC > >> Mar 20 14:23:02 893030 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:23:02 943693 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:23:02 943765 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x79aff > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x1 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][5][18] > >> Return path: 
[0][9][18][D][2][13] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 13 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:23:02 943854 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:23:02 943870 [43C05960] -> PortInfo dump: > >> port number.............0x1 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x13 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0x2C > >> vl_enforce..............0x4C > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:23:02 943886 [43C05960] -> Capabilities Mask: > >> Mar 20 14:23:02 948898 [43C05960] -> SUBNET UP > >> Mar 20 14:23:03 237496 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x04 num:144 Producer:1 from 
LID:0x00AF > >> TID:0x0000000000000000 > >> Mar 20 14:23:03 237710 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:4 num:144 from LID:0x00AF > >> GID:0xfe80000000000000,0x0005ad0000024b27 > >> Mar 20 14:23:03 605548 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:23:03 662757 [41401960] -> SUBNET UP > >> Mar 20 14:24:54 675782 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000079 > >> Mar 20 14:24:54 676077 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:24:54 677026 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000064 > >> Mar 20 14:24:54 677118 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:24:55 047478 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047501 [43204960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:24:55 047520 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047525 [43204960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:24:55 047541 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047546 [43204960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 
14:24:55 047563 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047569 [43204960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:Topspin IB-DC > >> Mar 20 14:24:55 047586 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047591 [43204960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:24:55 047607 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047612 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:24:55 047630 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047635 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:24:55 047652 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047657 [43204960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:24:55 047798 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047803 [43204960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:24:55 047836 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 
20 14:24:55 047842 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:24:55 047857 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047862 [43204960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:24:55 047877 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047882 [43204960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:24:55 047896 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047902 [43204960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:24:55 047918 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047923 [43204960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:24:55 047939 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 047988 [43204960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:24:55 048005 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 048010 [43204960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:24:55 048025 
[43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:24:55 048030 [43204960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:24:55 081006 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:24:55 130875 [45A08960] -> SUBNET UP > >> Mar 20 14:24:55 484995 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:24:55 535902 [42803960] -> SUBNET UP > >> Mar 20 14:25:48 653788 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000065 > >> Mar 20 14:25:48 654009 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:25:48 659749 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007a > >> Mar 20 14:25:48 659790 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000066 > >> Mar 20 14:25:48 659814 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:25:48 659963 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:25:48 665972 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007b > >> Mar 20 14:25:48 666050 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 
14:25:49 025384 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025396 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:25:49 025401 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025406 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:25:49 025410 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025416 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:25:49 025420 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025425 [41E02960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:25:49 025428 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025433 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:25:49 025437 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025442 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:25:49 025446 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025451 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:25:49 025461 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025466 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:25:49 025470 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025475 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:25:49 025483 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025488 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:25:49 025491 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025496 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:25:49 025500 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025505 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:25:49 025508 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025513 [41E02960] -> Discovered new port with > >> 
GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:25:49 025517 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025522 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:25:49 025556 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025562 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:25:49 025565 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025570 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:25:49 025574 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:25:49 025579 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:25:49 056324 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:25:49 126247 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 126356 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d165 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 
> >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 126409 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 126442 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d166 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 126496 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 126489 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 126535 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> 
method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d167 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 126526 [42803960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> 
client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 126567 [42803960] -> Capabilities Mask: > >> Mar 20 14:25:49 126613 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 126617 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 126658 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d168 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 126653 [42803960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> 
link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 126687 [42803960] -> Capabilities Mask: > >> Mar 20 14:25:49 126703 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 126709 [43204960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> 
m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 126744 [43204960] -> Capabilities Mask: > >> Mar 20 14:25:49 126765 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 126770 [43C05960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 126874 [43C05960] -> Capabilities Mask: > >> Mar 20 14:25:49 126975 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 127015 [4780B960] -> SMP dump: > 
>> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d169 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][13][1][6] > >> Return path: [0][9][13][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 127066 [45A08960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 127072 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 127084 [45A08960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 
> >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 127103 [45A08960] -> Capabilities Mask: > >> Mar 20 14:25:49 127121 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d16a > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][13][1][6] > >> Return path: [0][9][13][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 127188 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:25:49 127220 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x7d16b > >> attr_id.................0x15 
(PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][13][1][6] > >> Return path: [0][9][13][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:25:49 127326 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 127339 [44606960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> 
error_threshold.........0x88 > >> Mar 20 14:25:49 127357 [44606960] -> Capabilities Mask: > >> Mar 20 14:25:49 127378 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:25:49 127397 [45007960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:25:49 127410 [45007960] -> Capabilities Mask: > >> Mar 20 14:25:49 132961 [43204960] -> SUBNET UP > >> Mar 20 14:25:49 523879 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:25:49 580522 [42803960] -> SUBNET UP > >> Mar 20 14:26:04 718574 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > 
>> Mar 20 14:26:04 718819 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:04 836781 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:26:04 836881 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:04 858762 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:26:04 860242 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:04 997451 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:26:04 997647 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 180722 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:26:05 180855 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 209122 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:26:05 209200 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 347419 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice 
type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:26:05 347488 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 378670 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:26:05 378739 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 409112 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:05 409121 [41401960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:26:05 443639 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:05 483503 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:26:05 486002 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 499183 [44606960] -> SUBNET UP > >> Mar 20 14:26:05 499856 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:26:05 499941 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 521857 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:26:05 521971 [43204960] -> osm_report_notice: Reporting 
Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:26:05 532569 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:26:05 532624 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:26:05 633813 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 20 14:26:05 633869 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:26:05 655421 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:26:05 655501 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:26:05 702652 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:26:05 702745 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:26:05 817201 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:05 817216 [43204960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:26:05 817235 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:05 817241 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:26:05 817259 [43204960] -> osm_report_notice: Reporting Generic > 
>> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:05 817264 [43204960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:26:05 821322 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:05 821330 [41E02960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:26:05 847950 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 20 14:26:05 848031 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:26:05 852036 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:05 893954 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 20 14:26:05 894021 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 20 14:26:05 910489 [44606960] -> SUBNET UP > >> Mar 20 14:26:05 999993 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 20 14:26:06 000039 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 20 14:26:06 021880 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 20 14:26:06 021970 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 20 14:26:06 043912 [44606960] -> __osm_trap_rcv_process_request: > >> Received 
Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 20 14:26:06 044001 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 20 14:26:06 052878 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 20 14:26:06 052975 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 20 14:26:06 147560 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 20 14:26:06 147616 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 20 14:26:06 158945 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 20 14:26:06 158978 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 20 14:26:06 346046 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 20 14:26:06 346106 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 20 14:26:06 405311 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:26:06 405349 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 20 14:26:06 632882 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 20 14:26:06 632923 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received 
trap 25 times consecutively > >> Mar 20 14:26:06 634031 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000067 > >> Mar 20 14:26:06 634110 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:06 883831 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001a > >> Mar 20 14:26:06 883879 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 20 14:26:06 885475 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000068 > >> Mar 20 14:26:06 885560 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:06 982877 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001b > >> Mar 20 14:26:06 982926 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 27 times consecutively > >> Mar 20 14:26:06 992809 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000069 > >> Mar 20 14:26:06 992871 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:06 992909 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001c > >> Mar 20 14:26:06 992943 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 28 times consecutively > >> Mar 20 14:26:06 993058 
[41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:06 993065 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:26:06 993069 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:06 993074 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:26:07 023890 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:07 085081 [41E02960] -> SUBNET UP > >> Mar 20 14:26:07 348105 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001d > >> Mar 20 14:26:07 348218 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 29 times consecutively > >> Mar 20 14:26:07 348958 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006a > >> Mar 20 14:26:07 349041 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:07 540871 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006b > >> Mar 20 14:26:07 540983 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:07 541063 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007c > >> Mar 20 14:26:07 541131 [43204960] -> osm_report_notice: 
Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:07 585394 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006c > >> Mar 20 14:26:07 585464 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:07 607406 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006d > >> Mar 20 14:26:07 607486 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:07 850410 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006e > >> Mar 20 14:26:07 850483 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:07 956365 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007d > >> Mar 20 14:26:07 956404 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:07 956413 [42803960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:26:07 987136 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:08 018887 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:08 032634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 
3111: Error status = 0x1C00 > >> Mar 20 14:26:08 032679 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x813ce > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][4][5] > >> Return path: [0][9][14][D][5] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 032749 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 032757 [41E02960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x5 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> 
vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 032774 [41E02960] -> Capabilities Mask: > >> Mar 20 14:26:08 033119 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 033154 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x813cf > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][4][5] > >> Return path: [0][9][14][D][5] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 05 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 033202 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 033213 [43C05960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> 
subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x5 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 033231 [43C05960] -> Capabilities Mask: > >> Mar 20 14:26:08 038497 [45A08960] -> SUBNET UP > >> Mar 20 14:26:08 055480 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007e > >> Mar 20 14:26:08 055587 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:08 372288 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000007f > >> Mar 20 14:26:08 376158 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:08 418607 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000080 > 
>> Mar 20 14:26:08 420668 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:08 420714 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:08 430046 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 430147 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x820fa > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 430236 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 430236 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 430267 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> 
trans_id................0x820fb > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][1][6] > >> Return path: [0][9][14][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 430262 [43C05960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 430286 [43C05960] -> Capabilities Mask: > >> Mar 
20 14:26:08 430350 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 430362 [43C05960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 430375 [43C05960] -> Capabilities Mask: > >> Mar 20 14:26:08 435317 [43C05960] -> SUBNET UP > >> Mar 20 14:26:08 583769 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001e > >> Mar 20 14:26:08 583903 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 30 times consecutively > >> Mar 20 14:26:08 854841 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:08 913273 
[4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 913349 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x82e32 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][5] > >> Return path: [0][9][14][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 913415 [45A08960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 913432 [45A08960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> 
vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 913449 [45A08960] -> Capabilities Mask: > >> Mar 20 14:26:08 913598 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 913676 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x82e33 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][5] > >> Return path: [0][9][14][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 913727 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 913732 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:26:08 913734 [43C05960] -> PortInfo dump: > >> port number.............0x17 > >> 
node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 913752 [43C05960] -> Capabilities Mask: > >> Mar 20 14:26:08 913766 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x82e34 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][2][5] > >> Return path: [0][9][14][E][2] > >> Reserved: [0][0][0][0][0][0][0] > 
>> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:26:08 913828 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:26:08 913833 [41E02960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:26:08 913848 [41E02960] -> Capabilities Mask: > >> Mar 20 14:26:08 918887 [41E02960] -> SUBNET UP > >> Mar 20 14:26:48 657517 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000006f > >> Mar 20 14:26:48 
657779 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:26:48 658393 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000081 > >> Mar 20 14:26:48 658465 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:26:48 979610 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979629 [41401960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:26:48 979652 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979660 [41401960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:26:48 979682 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979688 [41401960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:26:48 979721 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979727 [41401960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:26:48 979770 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979782 [41401960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of 
node:saguaro-23-5 HCA-1 > >> Mar 20 14:26:48 979799 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979804 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:26:48 979822 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979827 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:26:48 979845 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 979849 [41401960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:26:48 980028 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980033 [41401960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:26:48 980061 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980066 [41401960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:26:48 980081 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980087 [41401960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:26:48 980102 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980107 [41401960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:26:48 980122 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980127 [41401960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:26:48 980143 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980148 [41401960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:26:48 980163 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980239 [41401960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:26:48 980256 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980261 [41401960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:26:48 980365 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:26:48 980371 [41401960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:26:49 013365 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:26:49 065887 [43C05960] -> SUBNET UP > >> Mar 20 14:26:49 407010 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 
14:26:49 459477 [44606960] -> SUBNET UP > >> Mar 20 14:27:42 754098 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000070 > >> Mar 20 14:27:42 754349 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:27:42 760115 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000071 > >> Mar 20 14:27:42 760178 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000082 > >> Mar 20 14:27:42 760236 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:27:42 760406 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:27:42 766931 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000083 > >> Mar 20 14:27:42 767049 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:27:43 085327 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085345 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:27:43 085349 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085355 [43C05960] -> Discovered new port with > >> 
GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:27:43 085359 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085364 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:27:43 085368 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085373 [43C05960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:27:43 085377 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085382 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:27:43 085386 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085390 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:27:43 085394 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085399 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:27:43 085403 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085407 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:27:43 085411 [43C05960] -> 
osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085416 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:27:43 085420 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085425 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:27:43 085428 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085433 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:27:43 085437 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085442 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:27:43 085446 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085450 [43C05960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:27:43 085454 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085459 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:27:43 085511 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085517 [43C05960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:27:43 085521 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085526 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:27:43 085530 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:43 085534 [43C05960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:27:43 116308 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:27:43 179935 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 179980 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x85669 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> 
Mar 20 14:27:43 180019 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180037 [41401960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180050 [41401960] -> Capabilities Mask: > >> Mar 20 14:27:43 180092 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 180137 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566a > >> 
attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180185 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180189 [44606960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > 
>> error_threshold.........0x88 > >> Mar 20 14:27:43 180199 [44606960] -> Capabilities Mask: > >> Mar 20 14:27:43 180239 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 180263 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566b > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180307 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 180319 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180332 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566c > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> 
dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][4] > >> Return path: [0][9][13][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180336 [42803960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180364 [42803960] -> Capabilities Mask: > >> Mar 20 14:27:43 180389 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180410 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 
0x1C00 > >> Mar 20 14:27:43 180392 [42803960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180415 [42803960] -> Capabilities Mask: > >> Mar 20 14:27:43 180436 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566d > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> 
Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180490 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180494 [41E02960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180504 [41E02960] -> Capabilities Mask: > >> Mar 20 14:27:43 180536 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 180560 
[4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566e > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180606 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180615 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:27:43 180634 [45007960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> 
vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180657 [45007960] -> Capabilities Mask: > >> Mar 20 14:27:43 180678 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8566f > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][5] > >> Return path: [0][9][18][E][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:27:43 180769 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:27:43 180775 [43C05960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> 
capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:27:43 180789 [43C05960] -> Capabilities Mask: > >> Mar 20 14:27:43 186228 [43204960] -> SUBNET UP > >> Mar 20 14:27:43 557268 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:27:43 611082 [45A08960] -> SUBNET UP > >> Mar 20 14:27:58 852744 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 20 14:27:58 852982 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:58 970772 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:27:58 970864 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:58 992628 [41E02960] -> 
__osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:27:58 992712 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 132331 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:27:59 132484 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 314893 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:27:59 315006 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 343241 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:27:59 343320 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 481698 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:27:59 481775 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 512746 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:27:59 512853 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> 
GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 548851 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:59 548861 [41E02960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:27:59 583414 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:27:59 583817 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:27:59 623971 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 626182 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:27:59 626329 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 634080 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:27:59 634442 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:27:59 641962 [45A08960] -> SUBNET UP > >> Mar 20 14:27:59 656231 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:27:59 656307 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:27:59 689788 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > 
>> TID:0x000000000000000c > >> Mar 20 14:27:59 690249 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:27:59 758521 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:27:59 758646 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:27:59 970740 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:27:59 970812 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:27:59 985557 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:59 985577 [41E02960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:27:59 985601 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:59 985615 [41E02960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:27:59 985649 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:59 985656 [41E02960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:27:59 989767 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:27:59 989787 [42803960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 
14:28:00 014445 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 20 14:28:00 014524 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:28:00 020896 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:00 086824 [43204960] -> SUBNET UP > >> Mar 20 14:28:00 124057 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 20 14:28:00 124108 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 20 14:28:00 131596 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 20 14:28:00 131643 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 20 14:28:00 412484 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 20 14:28:00 412528 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 20 14:28:00 436877 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 20 14:28:00 436921 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 20 14:28:00 458745 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 20 14:28:00 458816 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 20 14:28:00 480551 
[41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 20 14:28:00 480599 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 20 14:28:00 695340 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 20 14:28:00 695386 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 20 14:28:00 695726 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000072 > >> Mar 20 14:28:00 695886 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:28:00 719764 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 20 14:28:00 719825 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 20 14:28:00 743680 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:28:00 743775 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 20 14:28:00 763599 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 20 14:28:00 763654 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 20 14:28:00 813393 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> 
TID:0x000000000000001a > >> Mar 20 14:28:00 813473 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 20 14:28:00 831287 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001b > >> Mar 20 14:28:00 831302 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000073 > >> Mar 20 14:28:00 831383 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 27 times consecutively > >> Mar 20 14:28:00 831424 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:28:00 841593 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001c > >> Mar 20 14:28:00 841644 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 28 times consecutively > >> Mar 20 14:28:01 050511 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:28:01 050529 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:28:01 050535 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:28:01 050542 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:28:01 050547 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:28:01 050554 [41E02960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range 
[0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:28:01 081322 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:01 142873 [43204960] -> SUBNET UP > >> Mar 20 14:28:01 460275 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001d > >> Mar 20 14:28:01 460358 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 29 times consecutively > >> Mar 20 14:28:01 488474 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:01 538634 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:01 538712 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x898d1 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:01 538752 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: > >> Received error status for SetResp() > >> Mar 20 14:28:01 538758 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:01 538767 [42803960] -> PortInfo dump: > >> port 
number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............DOWN > >> state_info2.............0x42 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:01 538795 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x898d2 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][6] > >> Return path: [0][9][18][D][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:01 538810 [42803960] -> Capabilities Mask: > >> Mar 20 14:28:01 538849 [42803960] -> osm_pi_rcv_process_set: ERR 0F10: > >> Received error status for SetResp() > >> Mar 20 14:28:01 538856 [42803960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............DOWN > >> state_info2.............0x42 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:01 538871 [42803960] -> Capabilities Mask: > >> Mar 20 14:28:01 539658 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:01 539696 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> 
class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x898d3 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x11 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][4][17] > >> Return path: [0][9][18][D][1][16] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 > >> > >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:01 539778 [45A08960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:01 539784 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:01 539798 [45A08960] -> PortInfo dump: > >> port number.............0x11 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x16 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............DOWN > >> state_info2.............0x42 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> 
vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:01 539834 [45A08960] -> Capabilities Mask: > >> Mar 20 14:28:01 539844 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x898d4 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x12 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][15][1][4][17] > >> Return path: [0][9][18][D][1][16] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 16 02 03 02 > >> > >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:01 539903 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:01 539908 [45007960] -> PortInfo dump: > >> port number.............0x12 > >> node_guid...............0x0005ad0000027c84 > >> port_guid...............0x0005ad0000027c84 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> 
diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x16 > >> link_width_enabled......0x2 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............DOWN > >> state_info2.............0x42 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:01 539924 [45007960] -> Capabilities Mask: > >> Mar 20 14:28:01 545091 [45007960] -> SUBNET UP > >> Mar 20 14:28:01 652647 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000084 > >> Mar 20 14:28:01 652864 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:01 879631 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000074 > >> Mar 20 14:28:01 880104 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:28:01 962839 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000075 > >> Mar 20 14:28:01 965155 [44606960] -> osm_report_notice: Reporting Generic > >> Notice 
type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:28:02 006432 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000085 > >> Mar 20 14:28:02 030610 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:02 068999 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:02 081130 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:02 081198 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8a604 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:02 081249 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:02 081257 [43204960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> 
subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:02 081275 [43204960] -> Capabilities Mask: > >> Mar 20 14:28:02 081650 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:02 081713 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8a605 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][4][4] > >> Return path: [0][9][18][D][4] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 04 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:02 081782 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:02 081787 [43C05960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x4 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:02 081802 [43C05960] -> Capabilities Mask: > >> Mar 20 14:28:02 087435 [45A08960] -> SUBNET UP > >> Mar 20 14:28:02 170696 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000086 > >> Mar 20 14:28:02 170819 [41401960] -> 
osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:02 459228 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:02 500761 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000087 > >> Mar 20 14:28:02 500979 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:02 510190 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:02 510258 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8b330 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][17][1][5] > >> Return path: [0][9][14][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:02 510366 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:28:02 510370 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:02 510384 [45007960] -> PortInfo dump: > >> port number.............0x16 > >> 
node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:02 510394 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x8b331 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][3][6] > >> Return path: [0][9][18][F][3] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> 
> >> 00 00 00 00 00 00 00 00 00 00 00 00 03 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:28:02 510398 [45007960] -> Capabilities Mask: > >> Mar 20 14:28:02 510481 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:28:02 510491 [41401960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x3 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:28:02 510509 [41401960] -> Capabilities Mask: > >> Mar 20 14:28:02 510511 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000088 > >> Mar 20 14:28:02 515576 [41401960] -> SUBNET UP > >> Mar 20 14:28:02 
515695 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:02 532552 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000089 > >> Mar 20 14:28:02 538569 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:28:02 695997 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001e > >> Mar 20 14:28:02 696096 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 30 times consecutively > >> Mar 20 14:28:02 918226 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:02 975244 [43204960] -> SUBNET UP > >> Mar 20 14:28:03 325494 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:28:03 379145 [41401960] -> SUBNET UP > >> Mar 20 14:29:12 561841 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > >> TID:0x0000000000000019 > >> Mar 20 14:29:12 562033 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001F > >> GID:0xfe80000000000000,0x0005ad0000027c56 > >> Mar 20 14:29:12 562751 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > >> TID:0x000000000000001a > >> Mar 20 14:29:12 562902 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001F > >> GID:0xfe80000000000000,0x0005ad0000027c56 > >> Mar 20 14:29:12 571346 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0084 > >> TID:0x0000000000000018 > >> Mar 
20 14:29:12 571569 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0084 > >> GID:0xfe80000000000000,0x0005ad0000027c70 > >> Mar 20 14:29:12 914371 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001F > >> TID:0x000000000000001b > >> Mar 20 14:29:12 916287 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:12 916297 [44606960] -> Discovered new port with > >> GUID:0x0005ad000002502f LID range [0x2,0x2] of node:Topspin IB-DC > >> Mar 20 14:29:12 946985 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:12 976839 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001F > >> GID:0xfe80000000000000,0x0005ad0000027c56 > >> Mar 20 14:29:12 987963 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:29:12 988004 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x8dbb2 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0xD > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][2][4][D] > >> Return path: [0][9][18][E][1][12] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 12 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> 
Mar 20 14:29:12 988078 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:29:12 988089 [43C05960] -> PortInfo dump: > >> port number.............0xD > >> node_guid...............0x0005ad0000027c70 > >> port_guid...............0x0005ad0000027c70 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x12 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0x2C > >> vl_enforce..............0x4C > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:29:12 988105 [43C05960] -> Capabilities Mask: > >> Mar 20 14:29:12 993136 [44606960] -> SUBNET UP > >> Mar 20 14:29:13 300755 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0002 > >> TID:0x0000000000000000 > >> Mar 20 14:29:13 300874 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:4 num:144 from LID:0x0002 > >> GID:0xfe80000000000000,0x0005ad000002502f > >> Mar 20 14:29:13 338077 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice 
type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:13 338099 [41E02960] -> Discovered new port with > >> GUID:0x0005ad000002516f LID range [0xBA,0xBA] of node:Topspin IB-DC > >> Mar 20 14:29:13 368879 [41E02960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:13 431763 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:29:13 431806 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x5 > >> trans_id................0x8e8e9 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0xA > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][1][6][12] > >> Return path: [0][9][15][D][3][11] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 11 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 2C 4C 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:29:13 432093 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:29:13 432116 [43204960] -> PortInfo dump: > >> port number.............0xA > >> node_guid...............0x0005ad0000027c56 > >> port_guid...............0x0005ad0000027c56 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> 
local_port_num..........0x11 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0x2C > >> vl_enforce..............0x4C > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:29:13 432135 [43204960] -> Capabilities Mask: > >> Mar 20 14:29:13 437219 [45007960] -> SUBNET UP > >> Mar 20 14:29:13 690992 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x00BA > >> TID:0x0000000000000000 > >> Mar 20 14:29:13 691128 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:4 num:144 from LID:0x00BA > >> GID:0xfe80000000000000,0x0005ad000002516f > >> Mar 20 14:29:13 835017 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:13 891082 [42803960] -> SUBNET UP > >> Mar 20 14:29:14 235714 [42803960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:14 289127 [41E02960] -> SUBNET UP > >> Mar 20 14:29:17 689267 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000008a > >> Mar 20 14:29:17 689511 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> 
Mar 20 14:29:17 689975 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000076 > >> Mar 20 14:29:17 690097 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:29:18 025237 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025255 [44606960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:29:18 025273 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025279 [44606960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:29:18 025296 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025300 [44606960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:29:18 025317 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025323 [44606960] -> Removed port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:29:18 025340 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025345 [44606960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:29:18 025362 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025367 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:29:18 025385 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025390 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:29:18 025406 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025411 [44606960] -> Removed port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:29:18 025571 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025576 [44606960] -> Removed port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:29:18 025612 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025619 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:29:18 025634 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025639 [44606960] -> Removed port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 14:29:18 025655 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025660 [44606960] -> Removed port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of 
node:saguaro-22-2 HCA-1 > >> Mar 20 14:29:18 025678 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025683 [44606960] -> Removed port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:29:18 025700 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025705 [44606960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:29:18 025721 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025777 [44606960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:29:18 025794 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025800 [44606960] -> Removed port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:29:18 025816 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:29:18 025821 [44606960] -> Removed port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:29:18 058968 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:18 112970 [43C05960] -> SUBNET UP > >> Mar 20 14:29:18 511156 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:29:18 573846 [41E02960] -> SUBNET UP > >> Mar 20 14:30:11 599965 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 
num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000077 > >> Mar 20 14:30:11 600182 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:11 606044 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000078 > >> Mar 20 14:30:11 606078 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000008b > >> Mar 20 14:30:11 606178 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:11 606207 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:11 612375 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000008c > >> Mar 20 14:30:11 612499 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:11 947057 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947074 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000027c84 LID range [0x152,0x152] of node:Topspin Switch TS120 > >> Mar 20 14:30:11 947079 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947084 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024b27 LID range [0xAF,0xAF] of node:saguaro-23-0 HCA-1 > >> Mar 20 14:30:11 947088 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 
num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947093 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024da7 LID range [0xB0,0xB0] of node:saguaro-23-1 HCA-1 > >> Mar 20 14:30:11 947097 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947102 [45007960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:30:11 947106 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947118 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:30:11 947138 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947143 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:30:11 947148 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947153 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:30:11 947157 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947162 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:30:11 947166 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947170 [45007960] -> 
Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:30:11 947174 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947179 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:30:11 947183 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947188 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024d6b LID range [0xB8,0xB8] of node:saguaro-23-9 HCA-1 > >> Mar 20 14:30:11 947191 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947196 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024afb LID range [0xA5,0xA5] of node:saguaro-22-0 HCA-1 > >> Mar 20 14:30:11 947200 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947205 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:30:11 947209 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947214 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024c9b LID range [0xA7,0xA7] of node:saguaro-22-2 HCA-1 > >> Mar 20 14:30:11 947282 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947288 [45007960] -> Discovered new port with > >> GUID:0x0005ad000002498f LID range [0xA8,0xA8] of node:saguaro-22-3 HCA-1 > >> Mar 20 
14:30:11 947291 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947296 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024977 LID range [0xA9,0xA9] of node:saguaro-22-4 HCA-1 > >> Mar 20 14:30:11 947300 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:11 947305 [45007960] -> Discovered new port with > >> GUID:0x0005ad0000024feb LID range [0x153,0x153] of node:saguaro-22-5 HCA-1 > >> Mar 20 14:30:11 978149 [45007960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:12 042474 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 042577 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b38 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x13 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][5] > >> Return path: [0][9][13][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 042668 [45007960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 042676 [4780B960] -> 
__osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 042682 [45007960] -> PortInfo dump: > >> port number.............0x13 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 042701 [45007960] -> Capabilities Mask: > >> Mar 20 14:30:12 042714 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b39 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> 
dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][5] > >> Return path: [0][9][13][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 042845 [41401960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 042856 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 042851 [41401960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 
14:30:12 042897 [41401960] -> Capabilities Mask: > >> Mar 20 14:30:12 042907 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b3a > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][5] > >> Return path: [0][9][13][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 043013 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 043015 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 043038 [43204960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 
> >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 043090 [43204960] -> Capabilities Mask: > >> Mar 20 14:30:12 043094 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b3b > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][5] > >> Return path: [0][9][13][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 043173 [44606960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 043178 [44606960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> 
subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 043191 [44606960] -> Capabilities Mask: > >> Mar 20 14:30:12 043222 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 043247 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b3c > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][1][4] > >> Return path: [0][9][14][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 043318 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 043314 [41E02960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 043357 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b3d > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][1][4] > >> Return path: [0][9][14][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 043367 [41E02960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> 
link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 043422 [41E02960] -> Capabilities Mask: > >> Mar 20 14:30:12 043513 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 043518 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:12 043519 [43C05960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> 
init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 043535 [43C05960] -> Capabilities Mask: > >> Mar 20 14:30:12 043553 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x92b3e > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][12][1][4] > >> Return path: [0][9][14][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:12 043658 [42803960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:12 043663 [42803960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> 
link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:12 043678 [42803960] -> Capabilities Mask: > >> Mar 20 14:30:12 049088 [43204960] -> SUBNET UP > >> Mar 20 14:30:12 442903 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:12 497312 [45007960] -> SUBNET UP > >> Mar 20 14:30:27 571421 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000000 > >> Mar 20 14:30:27 571674 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:27 782498 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000001 > >> Mar 20 14:30:27 782616 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:27 804302 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000002 > >> Mar 20 14:30:27 
805103 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:27 924983 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000003 > >> Mar 20 14:30:27 925088 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:27 934314 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:27 934327 [43204960] -> Removed port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:30:27 969077 [43204960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:28 017451 [41E02960] -> SUBNET UP > >> Mar 20 14:30:28 030947 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000004 > >> Mar 20 14:30:28 031177 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 120040 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000005 > >> Mar 20 14:30:28 120190 [42803960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 148805 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000006 > >> Mar 20 14:30:28 149108 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > 
>> Mar 20 14:30:28 170453 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000007 > >> Mar 20 14:30:28 170971 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 336861 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:28 336884 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:30:28 336910 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:28 336916 [43C05960] -> Removed port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:30:28 336945 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:28 336951 [43C05960] -> Removed port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:30:28 371497 [43C05960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:28 410709 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000008 > >> Mar 20 14:30:28 410894 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 415926 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000009 > >> Mar 20 14:30:28 419624 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 
from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 426978 [45A08960] -> SUBNET UP > >> Mar 20 14:30:28 438003 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000a > >> Mar 20 14:30:28 438182 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0152 > >> GID:0xfe80000000000000,0x0005ad0000027c84 > >> Mar 20 14:30:28 470141 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000b > >> Mar 20 14:30:28 470197 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 11 times consecutively > >> Mar 20 14:30:28 652535 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000c > >> Mar 20 14:30:28 652623 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 12 times consecutively > >> Mar 20 14:30:28 681514 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000d > >> Mar 20 14:30:28 681636 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 13 times consecutively > >> Mar 20 14:30:28 703052 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000e > >> Mar 20 14:30:28 703092 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 14 times consecutively > >> Mar 20 14:30:28 724753 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000000f > >> Mar 20 14:30:28 724809 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 15 times consecutively > >> Mar 20 14:30:28 855519 
[42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000010 > >> Mar 20 14:30:28 855671 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 16 times consecutively > >> Mar 20 14:30:28 877289 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000011 > >> Mar 20 14:30:28 877354 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 17 times consecutively > >> Mar 20 14:30:28 899021 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000012 > >> Mar 20 14:30:28 899062 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 18 times consecutively > >> Mar 20 14:30:29 006886 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000013 > >> Mar 20 14:30:29 006950 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 19 times consecutively > >> Mar 20 14:30:29 099965 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000014 > >> Mar 20 14:30:29 100020 [44606960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 20 times consecutively > >> Mar 20 14:30:29 146532 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000015 > >> Mar 20 14:30:29 146578 [41E02960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 21 times consecutively > >> Mar 20 14:30:29 356891 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000016 > >> Mar 20 14:30:29 356938 [43C05960] 
-> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 22 times consecutively > >> Mar 20 14:30:29 383112 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000017 > >> Mar 20 14:30:29 383157 [43204960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 23 times consecutively > >> Mar 20 14:30:29 383710 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x0000000000000079 > >> Mar 20 14:30:29 383790 [41401960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:29 407890 [42803960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000018 > >> Mar 20 14:30:29 407935 [42803960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 24 times consecutively > >> Mar 20 14:30:29 429653 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x0000000000000019 > >> Mar 20 14:30:29 429700 [45A08960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 25 times consecutively > >> Mar 20 14:30:29 451352 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001a > >> Mar 20 14:30:29 451401 [45007960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 26 times consecutively > >> Mar 20 14:30:29 479843 [4780B960] -> umad_receiver: ERR 5409: send completed > >> with error (method=0x1 attr=0x11 trans_id=0x124ef00095cbf) -- dropping > >> Mar 20 14:30:29 479855 [4780B960] -> umad_receiver: ERR 5411: DR SMP > >> Mar 20 14:30:29 479865 [4780B960] -> __osm_sm_mad_ctrl_send_err_cb: ERR > >> 3113: MAD completed in error (IB_TIMEOUT) > 
>> Mar 20 14:30:29 479901 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x1 (SubnGet) > >> D bit...................0x0 > >> status..................0x0 > >> hop_ptr.................0x0 > >> hop_count...............0x6 > >> trans_id................0x95cbf > >> attr_id.................0x11 (NodeInfo) > >> resv....................0x0 > >> attr_mod................0x0 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][5][17][C] > >> Return path: [0][0][0][0][0][0][0] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:29 480017 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 480030 [44606960] -> Removed port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:30:29 480092 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 480099 [44606960] -> Removed port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:30:29 480121 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 480128 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:30:29 480152 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:65 from LID:0x0092 > >> 
GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 480160 [44606960] -> Removed port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:30:29 480325 [44606960] -> osm_drop_mgr_process: ERR 0108: Unknown > >> remote side for node 0x0005ad0000027c84 port 12. Adding to light sweep > >> sampling list > >> Mar 20 14:30:29 480343 [44606960] -> Directed Path Dump of 5 hop path: > >> Path = [0][1][11][1][5][17] > >> Mar 20 14:30:29 665327 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000007a > >> Mar 20 14:30:29 665355 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001b > >> Mar 20 14:30:29 665397 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:29 665404 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 27 times consecutively > >> Mar 20 14:30:29 680658 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 680668 [45A08960] -> Discovered new port with > >> GUID:0x0005ad00000249d3 LID range [0xB1,0xB1] of node:saguaro-23-2 HCA-1 > >> Mar 20 14:30:29 680672 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 680678 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000024cbb LID range [0xB2,0xB2] of node:saguaro-23-3 HCA-1 > >> Mar 20 14:30:29 680681 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 680686 [45A08960] -> Discovered new port with > >> 
GUID:0x0005ad0000024e0b LID range [0xB3,0xB3] of node:saguaro-23-4 HCA-1 > >> Mar 20 14:30:29 680690 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:29 680695 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000025043 LID range [0xB4,0xB4] of node:saguaro-23-5 HCA-1 > >> Mar 20 14:30:29 711542 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:29 768280 [41401960] -> SUBNET UP > >> Mar 20 14:30:30 113195 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:30 113206 [45A08960] -> Discovered new port with > >> GUID:0x0005ad000002510b LID range [0xB5,0xB5] of node:saguaro-23-6 HCA-1 > >> Mar 20 14:30:30 113211 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:30 113216 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000024d47 LID range [0xB6,0xB6] of node:saguaro-23-7 HCA-1 > >> Mar 20 14:30:30 113220 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:30 113225 [45A08960] -> Discovered new port with > >> GUID:0x0005ad0000024d8b LID range [0xB7,0xB7] of node:saguaro-23-8 HCA-1 > >> Mar 20 14:30:30 113228 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:3 num:64 from LID:0x0092 > >> GID:0xfe80000000000000,0x0005ad0000024bbb > >> Mar 20 14:30:30 113233 [45A08960] -> Discovered new port with > >> GUID:0x0005ad000002511b LID range [0xA6,0xA6] of node:saguaro-22-1 HCA-1 > >> Mar 20 14:30:30 144149 [45A08960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:30 195765 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 
3111: Error status = 0x1C00 > >> Mar 20 14:30:30 195850 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x96dcd > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x16 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][2][4] > >> Return path: [0][9][15][E][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:30 195929 [43C05960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:30 195942 [43C05960] -> PortInfo dump: > >> port number.............0x16 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> 
vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:30 195968 [43C05960] -> Capabilities Mask: > >> Mar 20 14:30:30 196144 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:30 196179 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x96dce > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x17 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][14][2][4] > >> Return path: [0][9][15][E][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:30 196248 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:30 196254 [43204960] -> PortInfo dump: > >> port number.............0x17 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> 
subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:30 196269 [43204960] -> Capabilities Mask: > >> Mar 20 14:30:30 201633 [45007960] -> SUBNET UP > >> Mar 20 14:30:30 278051 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001c > >> Mar 20 14:30:30 278107 [43C05960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 28 times consecutively > >> Mar 20 14:30:30 278656 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000007b > >> Mar 20 14:30:30 278871 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:30 279653 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> 
TID:0x000000000000008d > >> Mar 20 14:30:30 279765 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 568539 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000008e > >> Mar 20 14:30:30 568617 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 607916 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0148 > >> TID:0x000000000000007c > >> Mar 20 14:30:30 625139 [44606960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:30 663838 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x0148 > >> GID:0xfe80000000000000,0x0005ad00000281b3 > >> Mar 20 14:30:30 664569 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x000000000000008f > >> Mar 20 14:30:30 664747 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 679262 [45A08960] -> SUBNET UP > >> Mar 20 14:30:30 784024 [43204960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000090 > >> Mar 20 14:30:30 784123 [43204960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 804217 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000091 > >> Mar 20 14:30:30 807807 [41401960] -> osm_report_notice: Reporting Generic > >> 
Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 825500 [45A08960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000092 > >> Mar 20 14:30:30 825600 [45A08960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:30 988887 [43C05960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000093 > >> Mar 20 14:30:30 988978 [43C05960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:31 059298 [41401960] -> osm_ucast_mgr_process: Min Hop Tables > >> configured on all switches > >> Mar 20 14:30:31 106840 [41E02960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000094 > >> Mar 20 14:30:31 111335 [41E02960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:31 112465 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:31 112497 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x98837 > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][16][1][5] > >> Return 
path: [0][9][13][D][2] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 02 03 03 02 > >> > >> 11 42 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:31 112593 [44606960] -> osm_pi_rcv_process_set: ERR 0F10: > >> Received error status for SetResp() > >> Mar 20 14:30:31 112627 [44606960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281a7 > >> port_guid...............0x0005ad00000281a7 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x2 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............DOWN > >> state_info2.............0x42 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:31 112673 [44606960] -> Capabilities Mask: > >> Mar 20 14:30:31 113808 [4780B960] -> __osm_sm_mad_ctrl_rcv_callback: ERR > >> 3111: Error status = 0x1C00 > >> Mar 20 14:30:31 113838 [4780B960] -> SMP dump: > >> base_ver................0x1 > >> 
mgmt_class..............0x81 > >> class_ver...............0x1 > >> method..................0x81 (SubnGetResp) > >> D bit...................0x1 > >> status..................0x1C00 > >> hop_ptr.................0x0 > >> hop_count...............0x4 > >> trans_id................0x9883e > >> attr_id.................0x15 (PortInfo) > >> resv....................0x0 > >> attr_mod................0x18 > >> m_key...................0x0000000000000000 > >> dr_slid.................0xFFFF > >> dr_dlid.................0xFFFF > >> > >> Initial path: [0][1][11][1][4] > >> Return path: [0][9][18][D][1] > >> Reserved: [0][0][0][0][0][0][0] > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> > >> 00 00 00 00 00 00 00 00 00 00 00 00 01 03 03 02 > >> > >> 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 > >> > >> 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 > >> > >> Mar 20 14:30:31 113925 [43204960] -> osm_pi_rcv_process_set: Received error > >> status 0x1c for SetResp() during ACTIVE transition > >> Mar 20 14:30:31 113930 [43204960] -> PortInfo dump: > >> port number.............0x18 > >> node_guid...............0x0005ad00000281b3 > >> port_guid...............0x0005ad00000281b3 > >> m_key...................0x0000000000000000 > >> subnet_prefix...........0x0000000000000000 > >> base_lid................0x0 > >> master_sm_base_lid......0x0 > >> capability_mask.........0x0 > >> diag_code...............0x0 > >> m_key_lease_period......0x0 > >> local_port_num..........0x1 > >> link_width_enabled......0x3 > >> link_width_supported....0x3 > >> link_width_active.......0x2 > >> link_speed_supported....0x1 > >> port_state..............ACTIVE > >> state_info2.............0x52 > >> m_key_protect_bits......0x0 > >> lmc.....................0x0 > >> link_speed..............0x11 > >> mtu_smsl................0x40 > >> vl_cap_init_type........0x40 > >> vl_high_limit...........0x0 > >> vl_arb_high_cap.........0x8 > >> vl_arb_low_cap..........0x8 > >> init_rep_mtu_cap........0x4 > >> 
vl_stall_life...........0xF2 > >> vl_enforce..............0x40 > >> m_key_violations........0x0 > >> p_key_violations........0x0 > >> q_key_violations........0x0 > >> guid_cap................0x0 > >> client_reregister.......0x0 > >> subnet_timeout..........0x0 > >> resp_time_value.........0x0 > >> error_threshold.........0x88 > >> Mar 20 14:30:31 113946 [43204960] -> Capabilities Mask: > >> Mar 20 14:30:31 119007 [43204960] -> SUBNET UP > >> Mar 20 14:30:31 128758 [45007960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000095 > >> Mar 20 14:30:31 128851 [45007960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:31 150370 [44606960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x001B > >> TID:0x0000000000000096 > >> Mar 20 14:30:31 150468 [44606960] -> osm_report_notice: Reporting Generic > >> Notice type:1 num:128 from LID:0x001B > >> GID:0xfe80000000000000,0x0005ad00000281a7 > >> Mar 20 14:30:31 316422 [41401960] -> __osm_trap_rcv_process_request: > >> Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0152 > >> TID:0x000000000000001c > >> Mar 20 14:30:31 316498 [41401960] -> __osm_trap_rcv_process_request: ERR > >> 3804: Received trap 29 times consecutively > >> > >> _______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit > >> http://openib.org/mailman/listinfo/openib-general > > > From bugzilla-daemon at lists.openfabrics.org Fri Mar 23 11:34:47 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Fri, 23 Mar 2007 11:34:47 -0700 (PDT) Subject: [ofa-general] [Bug 488] New: user_mad GRH handling on receive 
side is incomplete Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=488 Summary: user_mad GRH handling on receive side is incomplete Product: OpenFabrics Linux Version: 1.2beta1 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P3 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: halr at voltaire.com user_mad.c:recv_handler has the following code: if (packet->mad.hdr.grh_present) { /* XXX parse GRH */ packet->mad.hdr.gid_index = 0; packet->mad.hdr.hop_limit = 0; packet->mad.hdr.traffic_class = 0; memset(packet->mad.hdr.gid, 0, 16); packet->mad.hdr.flow_label = 0; } This prevents the GRH fields from being properly set for a received MAD which includes a GRH. Updated patch for this is in the ib_router branch of Sean's git tree. Can/should this be included in OFED 1.2 ? -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at lists.openfabrics.org Fri Mar 23 11:35:33 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Fri, 23 Mar 2007 11:35:33 -0700 (PDT) Subject: [ofa-general] [Bug 488] user_mad GRH handling on receive side is incomplete In-Reply-To: Message-ID: <20070323183533.3ECF9E603B1@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=488 halr at voltaire.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sean.hefty at intel.com -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
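For readers following bug 488, here is a rough idea of what the "XXX parse GRH" step has to recover. The InfiniBand Global Route Header is a 40-byte header laid out like an IPv6 header: a 32-bit word holding the version, traffic class and flow label, then payload length, next header, hop limit, and the 128-bit source and destination GIDs. The code below is only a minimal user-space sketch under that assumption. It is not the patch from Sean's tree, and the names grh_fields and parse_grh are hypothetical. The gid_index lookup is left out because it needs the receiving port's GID table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRH_LEN 40 /* size of a Global Route Header in bytes */

/* Hypothetical container for the decoded fields; not the kernel's struct. */
struct grh_fields {
	uint8_t  ip_version;    /* 4 bits  */
	uint8_t  traffic_class; /* 8 bits  */
	uint32_t flow_label;    /* 20 bits */
	uint16_t payload_len;
	uint8_t  next_hdr;
	uint8_t  hop_limit;
	uint8_t  sgid[16];
	uint8_t  dgid[16];
};

/*
 * Decode a raw GRH given in network byte order.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int parse_grh(const uint8_t *buf, size_t len, struct grh_fields *out)
{
	uint32_t word0;

	if (len < GRH_LEN)
		return -1;

	/* First 32-bit word: version(4) | traffic class(8) | flow label(20) */
	word0 = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	out->ip_version    = (uint8_t)((word0 >> 28) & 0xf);
	out->traffic_class = (uint8_t)((word0 >> 20) & 0xff);
	out->flow_label    = word0 & 0xfffff;
	out->payload_len   = (uint16_t)((buf[4] << 8) | buf[5]);
	out->next_hdr      = buf[6];
	out->hop_limit     = buf[7];
	memcpy(out->sgid, buf + 8, 16);  /* source GID */
	memcpy(out->dgid, buf + 24, 16); /* destination GID */
	return 0;
}
```

In the report's terms, hop_limit, traffic_class and flow_label would be filled from the decoded values, and gid from the source GID, instead of being zeroed.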
From halr at voltaire.com Fri Mar 23 12:36:53 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Mar 2007 14:36:53 -0500 Subject: [ofa-general] Re: Re: [PATCH] IB/umad: fix GRH handling In-Reply-To: <20070323152822.GH17532@mellanox.co.il> References: <000001c76b85$74adfb50$18fd070a@amr.corp.intel.com> <1174519319.17678.25309.camel@hal.voltaire.com> <1174658146.24305.148489.camel@hal.voltaire.com> <20070323152822.GH17532@mellanox.co.il> Message-ID: <1174678608.24305.170303.camel@hal.voltaire.com> On Fri, 2007-03-23 at 10:28, Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > Subject: Re: Re: [PATCH] IB/umad: fix GRH handling > > > > On Wed, 2007-03-21 at 18:22, Hal Rosenstock wrote: > > > On Wed, 2007-03-21 at 01:52, Sean Hefty wrote: > > > > >> Unfortunately, at least opensm cannot respond to SA queries issued from a > > > > >> remote subnet. I'm not sure how much work this would take to fix, or if > > > > >> other SAs have this issue. Hal briefly looked at the problems, > > > > > > > > > >FWIW, I'll be looking some more at these again. > > > > > > > > I think the following patch corrects the GRH handling issues in ib_umad. > > > > (Tested loading of ib_umad module only, and not against openSM.) > > > > > > It can't be tested against OpenSM right now. > > > > > > > If this looks right, > > > > > > It looks right to me. I'll need some time to take it out for a test > > > driver as some other issues need some work to exercise this. > > > > I exercised this and it works fine. The received GRH information is now > > seen on the receive side of user MADs. > > > > Can this be pushed for OFED 1.2 as well ? > > Overall, looks safe. > If you want the fix in OFED 1.2, file a bug in the bugzilla. OK; I filed bug #488 on this. > But - is this patch going into 2.6.21? Don't know when it would go upstream. > And if not, why does it have to be in OFED 1.2? Have to is a little strong. Nice to is more accurate. 
user_mad would then be seeded to be GRH capable which will allow for easier experimentation going forward. -- Hal From monty at lampreynetworks.com Fri Mar 23 13:55:01 2007 From: monty at lampreynetworks.com (John LaMontagne) Date: Fri, 23 Mar 2007 16:55:01 -0400 Subject: [ofa-general] OFA Interop Event #3 - Event Begins in 3 Weeks Message-ID: <007801c76d8d$8cf71d90$a6e558b0$@com> Dear OFA Member, The OpenFabrics Alliance is proud to announce: OFA Interoperability Event #3 April 16 - 20, 2007 University of New Hampshire (UNH) Interoperability Lab (IOL) Durham, NH. This event will provide an opportunity for participants to measure their products for interoperability using the OpenFabrics Alliance Stack. For Plugfest event and registration information visit: http://www.iol.unh.edu/services/testing/ofa/events/Invitation_2007-04_OFA.html For our planning purposes, we request that you submit your registration information as soon as possible. For registrations received after March 30th, a late fee may be imposed. We will be conducting the IBTA Plugfest #11 at the University of New Hampshire Interoperability Lab from April 12 - 13. This is an excellent opportunity for InfiniBand vendors to test their devices for inclusion on the IBTA Integrators' List. For event information, visit: http://www.infinibandta.org/members/April_2007_plugfest If you have any questions, please contact the Event Coordinator: John LaMontagne, Lamprey Networks, Inc. (monty at lampreynetworks.com) Mark your calendar now for this event! From mike.heffner at evergrid.com Fri Mar 23 16:47:22 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Fri, 23 Mar 2007 18:47:22 -0500 Subject: [ofa-general] completion vector Message-ID: <4604670A.4010905@evergrid.com> Can someone explain or point me to documentation on what the completion vector argument to ibv_create_cq() is for? 
Given that context->num_comp_vectors always appears to be one, does it mean you can only have a single CQ generating events on a comp_channel at any point in time? Also, how do you go about increasing context->num_comp_vectors? Thanks, Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 x603 From rdreier at cisco.com Fri Mar 23 15:54:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Mar 2007 15:54:37 -0700 Subject: [ofa-general] completion vector In-Reply-To: <4604670A.4010905@evergrid.com> (Mike Heffner's message of "Fri, 23 Mar 2007 18:47:22 -0500") References: <4604670A.4010905@evergrid.com> Message-ID: Mike> Can someone explain or point me to documentation on what the Mike> completion vector argument to ibv_create_cq() is for? In the future, it will be used to support adapters that have multiple completion event vectors (via MSI-X or similar). Mike> Given that context->num_comp_vectors always appears to be Mike> one, does it mean you can only have a single CQ generating Mike> events on a comp_channel at any point in time? No, completion vectors and completion channels are orthogonal. Mike> Also, how do you go about increasing Mike> context->num_comp_vectors? You would need to update the kernel midlayer to handle multiple completion vectors, and then patch a HCA driver to support it too. - R. From rdreier at cisco.com Fri Mar 23 15:57:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Mar 2007 15:57:29 -0700 Subject: [ofa-general] compilation problem on ofed_1_2 In-Reply-To: <60E9D8CA1AC31048A237499BD73FF9AD01BC03@W2K3MAILSV.gsi.de> (Linev Sergei's message of "Fri, 23 Mar 2007 16:08:15 +0100") References: <60E9D8CA1AC31048A237499BD73FF9AD01BC03@W2K3MAILSV.gsi.de> Message-ID: > Second, in file ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c, > failed definition SPIN_LOCK_UNLOCKED. Seems to be, "spinlock.h" include is > missed in this file. 
this is a problem with the vnic code that needs to be cleaned up -- SPIN_LOCK_UNLOCKED is not really supposed to be used in generic code. Either spin_lock_init() or DEFINE_SPINLOCK() should be used instead. Thanks for the report. From mike.heffner at evergrid.com Fri Mar 23 16:40:34 2007 From: mike.heffner at evergrid.com (Mike Heffner) Date: Fri, 23 Mar 2007 19:40:34 -0400 Subject: [ofa-general] completion vector In-Reply-To: References: <4604670A.4010905@evergrid.com> Message-ID: <46046572.9060700@evergrid.com> Roland Dreier wrote: > > Mike> Given that context->num_comp_vectors always appears to be > Mike> one, does it mean you can only have a single CQ generating > Mike> events on a comp_channel at any point in time? > > No, completion vectors and completion channels are orthogonal. > Thanks. So there should be no limit to how many CQ's you can create under a single completion channel and still receive events for any of the CQ's? Mike -- Mike Heffner EverGrid Software Blacksburg, VA USA Voice: (540) 443-3500 #603 From rdreier at cisco.com Fri Mar 23 16:45:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 23 Mar 2007 16:45:29 -0700 Subject: [ofa-general] completion vector In-Reply-To: <46046572.9060700@evergrid.com> (Mike Heffner's message of "Fri, 23 Mar 2007 19:40:34 -0400") References: <4604670A.4010905@evergrid.com> <46046572.9060700@evergrid.com> Message-ID: Mike> Thanks. So there should be no limit to how many CQ's you can Mike> create under a single completion channel and still receive Mike> events for any of the CQ's? Right, there is no limit other than the total number of CQs you can create (independent of completion channels). - R. 
From vlad at lists.openfabrics.org Sat Mar 24 02:35:23 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sat, 24 Mar 2007 02:35:23 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070324-0200 daily build status Message-ID: <20070324093523.52D00E607F6@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.14 Passed on 
ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From mst at dev.mellanox.co.il Sat Mar 24 10:24:21 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sat, 24 Mar 2007 19:24:21 +0200 Subject: [ofa-general] Re: osm error messages In-Reply-To: <1174678032.24305.169706.camel@hal.voltaire.com> References: <1174678032.24305.169706.camel@hal.voltaire.com> Message-ID: <20070324172421.GI17532@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: osm error messages > > On Fri, 2007-03-23 at 12:41, Douglas Fuller wrote: > > On 3/21/07 2:53 PM, "Hal Rosenstock" <halr at voltaire.com> wrote: > > > > > On Wed, 2007-03-21 at 13:29, Douglas Fuller wrote: > > >> I'm seeing some sporadic error activity from OpenSM (OFED 1.1; osm.log > > >> below) that may correlate with some job failures -- I'm trying to get to the > > >> bottom of this. > > >> > > >> Before seeing this, I isolated and disabled with ibportstate what appeared > > >> to be a bad internal port on one of our core switches. That leads me to > > >> suspect I have a switch misbehaving somewhere. > > >> > > >> Without any other intervention, things seem to check out (with > > >> ibdiagnet/ibchecknet). Any thoughts? Need any more information? > > > > > > Is something bouncing your subnet or was this just what ibportstate did > > > ? It could be if this was a core switch. 
> > > > Nothing should be. 
The same thing appears to happen once every couple days > > -- it is very difficult to correlate with anything. > > And does it just go away ? Is some part of your subnet not accessible ? > > > > Also, you may have some SMAs which have gone nonresponsive to SMPs > > > (IB_TIMEOUTs) but the links are up. I can't be sure not knowing what the > > > exact scenario was. If you do, you will likely want to chase these and do > > > something about them if you haven't already. > > > > Hmm, what could cause that? All my hosts are responsive whenever I check > > (though it hasn't been during one of these storms of activity). > > Are all your switches responsive ? What switches are you using ? > > -- Hal > > > > All the messages relating to ACTIVE -> ACTIVE transition can be ignored. > > > > > > Also, it looks like something is removing characters in the log. > > > > Yeah, there are characters missing in the whole message. Curious. > > > > Thanks again, > > --Doug Could you guys stop sending same 10000 lines back and forth please? -- MST From tzachid at mellanox.co.il Sat Mar 24 12:18:36 2007 From: tzachid at mellanox.co.il (Tzachi Dar) Date: Sat, 24 Mar 2007 21:18:36 +0200 Subject: [ofa-general] RE: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] In-Reply-To: <20070323134229.GP20990@sashak.voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com> <20070323134229.GP20990@sashak.voltaire.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C9013410E2@mtlexch01.mtl.com> After some more checks, and given the limited time that I had for this, I'm afraid that I wasn't able to find a way to do that. Do you have any idea of how this can be done? 
Thanks Tzachi > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Friday, March 23, 2007 3:42 PM > To: Tzachi Dar > Cc: Hal Rosenstock; ofw at lists.openfabrics.org; Gilad Shainer; > OPENIB; Fab Tillier > Subject: Re: [openib-general] [ofw] [Fwd: Re: [Fwd: > Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]] > > Hi Tzahi, > > On 18:56 Wed 21 Feb , Tzachi Dar wrote: > > > > To be on the practical side, I have read the introduction > to pthreads > > in the past and from what I saw it was relatively easy to implement > > that on Win32. I want to look at the functions that were mentioned > > before in this thread and see if that is still the case. > > > > Let me get back to you on this at the beginning of next week. > > Any news here? Thanks. > > Sasha > From swise at opengridcomputing.com Sat Mar 24 18:24:36 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 24 Mar 2007 20:24:36 -0500 Subject: [ofa-general] [PATCH ofed_1_2] Fixes to the Chelsio iWARP drivers. Message-ID: <1174785876.24222.7.camel@stevo-laptop> Vlad, Please pull these fixes to the Chelsio drivers for ofed_1_2. They can be pulled from git://staging.openfabrics.org/~swise/ofed_1_2.git ofed_1_2 Thanks, Steve. Short log: ---------- commit a23b6c4285c19dce8e69e63ee07572d6a5fe8928 Author: Steve Wise Fix a resource leak in cxio_hal_init_ctrl_qp(). commit eda4707f97d4d6e8bf8fb8191fd58e77a208d650 Author: Steve Wise Handle build_phys_page_list() failure in iwch_reregister_phys_mem(). commit 4ffb1b3675ff8cc434f0b3dec4cb91be12eb16ef Author: Divy Le Ray Use tabs instead of white spaces for CHELSIO_T3 entry. commit 8de956ebc8671da1b24b012ccd8de52b668863d9 Author: Divy Le Ray T3B2 does not lose its pcie config space on reset. commit de12425264a5e68de0e04b1179d97c988d0e5b16 Author: Divy Le Ray Under rare conditions, the MAC might hang while generating a pause frame. 
commit af6df91716c7999c54cc5740b12625df33b61c36 Author: Divy Le Ray The driver attempts to upgrade the FW if the card has the wrong version. Full diffs: ----------- commit a23b6c4285c19dce8e69e63ee07572d6a5fe8928 Author: Steve Wise Fix a resource leak in cxio_hal_init_ctrl_qp(). diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c index 229edd5..ce05db5 100644 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c @@ -519,9 +519,9 @@ static int cxio_hal_init_ctrl_qp(struct u64 sge_cmd, ctx0, ctx1; u64 base_addr; struct t3_modify_qp_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); - + struct sk_buff *skb; + skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); if (!skb) { PDBG("%s alloc_skb failed\n", __FUNCTION__); return -ENOMEM; @@ -529,7 +529,7 @@ static int cxio_hal_init_ctrl_qp(struct err = cxio_hal_init_ctrl_cq(rdev_p); if (err) { PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); - return err; + goto err; } rdev_p->ctrl_qp.workq = dma_alloc_coherent( &(rdev_p->rnic_info.pdev->dev), @@ -539,7 +539,8 @@ static int cxio_hal_init_ctrl_qp(struct GFP_KERNEL); if (!rdev_p->ctrl_qp.workq) { PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); - return -ENOMEM; + err = -ENOMEM; + goto err; } pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping, rdev_p->ctrl_qp.dma_addr); @@ -577,6 +578,9 @@ static int cxio_hal_init_ctrl_qp(struct 1 << T3_CTRL_QP_SIZE_LOG2); skb->priority = CPL_PRIORITY_CONTROL; return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +err: + kfree_skb(skb); + return err; } static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p) commit eda4707f97d4d6e8bf8fb8191fd58e77a208d650 Author: Steve Wise Handle build_phys_page_list() failure in iwch_reregister_phys_mem(). 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 1388687..0c0ee20 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -528,11 +528,14 @@ static int iwch_reregister_phys_mem(stru php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) mh.attr.perms = iwch_ib_to_tpt_access(acc); - if (mr_rereg_mask & IB_MR_REREG_TRANS) + if (mr_rereg_mask & IB_MR_REREG_TRANS) { ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, &total_size, &npages, &shift, &page_list); + if (ret) + return ret; + } ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages); kfree(page_list); commit 4ffb1b3675ff8cc434f0b3dec4cb91be12eb16ef Author: Divy Le Ray Use tabs instead of white spaces for CHELSIO_T3 entry. diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index ba74008..dece276 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2393,23 +2393,23 @@ config CHELSIO_T1_NAPI when the driver is receiving lots of packets from the card. config CHELSIO_T3 - tristate "Chelsio Communications T3 10Gb Ethernet support" - depends on PCI + tristate "Chelsio Communications T3 10Gb Ethernet support" + depends on PCI select FW_LOADER - help - This driver supports Chelsio T3-based gigabit and 10Gb Ethernet - adapters. + help + This driver supports Chelsio T3-based gigabit and 10Gb Ethernet + adapters. - For general information about Chelsio and our products, visit - our website at . + For general information about Chelsio and our products, visit + our website at . - For customer support, please visit our customer support page at - . + For customer support, please visit our customer support page at + . - Please send feedback to . + Please send feedback to . - To compile this driver as a module, choose M here: the module - will be called cxgb3. + To compile this driver as a module, choose M here: the module + will be called cxgb3. 
config EHEA tristate "eHEA Ethernet support" commit 8de956ebc8671da1b24b012ccd8de52b668863d9 Author: Divy Le Ray T3B2 does not lose its pcie config space on reset. diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c index 983ee81..791ed6d 100644 --- a/drivers/net/cxgb3/t3_hw.c +++ b/drivers/net/cxgb3/t3_hw.c @@ -3244,15 +3244,17 @@ void early_hw_init(struct adapter *adapt } /* - * Reset the adapter. PCIe cards lose their config space during reset, PCI-X + * Reset the adapter. + * Older PCIe cards lose their config space during reset, PCI-X * ones don't. */ int t3_reset_adapter(struct adapter *adapter) { - int i; + int i, save_and_restore_pcie = + adapter->params.rev < T3_REV_B2 && is_pcie(adapter); uint16_t devid = 0; - if (is_pcie(adapter)) + if (save_and_restore_pcie) pci_save_state(adapter->pdev); t3_write_reg(adapter, A_PL_RST, F_CRSTWRM | F_CRSTWRMMODE); @@ -3270,7 +3272,7 @@ int t3_reset_adapter(struct adapter *ada if (devid != 0x1425) return -1; - if (is_pcie(adapter)) + if (save_and_restore_pcie) pci_restore_state(adapter->pdev); return 0; } commit de12425264a5e68de0e04b1179d97c988d0e5b16 Author: Divy Le Ray Under rare conditions, the MAC might hang while generating a pause frame. 
diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h old mode 100755 new mode 100644 index e23deeb..85e5543 --- a/drivers/net/cxgb3/common.h +++ b/drivers/net/cxgb3/common.h @@ -260,6 +260,10 @@ struct mac_stats { unsigned long serdes_signal_loss; unsigned long xaui_pcs_ctc_err; unsigned long xaui_pcs_align_change; + + unsigned long num_toggled; /* # times toggled TxEn due to stuck TX */ + unsigned long num_resets; /* # times reset due to stuck TX */ + }; struct tp_mib_stats { @@ -400,6 +404,12 @@ struct adapter_params { unsigned int rev; /* chip revision */ }; +enum { /* chip revisions */ + T3_REV_A = 0, + T3_REV_B = 2, + T3_REV_B2 = 3, +}; + struct trace_params { u32 sip; u32 sip_mask; @@ -465,6 +475,10 @@ struct cmac { struct adapter *adapter; unsigned int offset; unsigned int nucast; /* # of address filters for unicast MACs */ + unsigned int tcnt; + unsigned int xcnt; + unsigned int toggle_cnt; + unsigned int txen; struct mac_stats stats; }; @@ -666,6 +680,7 @@ int t3_mac_set_address(struct cmac *mac, int t3_mac_set_num_ucast(struct cmac *mac, int n); const struct mac_stats *t3_mac_update_stats(struct cmac *mac); int t3_mac_set_speed_duplex_fc(struct cmac *mac, int speed, int duplex, int fc); +int t3b2_mac_watchdog_task(struct cmac *mac); void t3_mc5_prep(struct adapter *adapter, struct mc5 *mc5, int mode); int t3_mc5_init(struct mc5 *mc5, unsigned int nservers, unsigned int nfilters, diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c index c88f3a7..1c31d19 100644 --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -1051,7 +1051,11 @@ static char stats_strings[][ETH_GSTRING_ "VLANinsertions ", "TxCsumOffload ", "RxCsumGood ", - "RxDrops " + "RxDrops ", + + "CheckTXEnToggled ", + "CheckResets ", + }; static int get_stats_count(struct net_device *dev) @@ -1165,6 +1169,9 @@ static void get_stats(struct net_device *data++ = collect_sge_port_stats(adapter, pi, SGE_PSTAT_TX_CSUM); *data++ = 
collect_sge_port_stats(adapter, pi, SGE_PSTAT_RX_CSUM_GOOD); *data++ = s->rx_cong_drops; + + *data++ = s->num_toggled; + *data++ = s->num_resets; } static inline void reg_block_dump(struct adapter *ap, void *buf, @@ -2090,6 +2097,40 @@ static void check_link_status(struct ada } } +static void check_t3b2_mac(struct adapter *adapter) +{ + int i; + + rtnl_lock(); /* synchronize with ifdown */ + for_each_port(adapter, i) { + struct net_device *dev = adapter->port[i]; + struct port_info *p = netdev_priv(dev); + int status; + + if (!netif_running(dev)) + continue; + + status = 0; + if (netif_running(dev)) + status = t3b2_mac_watchdog_task(&p->mac); + if (status == 1) + p->mac.stats.num_toggled++; + else if (status == 2) { + struct cmac *mac = &p->mac; + + t3_mac_set_mtu(mac, dev->mtu); + t3_mac_set_address(mac, 0, dev->dev_addr); + cxgb_set_rxmode(dev); + t3_link_start(&p->phy, mac, &p->link_config); + t3_mac_enable(mac, MAC_DIRECTION_RX | MAC_DIRECTION_TX); + t3_port_intr_enable(adapter, p->port_id); + p->mac.stats.num_resets++; + } + } + rtnl_unlock(); +} + + static void t3_adap_check_task(struct work_struct *work) { struct adapter *adapter = container_of(work, struct adapter, @@ -2110,6 +2151,9 @@ static void t3_adap_check_task(struct wo adapter->check_task_cnt = 0; } + if (p->rev == T3_REV_B2) + check_t3b2_mac(adapter); + /* Schedule the next check update if any port is active. 
*/ spin_lock(&adapter->work_lock); if (adapter->open_device_map & PORT_MASK) diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h old mode 100755 new mode 100644 index b56c5f5..b38629a --- a/drivers/net/cxgb3/regs.h +++ b/drivers/net/cxgb3/regs.h @@ -1206,6 +1206,14 @@ #define A_TP_TX_TRC_KEY0 0x20 #define A_TP_RX_TRC_KEY0 0x120 +#define A_TP_TX_DROP_CNT_CH0 0x12d + +#define S_TXDROPCNTCH0RCVD 0 +#define M_TXDROPCNTCH0RCVD 0xffff +#define V_TXDROPCNTCH0RCVD(x) ((x) << S_TXDROPCNTCH0RCVD) +#define G_TXDROPCNTCH0RCVD(x) (((x) >> S_TXDROPCNTCH0RCVD) & \ + M_TXDROPCNTCH0RCVD) + #define A_ULPRX_CTL 0x500 #define S_ROUND_ROBIN 4 @@ -1834,6 +1842,8 @@ #define S_TXPAUSEEN 0 #define V_TXPAUSEEN(x) ((x) << S_TXPAUSEEN) #define F_TXPAUSEEN V_TXPAUSEEN(1U) +#define A_XGM_TX_PAUSE_QUANTA 0x808 + #define A_XGM_RX_CTRL 0x80c #define S_RXEN 0 @@ -1920,6 +1930,11 @@ #define F_DISERRFRAMES V_DISERRFRAMES #define A_XGM_TXFIFO_CFG 0x888 +#define S_TXIPG 13 +#define M_TXIPG 0xff +#define V_TXIPG(x) ((x) << S_TXIPG) +#define G_TXIPG(x) (((x) >> S_TXIPG) & M_TXIPG) + #define S_TXFIFOTHRESH 4 #define M_TXFIFOTHRESH 0x1ff @@ -2190,6 +2205,13 @@ #define F_CMULOCK V_CMULOCK(1U) #define A_XGM_RX_MAX_PKT_SIZE_ERR_CNT 0x9a4 +#define A_XGM_TX_SPI4_SOP_EOP_CNT 0x9a8 + +#define S_TXSPI4SOPCNT 16 +#define M_TXSPI4SOPCNT 0xffff +#define V_TXSPI4SOPCNT(x) ((x) << S_TXSPI4SOPCNT) +#define G_TXSPI4SOPCNT(x) (((x) >> S_TXSPI4SOPCNT) & M_TXSPI4SOPCNT) + #define A_XGM_RX_SPI4_SOP_EOP_CNT 0x9ac #define XGMAC0_1_BASE_ADDR 0xa00 diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c old mode 100755 new mode 100644 index 907a272..2b42c13 --- a/drivers/net/cxgb3/xgmac.c +++ b/drivers/net/cxgb3/xgmac.c @@ -124,9 +124,6 @@ int t3_mac_reset(struct cmac *mac) xaui_serdes_reset(mac); } - if (adap->params.rev > 0) - t3_write_reg(adap, A_XGM_PAUSE_TIMER + oft, 0xf000); - val = F_MAC_RESET_; if (is_10G(adap)) val |= F_PCS_RESET_; @@ -145,6 +142,58 @@ int t3_mac_reset(struct cmac *mac) return 0; 
} +int t3b2_mac_reset(struct cmac *mac) +{ + struct adapter *adap = mac->adapter; + unsigned int oft = mac->offset; + u32 val; + + if (!macidx(mac)) + t3_set_reg_field(adap, A_MPS_CFG, F_PORT0ACTIVE, 0); + else + t3_set_reg_field(adap, A_MPS_CFG, F_PORT1ACTIVE, 0); + + t3_write_reg(adap, A_XGM_RESET_CTRL + oft, F_MAC_RESET_); + t3_read_reg(adap, A_XGM_RESET_CTRL + oft); /* flush */ + + msleep(10); + + /* Check for xgm Rx fifo empty */ + if (t3_wait_op_done(adap, A_XGM_RX_MAX_PKT_SIZE_ERR_CNT + oft, + 0x80000000, 1, 5, 2)) { + CH_ERR(adap, "MAC %d Rx fifo drain failed\n", + macidx(mac)); + return -1; + } + + t3_write_reg(adap, A_XGM_RESET_CTRL + oft, 0); + t3_read_reg(adap, A_XGM_RESET_CTRL + oft); /* flush */ + + val = F_MAC_RESET_; + if (is_10G(adap)) + val |= F_PCS_RESET_; + else if (uses_xaui(adap)) + val |= F_PCS_RESET_ | F_XG2G_RESET_; + else + val |= F_RGMII_RESET_ | F_XG2G_RESET_; + t3_write_reg(adap, A_XGM_RESET_CTRL + oft, val); + t3_read_reg(adap, A_XGM_RESET_CTRL + oft); /* flush */ + if ((val & F_PCS_RESET_) && adap->params.rev) { + msleep(1); + t3b_pcs_reset(mac); + } + t3_write_reg(adap, A_XGM_RX_CFG + oft, + F_DISPAUSEFRAMES | F_EN1536BFRAMES | + F_RMFCS | F_ENJUMBO | F_ENHASHMCAST); + + if (!macidx(mac)) + t3_set_reg_field(adap, A_MPS_CFG, 0, F_PORT0ACTIVE); + else + t3_set_reg_field(adap, A_MPS_CFG, 0, F_PORT1ACTIVE); + + return 0; +} + /* * Set the exact match register 'idx' to recognize the given Ethernet address. */ @@ -251,9 +300,11 @@ int t3_mac_set_mtu(struct cmac *mac, uns * Adjust the PAUSE frame watermarks. We always set the LWM, and the * HWM only if flow-control is enabled. 
*/ - hwm = max(MAC_RXFIFO_SIZE - 3 * mtu, MAC_RXFIFO_SIZE / 2U); - hwm = min(hwm, 3 * MAC_RXFIFO_SIZE / 4 + 1024); - lwm = hwm - 1024; + hwm = max_t(unsigned int, MAC_RXFIFO_SIZE - 3 * mtu, + MAC_RXFIFO_SIZE * 38 / 100); + hwm = min(hwm, MAC_RXFIFO_SIZE - 8192); + lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4); + v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset); v &= ~V_RXFIFOPAUSELWM(M_RXFIFOPAUSELWM); v |= V_RXFIFOPAUSELWM(lwm / 8); @@ -270,7 +321,15 @@ int t3_mac_set_mtu(struct cmac *mac, uns thres = mtu > thres ? (mtu - thres + 7) / 8 : 0; thres = max(thres, 8U); /* need at least 8 */ t3_set_reg_field(adap, A_XGM_TXFIFO_CFG + mac->offset, - V_TXFIFOTHRESH(M_TXFIFOTHRESH), V_TXFIFOTHRESH(thres)); + V_TXFIFOTHRESH(M_TXFIFOTHRESH) | V_TXIPG(M_TXIPG), + V_TXFIFOTHRESH(thres) | V_TXIPG(1)); + + if (adap->params.rev > 0) + t3_write_reg(adap, A_XGM_PAUSE_TIMER + mac->offset, + (hwm - lwm) * 4 / 8); + t3_write_reg(adap, A_XGM_TX_PAUSE_QUANTA + mac->offset, + MAC_RXFIFO_SIZE * 4 * 8 / 512); + return 0; } @@ -298,12 +357,6 @@ int t3_mac_set_speed_duplex_fc(struct cm V_PORTSPEED(M_PORTSPEED), val); } - val = t3_read_reg(adap, A_XGM_RXFIFO_CFG + oft); - val &= ~V_RXFIFOPAUSEHWM(M_RXFIFOPAUSEHWM); - if (fc & PAUSE_TX) - val |= V_RXFIFOPAUSEHWM(G_RXFIFOPAUSELWM(val) + 128); /* +1KB */ - t3_write_reg(adap, A_XGM_RXFIFO_CFG + oft, val); - t3_set_reg_field(adap, A_XGM_TX_CFG + oft, F_TXPAUSEEN, (fc & PAUSE_RX) ? 
F_TXPAUSEEN : 0); return 0; @@ -318,9 +371,17 @@ int t3_mac_enable(struct cmac *mac, int if (which & MAC_DIRECTION_TX) { t3_write_reg(adap, A_XGM_TX_CTRL + oft, F_TXEN); t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CFG_CH0 + idx); - t3_write_reg(adap, A_TP_PIO_DATA, 0xbf000001); + t3_write_reg(adap, A_TP_PIO_DATA, 0xc0ede401); t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_MODE); t3_set_reg_field(adap, A_TP_PIO_DATA, 1 << idx, 1 << idx); + + t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CNT_CH0 + idx); + mac->tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap, + A_TP_PIO_DATA))); + mac->xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap, + A_XGM_TX_SPI4_SOP_EOP_CNT))); + mac->txen = F_TXEN; + mac->toggle_cnt = 0; } if (which & MAC_DIRECTION_RX) t3_write_reg(adap, A_XGM_RX_CTRL + oft, F_RXEN); @@ -337,13 +398,50 @@ int t3_mac_disable(struct cmac *mac, int t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CFG_CH0 + idx); t3_write_reg(adap, A_TP_PIO_DATA, 0xc000001f); t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_MODE); - t3_set_reg_field(adap, A_TP_PIO_DATA, 1 << idx, 0); + t3_set_reg_field(adap, A_TP_PIO_DATA, 1 << idx, 1 << idx); + mac->txen = 0; } if (which & MAC_DIRECTION_RX) t3_write_reg(adap, A_XGM_RX_CTRL + mac->offset, 0); return 0; } +int t3b2_mac_watchdog_task(struct cmac *mac) +{ + struct adapter *adap = mac->adapter; + unsigned int tcnt, xcnt; + int status; + + t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CNT_CH0 + macidx(mac)); + tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap, A_TP_PIO_DATA))); + xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap, + A_XGM_TX_SPI4_SOP_EOP_CNT + + mac->offset))); + + if (tcnt != mac->tcnt && xcnt == 0 && mac->xcnt == 0) { + if (mac->toggle_cnt > 4) { + t3b2_mac_reset(mac); + mac->toggle_cnt = 0; + status = 2; + } else { + t3_write_reg(adap, A_XGM_TX_CTRL + mac->offset, 0); + t3_read_reg(adap, A_XGM_TX_CTRL + mac->offset); + t3_write_reg(adap, A_XGM_TX_CTRL + mac->offset, + mac->txen); + t3_read_reg(adap, A_XGM_TX_CTRL + mac->offset); + 
mac->toggle_cnt++; + status = 1; + } + } else { + mac->toggle_cnt = 0; + status = 0; + } + mac->tcnt = tcnt; + mac->xcnt = xcnt; + + return status; +} + /* * This function is called periodically to accumulate the current values of the * RMON counters into the port statistics. Since the packet counters are only @@ -375,6 +473,11 @@ #define RMON_UPDATE64(mac, name, reg_lo, RMON_UPDATE(mac, rx_too_long, RX_OVERSIZE_FRAMES); mac->stats.rx_too_long += RMON_READ(mac, A_XGM_RX_MAX_PKT_SIZE_ERR_CNT); + v = RMON_READ(mac, A_XGM_RX_MAX_PKT_SIZE_ERR_CNT); + if (mac->adapter->params.rev == T3_REV_B2) + v &= 0x7fffffff; + mac->stats.rx_too_long += v; + RMON_UPDATE(mac, rx_frames_64, RX_64B_FRAMES); RMON_UPDATE(mac, rx_frames_65_127, RX_65_127B_FRAMES); RMON_UPDATE(mac, rx_frames_128_255, RX_128_255B_FRAMES); commit af6df91716c7999c54cc5740b12625df33b61c36 Author: Divy Le Ray The driver attempts to upgrade the FW if the card has the wrong version. diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index f8742f1..ba74008 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2395,6 +2395,7 @@ config CHELSIO_T1_NAPI config CHELSIO_T3 tristate "Chelsio Communications T3 10Gb Ethernet support" depends on PCI + select FW_LOADER help This driver supports Chelsio T3-based gigabit and 10Gb Ethernet adapters. 
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c index 9ec1ea3..c88f3a7 100644 --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -42,6 +42,7 @@ #include #include #include #include +#include #include #include "common.h" @@ -703,6 +704,28 @@ static void bind_qsets(struct adapter *a } } +#define FW_FNAME "t3fw-%d.%d.bin" + +static int upgrade_fw(struct adapter *adap) +{ + int ret; + char buf[64]; + const struct firmware *fw; + struct device *dev = &adap->pdev->dev; + + snprintf(buf, sizeof(buf), FW_FNAME, FW_VERSION_MAJOR, + FW_VERSION_MINOR); + ret = request_firmware(&fw, buf, dev); + if (ret < 0) { + dev_err(dev, "could not upgrade firmware: unable to load %s\n", + buf); + return ret; + } + ret = t3_load_fw(adap, fw->data, fw->size); + release_firmware(fw); + return ret; +} + /** * cxgb_up - enable the adapter * @adapter: adapter being enabled @@ -719,6 +742,8 @@ static int cxgb_up(struct adapter *adap) if (!(adap->flags & FULL_INIT_DONE)) { err = t3_check_fw_version(adap); + if (err == -EINVAL) + err = upgrade_fw(adap); if (err) goto out; diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c index eaa7a2e..983ee81 100644 --- a/drivers/net/cxgb3/t3_hw.c +++ b/drivers/net/cxgb3/t3_hw.c @@ -681,7 +681,8 @@ enum { SF_ERASE_SECTOR = 0xd8, /* erase sector */ FW_FLASH_BOOT_ADDR = 0x70000, /* start address of FW in flash */ - FW_VERS_ADDR = 0x77ffc /* flash address holding FW version */ + FW_VERS_ADDR = 0x77ffc, /* flash address holding FW version */ + FW_MIN_SIZE = 8 /* at least version and csum */ }; /** @@ -935,7 +936,7 @@ int t3_load_fw(struct adapter *adapter, const u32 *p = (const u32 *)fw_data; int ret, addr, fw_sector = FW_FLASH_BOOT_ADDR >> 16; - if (size & 3) + if ((size & 3) || size < FW_MIN_SIZE) return -EINVAL; if (size > FW_VERS_ADDR + 8 - FW_FLASH_BOOT_ADDR) return -EFBIG; From sashak at voltaire.com Sun Mar 25 00:38:18 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Mar 
2007 09:38:18 +0200
Subject: [ofa-general] Re: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9013410E2@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com> <20070323134229.GP20990@sashak.voltaire.com> <6C2C79E72C305246B504CBA17B5500C9013410E2@mtlexch01.mtl.com>
Message-ID: <20070325073818.GY20990@sashak.voltaire.com>

On 21:18 Sat 24 Mar , Tzachi Dar wrote:
> After some more checks, and given the limited time that I had for this,
> I'm afraid that I wasn't able to find a way to do that. Do you have any
> idea of how this can be done?

I think that using one of the ready pthread for windows implementations
could be more optimal way. However if you still prefer a "wrapper"
solution I found this article, this describes pthread_cond_wait()
implementation issue, guess it is potentially hardest part of such
wrapper. Hope it is useful.

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

Sasha

>
> Thanks
> Tzachi
>
> > -----Original Message-----
> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > Sent: Friday, March 23, 2007 3:42 PM
> > To: Tzachi Dar
> > Cc: Hal Rosenstock; ofw at lists.openfabrics.org; Gilad Shainer;
> > OPENIB; Fab Tillier
> > Subject: Re: [openib-general] [ofw] [Fwd: Re: [Fwd:
> > Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> >
> > Hi Tzahi,
> >
> > On 18:56 Wed 21 Feb , Tzachi Dar wrote:
> > >
> > > To be on the practical side, I have read the introduction to pthreads
> > > in the past and from what I saw it was relatively easy to implement
> > > that on Win32. I want to look at the functions that were mentioned
> > > before in this thread and see if that is still the case.
> > >
> > > Let me get back to you on this at the beginning of next week.
> >
> > Any news here? Thanks.
> >
> > Sasha
> >

From monil at voltaire.com Sun Mar 25 00:40:10 2007
From: monil at voltaire.com (Moni Levy)
Date: Sun, 25 Mar 2007 09:40:10 +0200
Subject: [ofa-general] Re: [ewg] Re: bugs to fix for OFED 1.2 RC1
In-Reply-To: <20070322172245.GB17532@mellanox.co.il>
References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il>
Message-ID: <6a122cc00703250040h7c29003endd1d1359d3b903b7@mail.gmail.com>

On 3/22/07, Michael S. Tsirkin wrote:
> > I would like to add these two to the list:
> >
> > IPoIB passes async events to an
> > 413 nor P3 All mst at mellanox.co.il NEW unrelated devices.
>
> This is not a problem.
>
> > 420 cri P3 All monil at voltaire.com NEW PKey table reordering caused by
> > SM failover stops ipoib t...
>
> Please re-post the latest patch on openib-general.
> I'd like Roland's feedback.

I'll try to do that tomorrow

-- Moni

>
> --
> MST
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>

From mst at dev.mellanox.co.il Sun Mar 25 02:17:43 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 25 Mar 2007 11:17:43 +0200
Subject: [ofa-general] [PATCH for-2.6.21] IB/mthca: fix thinko in init_mr_table
In-Reply-To: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov>
References: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov>
Message-ID: <20070325091743.GA25332@mellanox.co.il>

Patch 'IB/mthca: Merge MR and FMR space on 64-bit systems' swapped
the number of MTTs and MPTs. As a result, we get kernel oops
when the number of MTT segments allocated exceeds 0x20000.

Noted by Troy Benjegerdes, and reproduced by Dotan Barak.
OpenFabrics bugzilla 490.

Signed-off-by: Michael S. Tsirkin
---

Roland, please pull for 2.6.21.
Links:
http://article.gmane.org/gmane.linux.drivers.openib/37956
https://bugs.openfabrics.org/show_bug.cgi?id=490

diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index 8e4846b..fdb576d 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -881,8 +881,8 @@ int mthca_init_mr_table(struct mthca_dev *dev)
 		}
 		mpts = mtts = 1 << i;
 	} else {
-		mpts = dev->limits.num_mtt_segs;
-		mtts = dev->limits.num_mpts;
+		mtts = dev->limits.num_mtt_segs;
+		mpts = dev->limits.num_mpts;
 	}
 
 	if (!mthca_is_memfree(dev) &&
-- 
MST

From dotanb at dev.mellanox.co.il Sun Mar 25 02:22:20 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 25 Mar 2007 11:22:20 +0200
Subject: [ofa-general] interesting ibv_reg_mr failures
In-Reply-To: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov>
References: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov>
Message-ID: <46063F4C.20908@dev.mellanox.co.il>

Hi Troy.

I can only answer about your info which is related to the mthca devices.

Troy Benjegerdes wrote:
> We have been getting some interesting failures with ibv_reg_mr..
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 1048576
> device # 0 name="mthca0" guid="00066a0098000464"
> ibv_open_device() context=0x10012c98
> ibv_alloc_pd() pd=0x10013678
> alloc: 2482
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 3.3.2
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 2482 regions
> sleep 5 sec
> free: 0
> done

I wasn't able to reproduce this failure but i noticed that you are using
an old FW version (current version is 3.5.0).

>
> with a 10MB buffer:
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="00066a0098000464"
> ibv_open_device() context=0x10012c98
> ibv_alloc_pd() pd=0x10013678
> alloc: 2482
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 3.3.2
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 2482 regions
> sleep 5 sec
> free: 0
> done

On 64 bit machine i got a kernel oops, bug number 490 was opened in the
Bugzilla and we are analyzing this failure.

> And, on an PCI-express mellanox hca:
> /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="0002c9020040272c"
> ibv_open_device() context=0x504c00
> ibv_alloc_pd() pd=0x503f30
> alloc: 12277
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 5.1.0
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 12277 regions
> sleep 5 sec
> free: 0
> done

I'm checking this issue and let you know about what i will find.

>
> On the pci-express hca, it also looks like the memory usage, as
> reported by "free" goes down by about 300MB once all these regions are
> allocated.. but the process usage as reported by top is only 20mb
> total virtual size. What's going on here?

are you talking about the "free memory" which is reported by top?
thanks Dotan From vlad at lists.openfabrics.org Sun Mar 25 02:35:29 2007 From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org) Date: Sun, 25 Mar 2007 02:35:29 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070325-0200 daily build status Message-ID: <20070325093530.5DBDBE6080E@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 
Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Failed: From monil at voltaire.com Sun Mar 25 02:44:11 2007 From: monil at voltaire.com (Moni Levy) Date: Sun, 25 Mar 2007 11:44:11 +0200 Subject: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: <20070314095505.GA9721@mellanox.co.il> References: <20070313123715.GS2608@mellanox.co.il> <6a122cc00703130908v2b97b85fg2816cc22e179da50@mail.gmail.com> <20070313161840.GD16246@mellanox.co.il> <6a122cc00703140224w201a31d4v70d9ec360b4bde7@mail.gmail.com> <20070314095505.GA9721@mellanox.co.il> Message-ID: <6a122cc00703250244t751634fawa5efdd8238a4130d@mail.gmail.com> On 3/14/07, Michael S. Tsirkin wrote: > > Quoting Moni Levy : > > Subject: Re: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > > > > On 3/13/07, Michael S. Tsirkin wrote: > > >> >Why are you playing with ifc mtu? > > >> > > >> We needed to do some IP forwarding tests between IB and Ethernet and > > >> wanted to have the same MTU for the eth0 and ib0 interfaces, after > > >> that we reproduced that in a peer to peer configuration. > > > > > >OK but PMTU discovery would do this better, won't it? > > > > I'm not sure (it will probably drop some packets at the beginning). > > Can you check pls? > > > Anyway lower MTU should work. > > Manually tweaking MTU seems to be broken in lots of systems - > I sometimes see the same behaviour with gigabit ethernet. > > > >As a side note, with 1.5K MTU it's probably better to use datagram mode > > >anyway. > > > > Now I see that I missed that in the report. We used datagram mode. > > Did you try checking ethernet BW on the same machine? 
Ethernet seem to work fine. I can't reproduce the bug in the last week. The only thing that I see now is that at that same MTU and connected mode enabled I get the same low BW. What version of tcpdump are You using ? -- Moni > > -- > MST > From bugzilla-daemon at lists.openfabrics.org Sun Mar 25 02:44:22 2007 From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org) Date: Sun, 25 Mar 2007 02:44:22 -0700 (PDT) Subject: [ofa-general] [Bug 450] IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 In-Reply-To: Message-ID: <20070325094423.33EAEE60816@openfabrics.org> https://bugs.openfabrics.org/show_bug.cgi?id=450 ------- Comment #7 from monil at voltaire.com 2007-03-25 02:44 ------- Subject: Re: [ofa-general] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 On 3/14/07, Michael S. Tsirkin wrote: > > Quoting Moni Levy : > > Subject: Re: [ofa-general] [Bug 450] New: IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3 > > > > On 3/13/07, Michael S. Tsirkin wrote: > > >> >Why are you playing with ifc mtu? > > >> > > >> We needed to do some IP forwarding tests between IB and Ethernet and > > >> wanted to have the same MTU for the eth0 and ib0 interfaces, after > > >> that we reproduced that in a peer to peer configuration. > > > > > >OK but PMTU discovery would do this better, won't it? > > > > I'm not sure (it will probably drop some packets at the beginning). > > Can you check pls? > > > Anyway lower MTU should work. > > Manually tweaking MTU seems to be broken in lots of systems - > I sometimes see the same behaviour with gigabit ethernet. > > > >As a side note, with 1.5K MTU it's probably better to use datagram mode > > >anyway. > > > > Now I see that I missed that in the report. We used datagram mode. > > Did you try checking ethernet BW on the same machine? Ethernet seem to work fine. I can't reproduce the bug in the last week. 
The only thing that I see now is that at that same MTU and connected mode enabled I get the same low BW. What version of tcpdump are You using ? -- Moni > > -- > MST > -- Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From amip at dev.mellanox.co.il Sun Mar 25 02:48:12 2007 From: amip at dev.mellanox.co.il (Ami Perlmutter) Date: Sun, 25 Mar 2007 11:48:12 +0200 Subject: [ofa-general] netinet/if_fddi.h broken in RH5 Message-ID: <1174816092.5946.4.camel@Ami-desktop> netinet/if_fddi.h does not pull linux/types.h and causes netstat compilation to break. In file included from /usr/include/netinet/if_fddi.h:26, from fddi.c:31: /usr/include/linux/if_fddi.h:88: error: expected specifier-qualifier-list before ‘__be16’ From erezz at voltaire.com Sun Mar 25 02:59:47 2007 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 25 Mar 2007 11:59:47 +0200 Subject: [ofa-general] [PATCH 0/2] IB/iser: bug fixes for 2.6.21 Message-ID: <46064813.6070208@voltaire.com> Roland, Here is a series of patches for iSER. All of them are bug fixes. I hope that they can be added to 2.6.21. Thanks Erez From erezz at voltaire.com Sun Mar 25 03:07:10 2007 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 25 Mar 2007 12:07:10 +0200 Subject: [ofa-general] [PATCH 1/2] IB/iser: do not assume that a task may be aborted only after the qp times out In-Reply-To: <46064813.6070208@voltaire.com> References: <46064813.6070208@voltaire.com> Message-ID: <460649CE.3080805@voltaire.com> scsi-ml may abort a command that was already sent. If the initiator is still trying to send the command (or data-out PDUs for that command), the qp may time out after scsi-ml times out. Therefore, when aborting the command, iSER may still have references for the command's buffers. When sending these PDUs will complete with an error, their resources will be released. 
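
Editorial sketch of the idea in the commit message above: when a task is aborted while sends are still in flight, leftover references on its registered buffers are reported instead of being treated as fatal, because the pending completions drop those references later. This is a simplified, stand-alone illustration and not the iSER code. The names regd_buf, regd_buf_release and task_finalize are hypothetical, and a plain int stands in for the kernel's atomic_t reference count.

```c
#include <stdio.h>

/* Hypothetical stand-in for a registered buffer with a reference count. */
struct regd_buf {
	int ref_count;
};

/*
 * Drop one reference. Returns 0 when the last reference went away,
 * otherwise the number of references that are still outstanding.
 */
int regd_buf_release(struct regd_buf *regd)
{
	if (regd->ref_count > 0)
		regd->ref_count--;
	return regd->ref_count;
}

/*
 * Finalize a task that is being aborted. Outstanding references are
 * logged rather than treated as a bug: sends that are still in flight
 * release their own references when they complete with an error.
 */
int task_finalize(struct regd_buf *regd, const char *name)
{
	int deferred = regd_buf_release(regd);

	if (deferred)
		fprintf(stderr, "%d references remain for %s rdma reg\n",
			deferred, name);
	return deferred;
}
```

With a buffer that starts at three references, task_finalize() reports two remaining, and two later releases (standing in for the failed send completions) bring the count to zero.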
Signed-off-by: Erez Zilber
---
 drivers/infiniband/ulp/iser/iser_initiator.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 89e3728..278fcbc 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -658,6 +658,7 @@ void iser_ctask_rdma_finalize(struct isc
 {
 	int deferred;
 	int is_rdma_aligned = 1;
+	struct iser_regd_buf *regd;
 
 	/* if we were reading, copy back to unaligned sglist,
 	 * anyway dma_unmap and free the copy
@@ -672,20 +673,20 @@ void iser_ctask_rdma_finalize(struct isc
 	}
 
 	if (iser_ctask->dir[ISER_DIR_IN]) {
-		deferred = iser_regd_buff_release
-			(&iser_ctask->rdma_regd[ISER_DIR_IN]);
+		regd = &iser_ctask->rdma_regd[ISER_DIR_IN];
+		deferred = iser_regd_buff_release(regd);
 		if (deferred) {
-			iser_err("References remain for BUF-IN rdma reg\n");
-			BUG();
+			iser_err("%d references remain for BUF-IN rdma reg\n",
+				 atomic_read(&regd->ref_count));
 		}
 	}
 
 	if (iser_ctask->dir[ISER_DIR_OUT]) {
-		deferred = iser_regd_buff_release
-			(&iser_ctask->rdma_regd[ISER_DIR_OUT]);
+		regd = &iser_ctask->rdma_regd[ISER_DIR_OUT];
+		deferred = iser_regd_buff_release(regd);
 		if (deferred) {
-			iser_err("References remain for BUF-OUT rdma reg\n");
-			BUG();
+			iser_err("%d references remain for BUF-OUT rdma reg\n",
+				 atomic_read(&regd->ref_count));
 		}
 	}
 
-- 
1.4.2

From erezz at voltaire.com Sun Mar 25 03:10:00 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Sun, 25 Mar 2007 12:10:00 +0200
Subject: [ofa-general] [PATCH 2/2] IB/iser: iser_conn should not be released while its workqueue is active
In-Reply-To: <46064813.6070208@voltaire.com>
References: <46064813.6070208@voltaire.com>
Message-ID: <46064A78.5050005@voltaire.com>

When shutting down a connection, after there are no more posted buffers, iSER schedules a workqueue that is used to notify the iSCSI layer that the connection is down.
Meanwhile, the connection object (that holds the workqueue) is released. If the workqueue function wasn't called yet, a NULL pointer will be referenced. This fix waits until the workqueue function is called and only then it releases the connection object.

Signed-off-by: Erez Zilber
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c |    9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index cae8c96..0e11f79 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -242,6 +242,7 @@ struct iser_conn {
 	struct ib_qp *qp; /* QP */
 	struct ib_fmr_pool *fmr_pool; /* pool of IB FMRs */
 	int disc_evt_flag; /* disconn event delivered */
+	int wait_comp_error_work;
 	wait_queue_head_t wait; /* waitq for conn/disconn */
 	atomic_t post_recv_buf_count; /* posted rx count */
 	atomic_t post_send_buf_count; /* posted tx count */
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 693b770..dc2df33 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -331,7 +331,8 @@ void iser_conn_terminate(struct iser_con
 			 ib_conn,err);
 
 	wait_event_interruptible(ib_conn->wait,
-				 ib_conn->state == ISER_CONN_DOWN);
+				 ib_conn->state == ISER_CONN_DOWN &&
+				 ib_conn->wait_comp_error_work == 0);
 
 	iser_conn_release(ib_conn);
 }
@@ -771,6 +772,8 @@ static void iser_comp_error_worker(struc
 		ib_conn->state = ISER_CONN_DOWN;
 		wake_up_interruptible(&ib_conn->wait);
 	}
+
+	ib_conn->wait_comp_error_work = 0;
 }
 
 static void iser_handle_comp_error(struct iser_desc *desc)
@@ -791,8 +794,10 @@ static void iser_handle_comp_error(struc
 	}
 
 	if (atomic_read(&ib_conn->post_recv_buf_count) == 0 &&
-	    atomic_read(&ib_conn->post_send_buf_count) == 0)
+	    atomic_read(&ib_conn->post_send_buf_count) == 0) {
+		ib_conn->wait_comp_error_work = 1;
 		schedule_work(&ib_conn->comperror_work);
+	}
 }
 
 static void iser_cq_tasklet_fn(unsigned long data)
-- 
1.4.2

From vlad at mellanox.co.il Sun Mar 25 03:11:37 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 25 Mar 2007 12:11:37 +0200
Subject: [ofa-general] Re: [PATCH ofed_1_2] Fixes to the Chelsio iWARP drivers.
In-Reply-To: <1174785876.24222.7.camel@stevo-laptop>
References: <1174785876.24222.7.camel@stevo-laptop>
Message-ID: <1174817497.23571.8.camel@vladsk-laptop>

On Sat, 2007-03-24 at 20:24 -0500, Steve Wise wrote:
> Vlad,
>
> Please pull these fixes to the Chelsio drivers for ofed_1_2. They can be pulled from
>
> git://staging.openfabrics.org/~swise/ofed_1_2.git ofed_1_2
>
> Thanks,
>
> Steve.
>
>

Hi Steve,

The patch t3_hw_to_2_6_5-7_244.patch is broken with the last changes.
Please fix it and then I'll pull from your tree.

Please test your commit on staging with the following command:

env git_url=git://staging.openfabrics.org/~swise/ofed_1_2.git \
    CHECK_LOCAL=yes CHECK_KERNEL_ORG=yes CHECK_CROSS=yes \
    /home/vlad/scripts/build_ofa_kernel.sh 2>&1 | tee /tmp/ker.log

You will get:
...
Build failed on x86_64 with linux-2.6.9-42.ELsmp
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Build failed on x86_64 with linux-2.6.5-7.244-smp
...

Log (same for kernels above):
...
/tmp/gen2_devel_kernel-20070325-1108_check/kernel_patches/backport/2.6.9_U4/t3_hw_to_2_6_5-7_244.patch
/usr/bin/quilt --quiltrc /tmp/gen2_devel_kernel-20070325-1108_check/patches/quiltrc import /tmp/gen2_devel_kernel-20070325-1108_check/kernel_patches/backport/2.6.9_U4/t3_hw_to_2_6_5-7_244.patch
Importing patch /tmp/gen2_devel_kernel-20070325-1108_check/kernel_patches/backport/2.6.9_U4/t3_hw_to_2_6_5-7_244.patch (stored as t3_hw_to_2_6_5-7_244.patch)
/usr/bin/quilt --quiltrc /tmp/gen2_devel_kernel-20070325-1108_check/patches/quiltrc push patches/t3_hw_to_2_6_5-7_244.patch
Applying patch t3_hw_to_2_6_5-7_244.patch
patching file drivers/net/cxgb3/adapter.h
Hunk #1 succeeded at 188 (offset 9 lines).
patching file drivers/net/cxgb3/t3_hw.c
Hunk #1 FAILED at 3250.
Hunk #2 FAILED at 3268.
Hunk #3 succeeded at 3362 (offset 5 lines).
2 out of 3 hunks FAILED -- rejects in file drivers/net/cxgb3/t3_hw.c

-- 
Vladimir Sokolovsky
Mellanox Technologies Ltd.

From sashak at voltaire.com Sun Mar 25 03:52:34 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 25 Mar 2007 12:52:34 +0200
Subject: [ofa-general] [PATCH] opensm: chdir to / in the daemon mode
Message-ID: <20070325105234.GA30920@sashak.voltaire.com>

Change working directory to "/" when running in the daemon mode. It is
in order to not keep the filesystem busy and prevent potential umount
failures.
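
Editorial sketch of the technique described above: a daemon that keeps its start-up directory as its working directory pins that filesystem, so the child changes directory before it detaches. This is a generic, stand-alone illustration and not the opensm daemonize() function. The names release_start_dir and daemonize_sketch are hypothetical, and the setsid() call is an assumption about typical daemon set-up rather than something taken from the patch.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Change to `dir` so that the filesystem holding the start-up directory
 * is no longer kept busy. Returns 0 on success, -1 with errno set by
 * chdir() on failure.
 */
int release_start_dir(const char *dir)
{
	return chdir(dir);
}

/* Hypothetical daemonize step: the parent exits, the child detaches. */
int daemonize_sketch(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;	/* fork failed */
	if (pid > 0)
		exit(0);	/* parent returns control to the shell */

	if (setsid() < 0)	/* child: start a new session */
		return -1;
	if (release_start_dir("/") < 0)
		return -1;

	/* Drop the inherited standard descriptors. */
	close(0);
	close(1);
	close(2);
	return 0;
}
```

The directory is passed as a parameter here only so the helper can be exercised on its own; the patch itself hard-codes "/".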
Signed-off-by: Sasha Khapyorsky
---
 osm/opensm/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 3f465e9..d5c3699 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm)
 	} else if (pid > 0)
 		exit(0);
 
+	chdir("/");
+
 	close(0);
 	close(1);
 	close(2);
-- 
1.5.1.rc1.18.ga41b4

From kliteyn at dev.mellanox.co.il Sun Mar 25 04:02:48 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 25 Mar 2007 13:02:48 +0200
Subject: [ofa-general] [PATCH] opensm: chdir to / in the daemon mode
In-Reply-To: <20070325105234.GA30920@sashak.voltaire.com>
References: <20070325105234.GA30920@sashak.voltaire.com>
Message-ID: <460656D8.6080602@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> Change working directory to "/" when running in the daemon mode. It is
> in order to not keep the filesystem busy and prevent potential umount
> failures.
>
> Signed-off-by: Sasha Khapyorsky
> ---
> osm/opensm/main.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/osm/opensm/main.c b/osm/opensm/main.c
> index 3f465e9..d5c3699 100644
> --- a/osm/opensm/main.c
> +++ b/osm/opensm/main.c
> @@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm)
> } else if (pid > 0)
> exit(0);
>
> + chdir("/");

What if there is a coredump? Wouldn't it be better to chdir
to /tmp/ instead ?
-- Yevgeny > + > close(0); > close(1); > close(2); From sashak at voltaire.com Sun Mar 25 04:18:23 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Mar 2007 13:18:23 +0200 Subject: [ofa-general] [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <460656D8.6080602@dev.mellanox.co.il> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> Message-ID: <20070325111823.GB30920@sashak.voltaire.com> On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: > >Change working directory to "/" when running in the daemon mode. It is > >in order to not keep the filesystem busy and prevent potential umount > >failures. > > > >Signed-off-by: Sasha Khapyorsky > >--- > > osm/opensm/main.c | 2 ++ > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > >index 3f465e9..d5c3699 100644 > >--- a/osm/opensm/main.c > >+++ b/osm/opensm/main.c > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > } else if (pid > 0) > > exit(0); > > > >+ chdir("/"); > > What if there is a coredump? This will go at "/". > Wouldn't it be better to chdir > to /tmp/ instead ? /tmp can be mounted fs as well. Sasha From mst at dev.mellanox.co.il Sun Mar 25 05:06:14 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 25 Mar 2007 14:06:14 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325111823.GB30920@sashak.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> Message-ID: <20070325120614.GC30425@mellanox.co.il> > Quoting Sasha Khapyorsky : > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > > >Change working directory to "/" when running in the daemon mode. 
It is > > >in order to not keep the filesystem busy and prevent potential umount > > >failures. > > > > > >Signed-off-by: Sasha Khapyorsky > > >--- > > > osm/opensm/main.c | 2 ++ > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > >index 3f465e9..d5c3699 100644 > > >--- a/osm/opensm/main.c > > >+++ b/osm/opensm/main.c > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > } else if (pid > 0) > > > exit(0); > > > > > >+ chdir("/"); > > > > What if there is a coredump? > > This will go at "/". > > > Wouldn't it be better to chdir > > to /tmp/ instead ? > > /tmp can be mounted fs as well. > > Sasha So can / IMO. Isn't this a user error? Shouldn't we let the user decide where to run it? -- MST From halr at voltaire.com Sun Mar 25 06:38:47 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 25 Mar 2007 08:38:47 -0500 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325120614.GC30425@mellanox.co.il> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> Message-ID: <1174829924.24305.335293.camel@hal.voltaire.com> On Sun, 2007-03-25 at 07:06, Michael S. Tsirkin wrote: > > Quoting Sasha Khapyorsky : > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > Sasha Khapyorsky wrote: > > > >Change working directory to "/" when running in the daemon mode. It is > > > >in order to not keep the filesystem busy and prevent potential umount > > > >failures. 
> > > > > > > >Signed-off-by: Sasha Khapyorsky > > > >--- > > > > osm/opensm/main.c | 2 ++ > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > >index 3f465e9..d5c3699 100644 > > > >--- a/osm/opensm/main.c > > > >+++ b/osm/opensm/main.c > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > } else if (pid > 0) > > > > exit(0); > > > > > > > >+ chdir("/"); > > > > > > What if there is a coredump? > > > > This will go at "/". > > > > > Wouldn't it be better to chdir > > > to /tmp/ instead ? > > > > /tmp can be mounted fs as well. > > > > Sasha > > So can / IMO. > Isn't this a user error? > Shouldn't we let the user decide where to run it? How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). -- Hal From kliteyn at dev.mellanox.co.il Sun Mar 25 05:58:22 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 25 Mar 2007 14:58:22 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <1174829924.24305.335293.camel@hal.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> Message-ID: <460671EE.7000904@dev.mellanox.co.il> Hal Rosenstock wrote: > On Sun, 2007-03-25 at 07:06, Michael S. Tsirkin wrote: >>> Quoting Sasha Khapyorsky : >>> Subject: Re: [PATCH] opensm: chdir to / in the daemon mode >>> >>> On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: >>>> Sasha Khapyorsky wrote: >>>>> Change working directory to "/" when running in the daemon mode. It is >>>>> in order to not keep the filesystem busy and prevent potential umount >>>>> failures. 
>>>>> >>>>> Signed-off-by: Sasha Khapyorsky >>>>> --- >>>>> osm/opensm/main.c | 2 ++ >>>>> 1 files changed, 2 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/osm/opensm/main.c b/osm/opensm/main.c >>>>> index 3f465e9..d5c3699 100644 >>>>> --- a/osm/opensm/main.c >>>>> +++ b/osm/opensm/main.c >>>>> @@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) >>>>> } else if (pid > 0) >>>>> exit(0); >>>>> >>>>> + chdir("/"); >>>> What if there is a coredump? >>> This will go at "/". >>> >>>> Wouldn't it be better to chdir >>>> to /tmp/ instead ? >>> /tmp can be mounted fs as well. >>> >>> Sasha >> So can / IMO. >> Isn't this a user error? >> Shouldn't we let the user decide where to run it? > > How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). Sounds good to me - this way OSM won't spread dumps in different directories. And this also gives the user an option to decide where these dumps will be. -- Yevgeny > -- Hal From sashak at voltaire.com Sun Mar 25 06:12:37 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Mar 2007 15:12:37 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325120614.GC30425@mellanox.co.il> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> Message-ID: <20070325131237.GE19999@sashak.voltaire.com> On 14:06 Sun 25 Mar , Michael S.
Tsirkin wrote: > > Quoting Sasha Khapyorsky : > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > Sasha Khapyorsky wrote: > > > >Change working directory to "/" when running in the daemon mode. It is > > > >in order to not keep the filesystem busy and prevent potential umount > > > >failures. > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > >--- > > > > osm/opensm/main.c | 2 ++ > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > >index 3f465e9..d5c3699 100644 > > > >--- a/osm/opensm/main.c > > > >+++ b/osm/opensm/main.c > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > } else if (pid > 0) > > > > exit(0); > > > > > > > >+ chdir("/"); > > > > > > What if there is a coredump? > > > > This will go at "/". > > > > > Wouldn't it be better to chdir > > > to /tmp/ instead ? > > > > /tmp can be mounted fs as well. > > > > Sasha > > So can / IMO. > Isn't this a user error? What? Not changing current dir before starting OpenSM? Don't think this should be error. > Shouldn't we let the user decide where to run it? As an option? Possible, but not sure somebody will use this, let's see. Sasha From sashak at voltaire.com Sun Mar 25 06:15:26 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Mar 2007 15:15:26 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <1174829924.24305.335293.camel@hal.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> Message-ID: <20070325131526.GF19999@sashak.voltaire.com> On 08:38 Sun 25 Mar , Hal Rosenstock wrote: > On Sun, 2007-03-25 at 07:06, Michael S. 
Tsirkin wrote: > > > Quoting Sasha Khapyorsky : > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > Sasha Khapyorsky wrote: > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > >failures. > > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > >--- > > > > > osm/opensm/main.c | 2 ++ > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > >index 3f465e9..d5c3699 100644 > > > > >--- a/osm/opensm/main.c > > > > >+++ b/osm/opensm/main.c > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > } else if (pid > 0) > > > > > exit(0); > > > > > > > > > >+ chdir("/"); > > > > > > > > What if there is a coredump? > > > > > > This will go at "/". > > > > > > > Wouldn't it be better to chdir > > > > to /tmp/ instead ? > > > > > > /tmp can be mounted fs as well. > > > > > > Sasha > > > > So can / IMO. > > Isn't this a user error? > > Shouldn't we let the user decide where to run it? > > How about using OSM_DEFAULT_TMP_DIR for this ? And require from admin to keep this on root fs? Why? > We're putting other dumps > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). This is working directory, not a dump directory, we are not going to write something there. Sasha From mst at dev.mellanox.co.il Sun Mar 25 06:09:38 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 25 Mar 2007 15:09:38 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325131237.GE19999@sashak.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <20070325131237.GE19999@sashak.voltaire.com> Message-ID: <20070325130938.GE30425@mellanox.co.il> > Quoting Sasha Khapyorsky : > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > On 14:06 Sun 25 Mar , Michael S. Tsirkin wrote: > > > Quoting Sasha Khapyorsky : > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > Sasha Khapyorsky wrote: > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > >failures. > > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > >--- > > > > > osm/opensm/main.c | 2 ++ > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > >index 3f465e9..d5c3699 100644 > > > > >--- a/osm/opensm/main.c > > > > >+++ b/osm/opensm/main.c > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > } else if (pid > 0) > > > > > exit(0); > > > > > > > > > >+ chdir("/"); > > > > > > > > What if there is a coredump? > > > > > > This will go at "/". > > > > > > > Wouldn't it be better to chdir > > > > to /tmp/ instead ? > > > > > > /tmp can be mounted fs as well. > > > > > > Sasha > > > > So can / IMO. > > Isn't this a user error? > > What? Not changing current dir before starting OpenSM? Don't think this > should be error. Running a program and then unmounting the current directory. > > Shouldn't we let the user decide where to run it? > > As an option? Possible, but not sure somebody will use this, let's see. 
Do you see other daemons doing this? I just checked git and it does not. -- MST From mst at dev.mellanox.co.il Sun Mar 25 06:17:05 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 25 Mar 2007 15:17:05 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325130938.GE30425@mellanox.co.il> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <20070325131237.GE19999@sashak.voltaire.com> <20070325130938.GE30425@mellanox.co.il> Message-ID: <20070325131705.GG30425@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > Quoting Sasha Khapyorsky : > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > On 14:06 Sun 25 Mar , Michael S. Tsirkin wrote: > > > > Quoting Sasha Khapyorsky : > > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > > Sasha Khapyorsky wrote: > > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > > >failures. > > > > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > > >--- > > > > > > osm/opensm/main.c | 2 ++ > > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > > >index 3f465e9..d5c3699 100644 > > > > > >--- a/osm/opensm/main.c > > > > > >+++ b/osm/opensm/main.c > > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > > } else if (pid > 0) > > > > > > exit(0); > > > > > > > > > > > >+ chdir("/"); > > > > > > > > > > What if there is a coredump? > > > > > > > > This will go at "/". > > > > > > > > > Wouldn't it be better to chdir > > > > > to /tmp/ instead ? 
> > > > > > > > /tmp can be mounted fs as well. > > > > > > > > Sasha > > > > > > So can / IMO. > > > Isn't this a user error? > > > > What? Not changing current dir before starting OpenSM? Don't think this > > should be error. > > Running a program and then unmounting the current directory. > > > > Shouldn't we let the user decide where to run it? > > > > As an option? Possible, but not sure somebody will use this, let's see. > > Do you see other daemons doing this? I just checked git and it does not. After some googling, this does seem like a common practice. Need to check, however, that opensm never uses relative paths. -- MST From mst at dev.mellanox.co.il Sun Mar 25 06:17:40 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 25 Mar 2007 15:17:40 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <1174829924.24305.335293.camel@hal.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> Message-ID: <20070325131740.GH30425@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode > > On Sun, 2007-03-25 at 07:06, Michael S. Tsirkin wrote: > > > Quoting Sasha Khapyorsky : > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > Sasha Khapyorsky wrote: > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > >failures. 
> > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > >--- > > > > > osm/opensm/main.c | 2 ++ > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > >index 3f465e9..d5c3699 100644 > > > > >--- a/osm/opensm/main.c > > > > >+++ b/osm/opensm/main.c > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > } else if (pid > 0) > > > > > exit(0); > > > > > > > > > >+ chdir("/"); > > > > > > > > What if there is a coredump? > > > > > > This will go at "/". > > > > > > > Wouldn't it be better to chdir > > > > to /tmp/ instead ? > > > > > > /tmp can be mounted fs as well. > > > > > > Sasha > > > > So can / IMO. > > Isn't this a user error? > > Shouldn't we let the user decide where to run it? > > How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). Or the log directory? -- MST From halr at voltaire.com Sun Mar 25 08:39:07 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 25 Mar 2007 10:39:07 -0500 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <20070325131740.GH30425@mellanox.co.il> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> <20070325131740.GH30425@mellanox.co.il> Message-ID: <1174837145.24305.343392.camel@hal.voltaire.com> On Sun, 2007-03-25 at 08:17, Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > Subject: Re: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode > > > > On Sun, 2007-03-25 at 07:06, Michael S. 
Tsirkin wrote: > > > > Quoting Sasha Khapyorsky : > > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > > Sasha Khapyorsky wrote: > > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > > >failures. > > > > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > > >--- > > > > > > osm/opensm/main.c | 2 ++ > > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > > >index 3f465e9..d5c3699 100644 > > > > > >--- a/osm/opensm/main.c > > > > > >+++ b/osm/opensm/main.c > > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > > } else if (pid > 0) > > > > > > exit(0); > > > > > > > > > > > >+ chdir("/"); > > > > > > > > > > What if there is a coredump? > > > > > > > > This will go at "/". > > > > > > > > > Wouldn't it be better to chdir > > > > > to /tmp/ instead ? > > > > > > > > /tmp can be mounted fs as well. > > > > > > > > Sasha > > > > > > So can / IMO. > > > Isn't this a user error? > > > Shouldn't we let the user decide where to run it? > > > > How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps > > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). > > Or the log directory? That is the default directory for the log file too.
#define OSM_DEFAULT_TMP_DIR "/var/log/" #define OSM_DEFAULT_LOG_FILE "/var/log/osm.log" -- Hal From sashak at voltaire.com Sun Mar 25 08:50:11 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 25 Mar 2007 17:50:11 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <1174837145.24305.343392.camel@hal.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> <20070325131740.GH30425@mellanox.co.il> <1174837145.24305.343392.camel@hal.voltaire.com> Message-ID: <20070325155011.GH19999@sashak.voltaire.com> On 10:39 Sun 25 Mar , Hal Rosenstock wrote: > On Sun, 2007-03-25 at 08:17, Michael S. Tsirkin wrote: > > > Quoting Hal Rosenstock : > > > Subject: Re: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > On Sun, 2007-03-25 at 07:06, Michael S. Tsirkin wrote: > > > > > Quoting Sasha Khapyorsky : > > > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode > > > > > > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote: > > > > > > Sasha Khapyorsky wrote: > > > > > > >Change working directory to "/" when running in the daemon mode. It is > > > > > > >in order to not keep the filesystem busy and prevent potential umount > > > > > > >failures. 
> > > > > > > > > > > > > >Signed-off-by: Sasha Khapyorsky > > > > > > >--- > > > > > > > osm/opensm/main.c | 2 ++ > > > > > > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > > > > > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c > > > > > > >index 3f465e9..d5c3699 100644 > > > > > > >--- a/osm/opensm/main.c > > > > > > >+++ b/osm/opensm/main.c > > > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm) > > > > > > > } else if (pid > 0) > > > > > > > exit(0); > > > > > > > > > > > > > >+ chdir("/"); > > > > > > > > > > > > What if there is a coredump? > > > > > > > > > > This will go at "/". > > > > > > > > > > > Wouldn't it be better to chdir > > > > > > to /tmp/ instead ? > > > > > > > > > > /tmp can be mounted fs as well. > > > > > > > > > > Sasha > > > > > > > > So can / IMO. > > > > Isn't this a user error? > > > > Shouldn't we let the user decide where to run it? > > > > > > How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps > > > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.). > > > > Or the log directory? > > That is the default directory for the log file too. > > #define OSM_DEFAULT_TMP_DIR "/var/log/" > #define OSM_DEFAULT_LOG_FILE "/var/log/osm.log" Yes, but log and/or tmp directory has different purpose which doesn't serve the patch goal. Sasha From swise at opengridcomputing.com Sun Mar 25 15:45:31 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 25 Mar 2007 17:45:31 -0500 Subject: [ofa-general] Re: [PATCH ofed_1_2] Fixes to the Chelsio iWARP drivers. In-Reply-To: <1174817497.23571.8.camel@vladsk-laptop> References: <1174785876.24222.7.camel@stevo-laptop> <1174817497.23571.8.camel@vladsk-laptop> Message-ID: <1174862731.4817.10.camel@stevo-laptop> On Sun, 2007-03-25 at 12:11 +0200, Vladimir Sokolovsky wrote: > On Sat, 2007-03-24 at 20:24 -0500, Steve Wise wrote: > > Vlad, > > > > Please pull these fixes to the Chelsio drivers for ofed_1_2.
They can be pulled from > > > > git://staging.openfabrics.org/~swise/ofed_1_2.git ofed_1_2 > > > > Thanks, > > > > Steve. > > > > > > Hi Steve, > The patch t3_hw_to_2_6_5-7_244.patch is broken with the last changes. > Please fix it and then I'll pull from your tree. > > Please test your commit on staging with the following command: > > env git_url=git://staging.openfabrics.org/~swise/ofed_1_2.git \ > CHECK_LOCAL=yes CHECK_KERNEL_ORG=yes CHECK_CROSS=yes \ > /home/vlad/scripts/build_ofa_kernel.sh 2>&1 | tee /tmp/ker.log > > Sorry Vlad! That was an oversight on my part. I'll run the above build test from now on when updating the chelsio drivers. I forgot about the t3 backport patches... Anyway: I've added a new commit to fix up the backport patches for the cxgb3 module. It now builds on all kernels/platforms. Please pull from git://staging.openfabrics.org/~swise/ofed_1_2 ofed_1_2 Thanks, Steve. From dledford at redhat.com Sun Mar 25 17:30:25 2007 From: dledford at redhat.com (Doug Ledford) Date: Sun, 25 Mar 2007 20:30:25 -0400 Subject: [ofa-general] Re: how do I interactively install the OFED included with RHEL5? In-Reply-To: References: Message-ID: <1174869025.3973.71.camel@athlon-x2.xsintricity.com> On Wed, 2007-03-21 at 11:55 -0700, Scott Weitzenkamp (sweitzen) wrote: > Doug, > > I can't seem to get an interactive RHEL5 install to include all the > OFED packages (for example, libib*). I see an OpenIB choice in the > Base packages, but that doesn't install everything. RHEL should start > using OFED name instead of OpenIB, too. > > I've only been able to install all OFED packages by using Kickstart > and specifying this in my .cfg: > > %packages > @ everything OpenMPI should be in the package list. Most of the openib stuff really isn't intended to be individually selected, but pulled in by dependencies instead. So, selecting OpenMPI will pull in a large number of the items. OpenSM will too. 
-- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From troy at scl.ameslab.gov Sun Mar 25 17:41:11 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Sun, 25 Mar 2007 19:41:11 -0500 Subject: [ofa-general] interesting ibv_reg_mr failures In-Reply-To: <46063F4C.20908@dev.mellanox.co.il> References: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov> <46063F4C.20908@dev.mellanox.co.il> Message-ID: <59E28760-2496-4F18-B8B3-DEC6637201AF@scl.ameslab.gov> On Mar 25, 2007, at 4:22 AM, Dotan Barak wrote: > Hi Troy. > > I can only answer about your info which is related to the mthca > devices. > > > Troy Benjegerdes wrote: >> We have been getting some interesting failures with ibv_reg_mr.. >> >> gcc -ggdb -libverbs -o mr-test mr-test.c >> /usr/src/ibv-mr-test/mr-test >> mr-test: bufsize 1048576 >> device # 0 name="mthca0" guid="00066a0098000464" >> ibv_open_device() context=0x10012c98 >> ibv_alloc_pd() pd=0x10013678 >> alloc: 2482 >> ibv_reg_mr failed:: Cannot allocate memory >> fw_ver: 3.3.2 >> max_mr_size 0xffffffffffffffff >> max_mr: 131056, could only register 2482 regions >> sleep 5 sec >> free: 0 >> done > I wasn't able to reproduce this failure but i noticed that you are > using an old FW version (current version is 3.5.0).
>> >> with a 10MB buffer: >> >> gcc -ggdb -libverbs -o mr-test mr-test.c >> /usr/src/ibv-mr-test/mr-test >> mr-test: bufsize 10485760 >> device # 0 name="mthca0" guid="00066a0098000464" >> ibv_open_device() context=0x10012c98 >> ibv_alloc_pd() pd=0x10013678 >> alloc: 2482 >> ibv_reg_mr failed:: Cannot allocate memory >> fw_ver: 3.3.2 >> max_mr_size 0xffffffffffffffff >> max_mr: 131056, could only register 2482 regions >> sleep 5 sec >> free: 0 >> done > On 64 bit machine i got a kernel oops, bug number 490 was opened in > the Bugzilla and we are analyzing this failure. >> And, on an PCI-express mellanox hca: >> /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test >> mr-test: bufsize 10485760 >> device # 0 name="mthca0" guid="0002c9020040272c" >> ibv_open_device() context=0x504c00 >> ibv_alloc_pd() pd=0x503f30 >> alloc: 12277 >> ibv_reg_mr failed:: Cannot allocate memory >> fw_ver: 5.1.0 >> max_mr_size 0xffffffffffffffff >> max_mr: 131056, could only register 12277 regions >> sleep 5 sec >> free: 0 >> done > I'm checking this issue and let you know about what i will find. >> >> On the pci-express hca, it also looks like the memory usage, as >> reported by "free" goes down by about 300MB once all these regions >> are allocated.. but the process usage as reported by top is only >> 20mb total virtual size. What's going on here? > are you talking about the "free memory" which is reported by top? Both the free memory reported by 'top', and the free memory reported by the 'free' command on debian. From vlad at mellanox.co.il Mon Mar 26 00:21:43 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 26 Mar 2007 09:21:43 +0200 Subject: [ofa-general] Re: [PATCH ofed_1_2] Fixes to the Chelsio iWARP drivers. 
In-Reply-To: <1174862731.4817.10.camel@stevo-laptop> References: <1174785876.24222.7.camel@stevo-laptop> <1174817497.23571.8.camel@vladsk-laptop> <1174862731.4817.10.camel@stevo-laptop> Message-ID: <1174893703.23571.9.camel@vladsk-laptop> On Sun, 2007-03-25 at 17:45 -0500, Steve Wise wrote: > On Sun, 2007-03-25 at 12:11 +0200, Vladimir Sokolovsky wrote: > > On Sat, 2007-03-24 at 20:24 -0500, Steve Wise wrote: > > > Vlad, > > > > > > Please pull these fixes to the Chelsio drivers for ofed_1_2. They can be pulled from > > > > > > git://staging.openfabrics.org/~swise/ofed_1_2.git ofed_1_2 > > > > > > Thanks, > > > > > > Steve. > > > > > > > > > > Hi Steve, > > The patch t3_hw_to_2_6_5-7_244.patch is broken with the last changes. > > Please fix it and then I'll pull from your tree. > > > > Please test your commit on staging with the following command: > > > > env git_url=git://staging.openfabrics.org/~swise/ofed_1_2.git \ > > CHECK_LOCAL=yes CHECK_KERNEL_ORG=yes CHECK_CROSS=yes \ > > /home/vlad/scripts/build_ofa_kernel.sh 2>&1 | tee /tmp/ker.log > > > > > > Sorry Vlad! That was an oversight on my part. I'll run the above build > test from now on when updating the chelsio drivers. I forgot about the > t3 backport patches... > > Anyway: > > I've added a new commit to fix up the backport patches for the cxgb3 > module. It now builds on all kernels/platforms. > > Please pull from > > git://staging.openfabrics.org/~swise/ofed_1_2 ofed_1_2 > > Thanks, > > Steve. Done, -- Vladimir Sokolovsky Mellanox Technologies Ltd. 
From amip at dev.mellanox.co.il Mon Mar 26 02:36:32 2007
From: amip at dev.mellanox.co.il (Ami Perlmutter)
Date: Mon, 26 Mar 2007 11:36:32 +0200
Subject: [ofa-general] kernel oops when killing opensm
Message-ID: <1174901792.5401.2.camel@Ami-desktop>

cat /etc/issue
Fedora Core release 6 (Zod)
Kernel \r on an \m

uname -a
Linux sw185.lab.mtl.com 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

ib_mad: Method 1 already in use
general protection fault: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 1
Modules linked in: nfsd exportfs autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc rdma_ucm(U) rds(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_sa(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) ipv6 dm_mirror dm_multipath dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport i2c_nforce2 i2c_core ide_cd sg k8_edac edac_mc cdrom shpchp ib_mthca(U) ib_mad(U) ib_core(U) forcedeth floppy serio_raw pcspkr sata_nv libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 16190, comm: opensm Not tainted 2.6.18-1.2798.fc6 #1
RIP: 0010:[] [] list_del+0x1/0x71
RSP: 0018:ffff8101084d3d88 EFLAGS: 00010002
RAX: ffff81011e9a0040 RBX: ffff810077fa2c50 RCX: ffff8100dffdba80
RDX: 00000000ffffffff RSI: 0012020602040200 RDI: 0012020602040210
RBP: 0012020602040200 R08: ffff810077fa2b10 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0012020602040200
R13: ffff810077fa2a10 R14: 0000000000000005 R15: 0000000000000000
FS: 00002aaaab5081d0(0000) GS:ffff8100036581c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000311760af20 CR3: 0000000046624000 CR4: 00000000000006e0
Process opensm (pid: 16190, threadinfo ffff8101084d2000, task ffff81011e9a0040)
Stack: ffff810077fa2c50 ffffffff88344395 0000000000000007 ffff8101084d3e18
 ffff810077fa2c00 ffffffff8834446e ffff810003022450 ffff810077fa2a00
 ffff8101084d3df0 00007fff77a97aac 0000000000000005 ffffffff8814c2f7
Call Trace:
 [] :ib_umad:dequeue_send+0x23/0x30
 [] :ib_umad:send_handler+0x26/0x7b
 [] :ib_mad:ib_unregister_mad_agent+0x140/0x454
 [] :ib_umad:ib_umad_ioctl+0x201/0x233
 [] do_ioctl+0x21/0x6b
 [] vfs_ioctl+0x256/0x26f
 [] sys_ioctl+0x59/0x78
 [] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
Leftover inexact backtrace:

Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0
RIP [] list_del+0x1/0x71
 RSP
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [] show_trace+0x34/0x47
 [] dump_stack+0x12/0x17
 [] down_read+0x15/0x23
 [] blocking_notifier_call_chain+0x13/0x36
 [] do_exit+0x1f/0x8c2
 [] kernel_math_error+0x0/0x90
 [<0000000000000340>]

From vlad at lists.openfabrics.org Mon Mar 26 02:35:12 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 02:35:12 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070326-0200 daily build status
Message-ID: <20070326093512.BAEB0E60817@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:

From halr at voltaire.com Mon Mar 26 05:59:22 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Mar 2007 07:59:22 -0500
Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode
In-Reply-To: <20070325155011.GH19999@sashak.voltaire.com>
References: <20070325105234.GA30920@sashak.voltaire.com>
 <460656D8.6080602@dev.mellanox.co.il>
 <20070325111823.GB30920@sashak.voltaire.com>
 <20070325120614.GC30425@mellanox.co.il>
 <1174829924.24305.335293.camel@hal.voltaire.com>
 <20070325131740.GH30425@mellanox.co.il>
 <1174837145.24305.343392.camel@hal.voltaire.com>
 <20070325155011.GH19999@sashak.voltaire.com>
Message-ID: <1174913960.24305.429115.camel@hal.voltaire.com>

On Sun, 2007-03-25 at 10:50, Sasha Khapyorsky wrote:
> On 10:39 Sun 25 Mar , Hal Rosenstock wrote:
> > On Sun, 2007-03-25 at 08:17, Michael S. Tsirkin wrote:
> > > > Quoting Hal Rosenstock :
> > > > Subject: Re: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode
> > > >
> > > > On Sun, 2007-03-25 at 07:06, Michael S. Tsirkin wrote:
> > > > > > Quoting Sasha Khapyorsky :
> > > > > > Subject: Re: [PATCH] opensm: chdir to / in the daemon mode
> > > > > >
> > > > > > On 13:02 Sun 25 Mar , Yevgeny Kliteynik wrote:
> > > > > > > Sasha Khapyorsky wrote:
> > > > > > > >Change working directory to "/" when running in the daemon mode. It is
> > > > > > > >in order to not keep the filesystem busy and prevent potential umount
> > > > > > > >failures.
> > > > > > > >
> > > > > > > >Signed-off-by: Sasha Khapyorsky
> > > > > > > >---
> > > > > > > > osm/opensm/main.c | 2 ++
> > > > > > > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > > > > > > >
> > > > > > > >diff --git a/osm/opensm/main.c b/osm/opensm/main.c
> > > > > > > >index 3f465e9..d5c3699 100644
> > > > > > > >--- a/osm/opensm/main.c
> > > > > > > >+++ b/osm/opensm/main.c
> > > > > > > >@@ -548,6 +548,8 @@ static int daemonize(osm_opensm_t *osm)
> > > > > > > > } else if (pid > 0)
> > > > > > > > exit(0);
> > > > > > > >
> > > > > > > >+ chdir("/");
> > > > > > >
> > > > > > > What if there is a coredump?
> > > > > >
> > > > > > This will go at "/".
> > > > > >
> > > > > > > Wouldn't it be better to chdir
> > > > > > > to /tmp/ instead ?
> > > > > >
> > > > > > /tmp can be mounted fs as well.
> > > > > >
> > > > > > Sasha
> > > > >
> > > > > So can / IMO.
> > > > > Isn't this a user error?
> > > > > Shouldn't we let the user decide where to run it?
> > > >
> > > > How about using OSM_DEFAULT_TMP_DIR for this ? We're putting other dumps
> > > > there (like osm.fdbs, osm.mcfdbs, osm-subnet.lst, etc.).
> > >
> > > Or the log directory?
> >
> > That is the default directory for the log file too.
> >
> > #define OSM_DEFAULT_TMP_DIR "/var/log/"
> > #define OSM_DEFAULT_LOG_FILE "/var/log/osm.log"
>
> Yes, but log and/or tmp directory has different purpose which doesn't
> serve the patch goal.

So is an OSM_DEFAULT_CORE_DIR needed then ?

-- Hal

> Sasha

From erezz at voltaire.com Mon Mar 26 05:16:59 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 26 Mar 2007 14:16:59 +0200
Subject: [ofa-general] [PATCH 2/2] IB/iser: iser_conn should not be released while its workqueue is active
In-Reply-To: <46064A78.5050005@voltaire.com>
References: <46064813.6070208@voltaire.com> <46064A78.5050005@voltaire.com>
Message-ID: <4607B9BB.80407@voltaire.com>

Roland,

Or & I found a bug in this patch. I hope to send a fix for it in the
next few days. Meanwhile, please don't merge it.

Erez

From halr at voltaire.com Mon Mar 26 08:16:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Mar 2007 10:16:58 -0500
Subject: [ofa-general] OFA server gitweb
Message-ID: <1174922217.24305.438064.camel@hal.voltaire.com>

Hi,

Should the clone info on the OFA server gitweb page now say:
git clone git://git.openfabrics.org/+project path
rather than:
git clone git://staging.openfabrics.org/+project path

Also, none of the links on this page seem to be working (at least for
me).

-- Hal

From jsquyres at cisco.com Mon Mar 26 07:24:07 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 26 Mar 2007 10:24:07 -0400
Subject: [ofa-general] OFA server gitweb
In-Reply-To: <1174922217.24305.438064.camel@hal.voltaire.com>
References: <1174922217.24305.438064.camel@hal.voltaire.com>
Message-ID:

On Mar 26, 2007, at 11:16 AM, Hal Rosenstock wrote:

> Should the clone info on the OFA server gitweb page now say:
> git clone git://git.openfabrics.org/+project path
> rather than:
> git clone git://staging.openfabrics.org/+project path

Yes, it should.

> Also, none of the links on this page seem to be working (at least for
> me).

Can you cite the specific URL of the page you're talking about?
-- 
Jeff Squyres
Cisco Systems

From halr at voltaire.com Mon Mar 26 08:25:40 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Mar 2007 10:25:40 -0500
Subject: [ofa-general] OFA server gitweb
In-Reply-To:
References: <1174922217.24305.438064.camel@hal.voltaire.com>
Message-ID: <1174922739.24305.438577.camel@hal.voltaire.com>

On Mon, 2007-03-26 at 09:24, Jeff Squyres wrote:
> On Mar 26, 2007, at 11:16 AM, Hal Rosenstock wrote:
>
> > Should the clone info on the OFA server gitweb page now say:
> > git clone git://git.openfabrics.org/+project path
> > rather than:
> > git clone git://staging.openfabrics.org/+project path
>
> Yes, it should.
>
> > Also, none of the links on this page seem to be working (at least for
> > me).
>
> Can you cite the specific URL of the page you're talking about?

If I click on any project URL or any of the links on the right of each
project, I get a "Cannot find server or DNS error" in my browser. This
used to work but not sure when it stopped (at least for me).

-- Hal

From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:36:08 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:36:08 -0700 (PDT)
Subject: [ofa-general] [Bug 351] Routing table problem in SLES10 when using port #2
In-Reply-To:
Message-ID: <20070326143608.33EE4E603C8@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=351

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from tziporet at mellanox.co.il 2007-03-26 07:36 -------
Update from Novell:
This issue is fixed in SLES10 SP1.
There is no patch they can provide us for SLES10.
Thus it will be documented in IPoIB release notes.
-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:37:19 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:37:19 -0700 (PDT)
Subject: [ofa-general] [Bug 449] DMA vs CQ race on IA64 Altix platform
In-Reply-To:
Message-ID: <20070326143719.24A71E6081B@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=449

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from tziporet at mellanox.co.il 2007-03-26 07:37 -------
Update from Novell:
This issue is fixed in SLES10 SP1
There is no patch they can provide us for SLES10
Thus it will be documented in OFED 1.2 release notes.

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:42:12 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:42:12 -0700 (PDT)
Subject: [ofa-general] [Bug 485] creating & deleting a subinterface with a bad pkey crashs the kernel: NULL pointer reference
In-Reply-To:
Message-ID: <20070326144212.24CA9E603B2@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=485

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org       |rolandd at cisco.com

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:43:54 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:43:54 -0700 (PDT)
Subject: [ofa-general] [Bug 488] user_mad GRH handling on receive side is incomplete
In-Reply-To:
Message-ID: <20070326144354.1B219E60818@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=488

------- Comment #1 from tziporet at mellanox.co.il 2007-03-26 07:43 -------
I don't think we wish to take this for OFED 1.2 now

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From glebn at voltaire.com Mon Mar 26 07:45:16 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Mon, 26 Mar 2007 16:45:16 +0200
Subject: [bugzilla-daemon@lists.openfabrics.org: [ofa-general] [Bug 449] DMA vs CQ race on IA64 Altix platform]
Message-ID: <20070326144516.GK14389@minantech.com>

I am just curious how this issue is fixed in SLES10 SP1.

----- Forwarded message from bugzilla-daemon at lists.openfabrics.org -----

From: bugzilla-daemon at lists.openfabrics.org
To: bugzilla at openib.org
Subject: [ofa-general] [Bug 449] DMA vs CQ race on IA64 Altix platform
Date: Mon, 26 Mar 2007 07:37:19 -0700 (PDT)

https://bugs.openfabrics.org/show_bug.cgi?id=449

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from tziporet at mellanox.co.il 2007-03-26 07:37 -------
Update from Novell:
This issue is fixed in SLES10 SP1
There is no patch they can provide us for SLES10
Thus it will be documented in OFED 1.2 release notes.

----- End forwarded message -----

--
Gleb.

From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:49:55 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:49:55 -0700 (PDT)
Subject: [ofa-general] [Bug 488] user_mad GRH handling on receive side is incomplete
In-Reply-To:
Message-ID: <20070326144955.66CF7E607FD@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=488

------- Comment #2 from halr at voltaire.com 2007-03-26 07:49 -------
I think it's a pretty low risk fix. MST agreed to this aspect of it. It would
be nice to have this in. Is there a good reason not to ?

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at lists.openfabrics.org Mon Mar 26 07:51:40 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Mar 2007 07:51:40 -0700 (PDT)
Subject: [ofa-general] [Bug 488] user_mad GRH handling on receive side is incomplete
In-Reply-To:
Message-ID: <20070326145140.438EFE607FD@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=488

halr at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tziporet at mellanox.co.il

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From moshek at voltaire.com Mon Mar 26 07:52:51 2007
From: moshek at voltaire.com (Moshe Kazir)
Date: Mon, 26 Mar 2007 16:52:51 +0200
Subject: [ofa-general] [Bug 449] DMA vs CQ race on IA64 Altix platform
In-Reply-To: <20070326143719.24A71E6081B@openfabrics.org>
Message-ID: <3857BB049D83424D9DB82753D37CEA5545325C@taurus.voltaire.com>

I do not understand what is resolved ?
Did Novell include the patch in sles10 sp1 ?
What about backporting to sles9 sp 3 ?

Moshe

____________________________________________________________
Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of bugzilla-daemon at lists.openfabrics.org
Sent: Monday, March 26, 2007 4:37 PM
To: bugzilla at openib.org
Subject: [ofa-general] [Bug 449] DMA vs CQ race on IA64 Altix platform

https://bugs.openfabrics.org/show_bug.cgi?id=449

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from tziporet at mellanox.co.il 2007-03-26 07:37 -------
Update from Novell:
This issue is fixed in SLES10 SP1
There is no patch they can provide us for SLES10
Thus it will be documented in OFED 1.2 release notes.

-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From swise at opengridcomputing.com Mon Mar 26 08:33:25 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 26 Mar 2007 10:33:25 -0500
Subject: [ofa-general] ofa interop test plan
Message-ID: <1174923205.10117.10.camel@stevo-desktop>

Monty,

The IOL info page at
http://www.iol.unh.edu/services/testing/ofa/events/Invitation_2007-04_OFA.php
states that the test plan is on the openfabrics site at
www.openfabrics.org/downloads.htm, but I don't see it there.

Can you point me at the iwarp specific test plan?

Thanks,

Steve.

From ishai at dev.mellanox.co.il Mon Mar 26 09:48:07 2007 From: ishai at dev.mellanox.co.il (ishai) Date: Mon, 26 Mar 2007 18:48:07 +0200 Subject: [ofa-general] [PATCH] SRP: add orig_dgid to sysfs Message-ID: <4607F947.2010205@dev.mellanox.co.il> Adding orig_dgid file to /sys/class/scsi_host. This file will present the value of dgid that was "written" to /sys/class/infiniband_srp/.../add_target This is useful when there is a dgid redirection by the CM. Signed-off-by: Ishai Rabinovitz Index: gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- gen2_devel_kernel.orig/drivers/infiniband/ulp/srp/ib_srp.c 2007-03-26 10:47:34.000000000 +0200 +++ gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.c 2007-03-26 17:35:54.000000000 +0200 @@ -1102,6 +1102,7 @@ target->path.dlid = cpi->redirect_lid; target->path.pkey = cpi->redirect_pkey; cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + memcpy(target->orig_dgid, target->path.dgid.raw, 16); memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); target->status = target->path.dlid ? @@ -1116,6 +1117,8 @@ * reject reason code 25 when they mean 24 * (port redirect). 
*/ + memcpy(target->orig_dgid, + target->path.dgid.raw, 16); memcpy(target->path.dgid.raw, event->param.rej_rcvd.ari, 16); @@ -1449,6 +1452,24 @@ return sprintf(buf, "0x%04x\n", be16_to_cpu(target->path.pkey)); } +static ssize_t show_orig_dgid(struct class_device *cdev, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(cdev)); + + if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_REMOVED) + return -ENODEV; + + return sprintf(buf, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + be16_to_cpu(((__be16 *) target->orig_dgid)[0]), + be16_to_cpu(((__be16 *) target->orig_dgid)[1]), + be16_to_cpu(((__be16 *) target->orig_dgid)[2]), + be16_to_cpu(((__be16 *) target->orig_dgid)[3]), + be16_to_cpu(((__be16 *) target->orig_dgid)[4]), + be16_to_cpu(((__be16 *) target->orig_dgid)[5]), + be16_to_cpu(((__be16 *) target->orig_dgid)[6]), + be16_to_cpu(((__be16 *) target->orig_dgid)[7])); +} static ssize_t show_dgid(struct class_device *cdev, char *buf) { struct srp_target_port *target = host_to_target(class_to_shost(cdev)); @@ -1498,6 +1519,7 @@ static CLASS_DEVICE_ATTR(service_id, S_IRUGO, show_service_id, NULL); static CLASS_DEVICE_ATTR(pkey, S_IRUGO, show_pkey, NULL); static CLASS_DEVICE_ATTR(dgid, S_IRUGO, show_dgid, NULL); +static CLASS_DEVICE_ATTR(orig_dgid, S_IRUGO, show_orig_dgid, NULL); static CLASS_DEVICE_ATTR(zero_req_lim, S_IRUGO, show_zero_req_lim, NULL); static CLASS_DEVICE_ATTR(local_ib_port, S_IRUGO, show_local_ib_port, NULL); static CLASS_DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL); @@ -1508,6 +1530,7 @@ &class_device_attr_service_id, &class_device_attr_pkey, &class_device_attr_dgid, + &class_device_attr_orig_dgid, &class_device_attr_zero_req_lim, &class_device_attr_local_ib_port, &class_device_attr_local_ib_device, @@ -1796,6 +1819,7 @@ (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[12]), (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[14])); + memcpy(target->orig_dgid, 
target->path.dgid.raw, 16); ret = srp_create_target_ib(target); if (ret) goto err; Index: gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.h =================================================================== --- gen2_devel_kernel.orig/drivers/infiniband/ulp/srp/ib_srp.h 2007-03-25 16:07:20.000000000 +0200 +++ gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.h 2007-03-26 17:33:52.000000000 +0200 @@ -129,6 +129,7 @@ unsigned int scsi_id; struct ib_sa_path_rec path; + u8 orig_dgid[16]; struct ib_sa_query *path_query; int path_query_id; From sashak at voltaire.com Mon Mar 26 13:30:53 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 26 Mar 2007 22:30:53 +0200 Subject: [ofa-general] OFA server gitweb In-Reply-To: <1174922217.24305.438064.camel@hal.voltaire.com> References: <1174922217.24305.438064.camel@hal.voltaire.com> Message-ID: <1174941053.29460.0.camel@localhost> On Mon, 2007-03-26 at 10:16 -0500, Hal Rosenstock wrote: > Hi, > > Should the clone info on the OFA server gitweb page now say: > git clone git://git.openfabrics.org/+project path > rather than: > git clone git://staging.openfabrics.org/+project path This is easy. Already done. Sasha
From halr at voltaire.com Mon Mar 26 15:53:54 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Mar 2007 17:53:54 -0500 Subject: [ofa-general] [PATCH] IB/core: Enhance SMI for switch support Message-ID: <1174949633.4372.3731.camel@hal.voltaire.com> IB/core: Enhance SMI for switch support SMI is being extended for switch (intermediate hop) support.
Care has been taken to ensure the CA (and router) code paths are as identical as possible to how they were prior to adding this support. This has been lightly tested on Tavor HCAs. Is it possible to get regression testing of this on a wider set of Mellanox HCAs ? Also, can someone test this on iPath ? Signed-off-by: Suresh Shelvapille Signed-off-by: Hal Rosenstock diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index ecd1a30..7583941 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -3,7 +3,7 @@ * Copyright (c) 2004, 2005 Infinicon Corporation. All rights reserved. * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. * Copyright (c) 2004, 2005 Topspin Corporation. All rights reserved. - * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. + * Copyright (c) 2004-2007 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -34,7 +34,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE.
* - * $Id: agent.c 1389 2004-12-27 22:56:47Z roland $ */ #include @@ -42,6 +41,7 @@ #include "agent.h" #include "smi.h" +#include "mad_priv.h" #define SPFX "ib_agent: " @@ -87,8 +87,13 @@ int agent_send_response(struct ib_mad *m struct ib_mad_send_buf *send_buf; struct ib_ah *ah; int ret; + struct ib_mad_send_wr_private *mad_send_wr; + + if (device->node_type == RDMA_NODE_IB_SWITCH) + port_priv = ib_get_agent_port(device, 0); + else + port_priv = ib_get_agent_port(device, port_num); - port_priv = ib_get_agent_port(device, port_num); if (!port_priv) { printk(KERN_ERR SPFX "Unable to find port agent\n"); return -ENODEV; @@ -113,6 +118,14 @@ int agent_send_response(struct ib_mad *m memcpy(send_buf->mad, mad, sizeof *mad); send_buf->ah = ah; + + if (device->node_type == RDMA_NODE_IB_SWITCH){ + mad_send_wr = container_of(send_buf, + struct ib_mad_send_wr_private, + send_buf); + mad_send_wr->send_wr.wr.ud.port_num = port_num; + } + if ((ret = ib_post_send_mad(send_buf, NULL))) { printk(KERN_ERR SPFX "ib_post_send_mad error:%d\n", ret); goto err2; diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 13efd41..7856e1a 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. * Copyright (c) 2005 Mellanox Technologies Ltd. All rights reserved. * @@ -31,7 +31,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: mad.c 5596 2006-03-03 01:00:07Z sean.hefty $ */ #include #include @@ -676,10 +675,16 @@ static int handle_outgoing_dr_smp(struct struct ib_mad_port_private *port_priv; struct ib_mad_agent_private *recv_mad_agent = NULL; struct ib_device *device = mad_agent_priv->agent.device; - u8 port_num = mad_agent_priv->agent.port_num; + u8 port_num; struct ib_wc mad_wc; struct ib_send_wr *send_wr = &mad_send_wr->send_wr; + if (device->node_type == RDMA_NODE_IB_SWITCH && + smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) + port_num = send_wr->wr.ud.port_num; + else + port_num = mad_agent_priv->agent.port_num; + /* * Directed route handling starts if the initial LID routed part of * a request or the ending LID routed part of a response is empty. @@ -693,6 +698,7 @@ static int handle_outgoing_dr_smp(struct printk(KERN_ERR PFX "Invalid directed route\n"); goto out; } + /* Check to post send on QP or process locally */ ret = smi_check_local_smp(smp, device); if (!ret) @@ -1839,6 +1845,7 @@ static void ib_mad_recv_done_handler(str struct ib_mad_private *recv, *response; struct ib_mad_list_head *mad_list; struct ib_mad_agent_private *mad_agent; + int port_num; response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL); if (!response) @@ -1872,21 +1879,49 @@ static void ib_mad_recv_done_handler(str if (!validate_mad(&recv->mad.mad, qp_info->qp->qp_num)) goto out; + if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) + port_num = wc->port_num; + else + port_num = port_priv->port_num; + if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { + int retsmi; + if (!smi_handle_dr_smp_recv(&recv->mad.smp, port_priv->device->node_type, - port_priv->port_num, + port_num, port_priv->device->phys_port_cnt)) goto out; - if (!smi_check_forward_dr_smp(&recv->mad.smp)) + + retsmi = smi_check_forward_dr_smp(&recv->mad.smp); + if (!retsmi) goto local; - if (!smi_handle_dr_smp_send(&recv->mad.smp, - port_priv->device->node_type, - port_priv->port_num)) - goto out; - 
if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) + + if (retsmi == 1) { /* don't forward */ + if (!smi_handle_dr_smp_send(&recv->mad.smp, + port_priv->device->node_type, + port_num)) + goto out; + + if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) + goto out; + } else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) { + /* forward case for switches */ + memcpy(response, recv, sizeof(*response)); + response->header.recv_wc.wc = &response->header.wc; + response->header.recv_wc.recv_buf.mad = &response->mad.mad; + response->header.recv_wc.recv_buf.grh = &response->grh; + + if (!agent_send_response(&response->mad.mad, + &response->grh, wc, + port_priv->device, + smi_get_fwd_port(&recv->mad.smp), + qp_info->qp->qp_num)) + response = NULL; + goto out; + } } local: @@ -1915,7 +1950,7 @@ local: agent_send_response(&response->mad.mad, &recv->grh, wc, port_priv->device, - port_priv->port_num, + port_num, qp_info->qp->qp_num); goto out; } diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c index 54b81e1..654deae 100644 --- a/drivers/infiniband/core/smi.c +++ b/drivers/infiniband/core/smi.c @@ -3,7 +3,7 @@ * Copyright (c) 2004, 2005 Infinicon Corporation. All rights reserved. * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. * Copyright (c) 2004, 2005 Topspin Corporation. All rights reserved. - * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. + * Copyright (c) 2004-2007 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -34,7 +34,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: smi.c 1389 2004-12-27 22:56:47Z roland $ */ #include @@ -202,6 +201,7 @@ int smi_handle_dr_smp_recv(struct ib_smp /* * Return 1 if the received DR SMP should be forwarded to the send queue * Return 0 if the SMP should be completed up the stack + * Return 2 if the SMP should be forwarded (for switches only) */ int smi_check_forward_dr_smp(struct ib_smp *smp) { @@ -213,7 +213,7 @@ int smi_check_forward_dr_smp(struct ib_s if (!ib_get_smp_direction(smp)) { /* C14-9:2 -- intermediate hop */ if (hop_ptr && hop_ptr < hop_cnt) - return 1; + return 2; /* C14-9:3 -- at the end of the DR segment of path */ if (hop_ptr == hop_cnt) @@ -223,9 +223,9 @@ int smi_check_forward_dr_smp(struct ib_s if (hop_ptr == hop_cnt + 1) return 1; } else { - /* C14-13:2 */ + /* C14-13:2 -- intermediate hop */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) - return 1; + return 2; /* C14-13:3 -- at the end of the DR segment of path */ if (hop_ptr == 1) diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h index 3011bfd..0cf0f19 100644 --- a/drivers/infiniband/core/smi.h +++ b/drivers/infiniband/core/smi.h @@ -3,7 +3,7 @@ * Copyright (c) 2004 Infinicon Corporation. All rights reserved. * Copyright (c) 2004 Intel Corporation. All rights reserved. * Copyright (c) 2004 Topspin Corporation. All rights reserved. - * Copyright (c) 2004 Voltaire Corporation. All rights reserved. + * Copyright (c) 2004-2007 Voltaire Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -33,7 +33,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: smi.h 1389 2004-12-27 22:56:47Z roland $ */ #ifndef __SMI_H_ @@ -63,4 +62,14 @@ static inline int smi_check_local_smp(st (smp->hop_ptr == smp->hop_cnt + 1))); } +/* + * Return the forwarding port number from initial_path for outgoing SMP and + * from return_path for returning SMP + */ +static inline int smi_get_fwd_port(struct ib_smp *smp) +{ + return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] : + smp->return_path[smp->hop_ptr-1]); +} + #endif /* __SMI_H_ */ From sashak at voltaire.com Mon Mar 26 15:53:47 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 27 Mar 2007 00:53:47 +0200 Subject: [ofa-general] Re: [PATCH] opensm: chdir to / in the daemon mode In-Reply-To: <1174913960.24305.429115.camel@hal.voltaire.com> References: <20070325105234.GA30920@sashak.voltaire.com> <460656D8.6080602@dev.mellanox.co.il> <20070325111823.GB30920@sashak.voltaire.com> <20070325120614.GC30425@mellanox.co.il> <1174829924.24305.335293.camel@hal.voltaire.com> <20070325131740.GH30425@mellanox.co.il> <1174837145.24305.343392.camel@hal.voltaire.com> <20070325155011.GH19999@sashak.voltaire.com> <1174913960.24305.429115.camel@hal.voltaire.com> Message-ID: <1174949627.29460.23.camel@localhost> On Mon, 2007-03-26 at 07:59 -0500, Hal Rosenstock wrote: > > > > > > That is the default dierctory for the log file too. > > > > > > #define OSM_DEFAULT_TMP_DIR "/var/log/" > > > #define OSM_DEFAULT_LOG_FILE "/var/log/osm.log" > > > > Yes, but log and/or tmp directory has different purpose which doesn't > > serve the patch goal. > > So is an OSM_DEFAULT_CORE_DIR needed then ? We could add OSM_DEFAULT_DAEMON_WORKING_DIR (which will be "/"), but since this is related only for the daemon mode for me it looks like overkill. 
And if it is really important for us to run the OpenSM daemon in a
directory other than '/' (but is it?), then I guess we need either to
add a new command line option or, as Michael suggested, to leave this
responsibility to the user and run the daemon in the current directory
(no patches are required in this case).

Sasha

From rdreier at cisco.com Mon Mar 26 15:17:08 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Mar 2007 15:17:08 -0700
Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels
In-Reply-To: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> (Dotan Barak's message of "Mon, 12 Mar 2007 12:00:43 +0200")
References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com>
Message-ID:

OK, I got bored and tried to implement this using a mutex in the
ibv_context structure. How does this (compile tested only) patch look?

 - R.

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 2ae50ab..acc1b82 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -573,11 +573,14 @@ struct ibv_qp {
 };
 
 struct ibv_comp_channel {
+	struct ibv_context *context;
 	int fd;
+	int refcnt;
 };
 
 struct ibv_cq {
 	struct ibv_context *context;
+	struct ibv_comp_channel *channel;
 	void *cq_context;
 	uint32_t handle;
 	int cqe;
@@ -680,12 +683,13 @@ struct ibv_context_ops {
 };
 
 struct ibv_context {
-	struct ibv_device *device;
-	struct ibv_context_ops ops;
-	int cmd_fd;
-	int async_fd;
-	int num_comp_vectors;
-	void *abi_compat;
+	struct ibv_device *device;
+	struct ibv_context_ops ops;
+	int cmd_fd;
+	int async_fd;
+	int num_comp_vectors;
+	pthread_mutex_t mutex;
+	void *abi_compat;
 };
 
 /**
diff --git a/src/cmd.c b/src/cmd.c
index f7d3fde..a0bfaad 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -75,7 +75,9 @@ static int ibv_cmd_get_context_v2(struct ibv_context *context,
 	context->async_fd = resp->async_fd;
 	context->num_comp_vectors = 1;
 
+	t->channel.context = context;
 	t->channel.fd = cq_fd;
+	t->channel.refcnt = 0;
 	context->abi_compat = t;
 
 	return 0;
diff --git a/src/device.c b/src/device.c
index bca1ce9..3abc1eb 100644
--- a/src/device.c
+++ b/src/device.c
@@ -138,6 +138,7 @@ struct ibv_context *__ibv_open_device(struct ibv_device *device)
 
 	context->device = device;
 	context->cmd_fd = cmd_fd;
+	pthread_mutex_init(&context->mutex, NULL);
 
 	return context;
 
diff --git a/src/verbs.c b/src/verbs.c
index 56513e4..5334af1 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -226,7 +226,9 @@ struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context)
 		return NULL;
 	}
 
-	channel->fd = resp.fd;
+	channel->context = context;
+	channel->fd = resp.fd;
+	channel->refcnt = 0;
 
 	return channel;
 }
@@ -240,23 +242,44 @@ static int ibv_destroy_comp_channel_v2(struct ibv_comp_channel *channel)
 
 int ibv_destroy_comp_channel(struct ibv_comp_channel *channel)
 {
-	if (abi_ver <= 2)
-		return ibv_destroy_comp_channel_v2(channel);
+	int ret;
+
+	pthread_mutex_lock(&channel->context->mutex);
+
+	if (channel->refcnt) {
+		ret = EBUSY;
+		goto out;
+	}
+
+	if (abi_ver <= 2) {
+		ret = ibv_destroy_comp_channel_v2(channel);
+		goto out;
+	}
 
 	close(channel->fd);
 	free(channel);
+	ret = 0;
 
-	return 0;
+out:
+	pthread_mutex_unlock(&channel->context->mutex);
+
+	return ret;
 }
 
 struct ibv_cq *__ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context,
 			       struct ibv_comp_channel *channel, int comp_vector)
 {
-	struct ibv_cq *cq = context->ops.create_cq(context, cqe, channel,
-						   comp_vector);
+	struct ibv_cq *cq;
+
+	pthread_mutex_lock(&context->mutex);
+
+	cq = context->ops.create_cq(context, cqe, channel, comp_vector);
 
 	if (cq) {
 		cq->context = context;
+		cq->channel = channel;
+		if (channel)
+			++channel->refcnt;
 		cq->cq_context = cq_context;
 		cq->comp_events_completed = 0;
 		cq->async_events_completed = 0;
@@ -264,6 +287,8 @@ struct ibv_cq *__ibv_create_cq(struct ibv_context *context, int cqe, void *cq_co
 		pthread_cond_init(&cq->cond, NULL);
 	}
 
+	pthread_mutex_unlock(&context->mutex);
+
 	return cq;
 }
 default_symver(__ibv_create_cq, ibv_create_cq);
@@ -279,7 +304,17 @@ default_symver(__ibv_resize_cq, ibv_resize_cq);
 
 int __ibv_destroy_cq(struct ibv_cq *cq)
 {
-	return cq->context->ops.destroy_cq(cq);
+	int ret;
+
+	pthread_mutex_lock(&cq->context->mutex);
+
+	ret = cq->context->ops.destroy_cq(cq);
+	if (cq->channel && !ret)
+		--cq->channel->refcnt;
+
+	pthread_mutex_unlock(&cq->context->mutex);
+
+	return ret;
 }
 default_symver(__ibv_destroy_cq, ibv_destroy_cq);

From rdreier at cisco.com Mon Mar 26 15:59:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Mar 2007 15:59:55 -0700
Subject: [ofa-general] Re: [PATCH for-2.6.21] IB/mthca: fix thinko in init_mr_table
In-Reply-To: <20070325091743.GA25332@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 25 Mar 2007 11:17:43 +0200")
References: <77F22E47-0C9B-4EF6-A1A9-902B3E6C566D@scl.ameslab.gov> <20070325091743.GA25332@mellanox.co.il>
Message-ID:

thanks, applied.

From rdreier at cisco.com Mon Mar 26 16:02:21 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Mar 2007 16:02:21 -0700
Subject: [ofa-general] kernel oops when killing opensm
In-Reply-To: <1174901792.5401.2.camel@Ami-desktop> (Ami Perlmutter's message of "Mon, 26 Mar 2007 11:36:32 +0200")
References: <1174901792.5401.2.camel@Ami-desktop>
Message-ID:

 > Linux sw185.lab.mtl.com 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

I think this might be caused by the bug fixed by commit 39798695
("IB/mad: Fix race between cancel and receive completion"), which went
into kernel 2.6.19. Fedora Core 6 has a 2.6.19 kernel now, so could you
update and try to reproduce the problem with the 2.6.19 kernel?

Thanks,
  Roland

From rdreier at cisco.com Mon Mar 26 16:35:32 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Mar 2007 16:35:32 -0700
Subject: [ofa-general] Re: [PATCH 1/2] IB/iser: do not assume that a task may be aborted only after the qp times out
In-Reply-To: <460649CE.3080805@voltaire.com> (Erez Zilber's message of "Sun, 25 Mar 2007 12:07:10 +0200")
References: <46064813.6070208@voltaire.com> <460649CE.3080805@voltaire.com>
Message-ID:

thanks, applied this one (holding off on 2/2 as you requested)

From mst at dev.mellanox.co.il Tue Mar 27 01:59:00 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Mar 2007 10:59:00 +0200
Subject: [ofa-general] [Bug 465] IPoIB CM HA fails after several hours of failures
Message-ID: <20070327085900.GJ6661@mellanox.co.il>

Pls do not reply to this message. I am copying the general list on this
bug report so that we can start discussion by mail. I am then going to
reply copying the bugzilla reflector so that "reply all" will get
tracked in bugzilla.

Subject: [Bug 465] New: IPoIB CM HA fails after several hours of failures
Date: Sun, 18 Mar 2007 08:45:48 +0200
From: bugzilla-daemon at lists.openfabrics.org

https://bugs.openfabrics.org/show_bug.cgi?id=465

Summary: IPoIB CM HA fails after several hours of failures
Product: OpenFabrics Linux
Version: 1.2beta1
Platform: X86-64
OS/Version: All
Status: NEW
Severity: critical
Priority: P2
Component: IPoIB
AssignedTo: mst at mellanox.co.il
ReportedBy: sweitzen at cisco.com
CC: tziporet at mellanox.co.il

I've been trying IPoIB CM HA for a few weeks, and can't get it to run
overnight. I've tried both SLES10 (LionCub DDR) and RHEL4 (LionMini SDR
and LionMini DDR).

I run netperf 2.4.1 with large socket buffers:

netperf241 -H 192.168.2.46 -D -l 36000 -- -s 349520 -S 349520 -m 65536

While netperf is running, I start flipping IB ports once every 10 seconds.

After a few hours, I sometimes see netperf throughput drop to almost zero:

Interim result: 1911.72 10^6bits/s over 2.52 seconds
Interim result: 4823.63 10^6bits/s over 1.00 seconds
Interim result: 4816.90 10^6bits/s over 1.00 seconds
Interim result: 4820.21 10^6bits/s over 1.00 seconds
Interim result: 4816.85 10^6bits/s over 1.00 seconds
Interim result: 4818.13 10^6bits/s over 1.00 seconds
Interim result: 324.99 10^6bits/s over 14.83 seconds
Interim result: 4811.39 10^6bits/s over 1.00 seconds
Interim result: 4817.64 10^6bits/s over 1.00 seconds
Interim result: 4812.06 10^6bits/s over 1.00 seconds
Interim result: 4809.26 10^6bits/s over 1.00 seconds
Interim result: 4817.21 10^6bits/s over 1.00 seconds
Interim result: 85.80 10^6bits/s over 56.14 seconds
Interim result: 1910.76 10^6bits/s over 2.52 seconds
Interim result: 4813.64 10^6bits/s over 1.00 seconds
Interim result: 4813.03 10^6bits/s over 1.00 seconds
Interim result: 4807.23 10^6bits/s over 1.00 seconds
Interim result: 4810.83 10^6bits/s over 1.00 seconds
Interim result: 4813.61 10^6bits/s over 1.00 seconds
Interim result: 272.39 10^6bits/s over 17.67 seconds
Interim result: 4816.57 10^6bits/s over 1.00 seconds
Interim result: 4810.02 10^6bits/s over 1.00 seconds
Interim result: 4809.88 10^6bits/s over 1.00 seconds
Interim result: 17.63 10^6bits/s over 278.01 seconds
Interim result: 0.21 10^6bits/s over 30.58 seconds
Interim result: 0.33 10^6bits/s over 14.20 seconds
Interim result: 0.45 10^6bits/s over 13.90 seconds
Interim result: 0.11 10^6bits/s over 56.20 seconds
Interim result: 0.34 10^6bits/s over 13.95 seconds
Interim result: 0.89 10^6bits/s over 14.21 seconds
Interim result: 0.11 10^6bits/s over 55.17 seconds
Interim result: 0.08 10^6bits/s over 56.20 seconds
Interim result: 0.20 10^6bits/s over 32.14 seconds
Interim result: 1.00 10^6bits/s over 6.30 seconds
Interim result: 0.37 10^6bits/s over 17.03 seconds
Interim result: 1.74 10^6bits/s over 7.25 seconds
Interim result: 0.02 10^6bits/s over 345.16 seconds
Interim result: 0.10 10^6bits/s over 112.83 seconds
Interim result: 0.45 10^6bits/s over 13.91 seconds
Interim result: 0.68 10^6bits/s over 6.91 seconds
Interim result: 0.06 10^6bits/s over 112.48 seconds
Interim result: 0.10 10^6bits/s over 60.32 seconds
Interim result: 0.43 10^6bits/s over 14.55 seconds

Other times netperf hangs or fails.

Restarting netperf as is never works. Sometimes I can restart netperf
with default socket buffer sizes.

----- End forwarded message -----

-- 
MST

From mst at dev.mellanox.co.il Tue Mar 27 02:01:36 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Mar 2007 11:01:36 +0200
Subject: [ofa-general] [Bug 465] IPoIB CM HA fails after several hours of failures
In-Reply-To: <20070327085900.GJ6661@mellanox.co.il>
References: <20070327085900.GJ6661@mellanox.co.il>
Message-ID: <20070327090136.GK6661@mellanox.co.il>

Pls do not reply to this message. I am copying the general list on this
bug report so that we can start discussion by mail. I am then going to
reply copying the bugzilla reflector so that "reply all" will get
tracked in bugzilla.

> I've been trying IPoIB CM HA for a few weeks, and can't get it to run
> overnight. I've tried both SLES10 (LionCub DDR) and RHEL4 (LionMini SDR and
> LionMini DDR).
>
> I run netperf 2.4.1 with large socket buffers:
>
> netperf241 -H 192.168.2.46 -D -l 36000 -- -s 349520 -S 349520 -m 65536
>
> While netperf is running, I start flipping IB ports once every 10 seconds.
>
> After a few hours, I sometimes see netperf throughput drop to almost zero:
>
> ...
> Interim result: 0.45 10^6bits/s over 13.91 seconds
> Interim result: 0.68 10^6bits/s over 6.91 seconds
> Interim result: 0.06 10^6bits/s over 112.48 seconds
> Interim result: 0.10 10^6bits/s over 60.32 seconds
> Interim result: 0.43 10^6bits/s over 14.55 seconds
>
> Other times netperf hangs or fails.
>
> Restarting netperf as is never works. Sometimes I can restart netperf with
> default socket buffer sizes.
> > ----- End forwarded message ----- Here is the dmesg output. A copy is here: https://bugs.openfabrics.org/attachment.cgi?id=106&action=view ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0849 MTU > 0 ib1: Send unicast ARP to 0005 ib0: PathRec LID 0x0005 for GID fe80:0000:0000:0000:0005:ad00:0020:0849 ib0: Created ah 00000101b3983a00 ib0: created address handle 00000101be579c80 for LID 0x0005, SL 0 ib0: Send unicast ARP to 0005 ib1: REQ arrived ib1: REQ arrived ib1: failed cm send event (status=12, wrid=28 vend_err 81) ib1: Destroy active connection 0x4c05bd head 0x5791d tail 0x5791d ib1: Request connection 0x4e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:0849 qpn 0x404 ib1: REP received. ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib0: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib0: Created ah 00000101b39835c0 ib0: created address handle 00000101bbff2840 for LID 0x0007, SL 0 ib0: Send unicast ARP to 0007 ib1: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib1: Created ah 00000101a3816340 ib1: created address handle 00000101b3983540 for LID 0x0007, SL 0 ib1: Send unicast ARP to 0007 ib0: REQ arrived ib1: failed cm send event (status=12, wrid=21 vend_err 81) ib1: Destroy active connection 0x4e05bd head 0xdcd8 tail 0xdcd6 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101bfc47b00 ib0: Created ah 00000101be977540 ib0: Created ah 00000101aa5e0f00 ib0: Created ah 00000101a3816dc0 ib0: Created ah 00000101ab66c980 ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib0: enabling connected mode will cause multicast packet drops ib1: PathRec status -110 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib0: Start path record lookup for 
fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib0: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib0: Created ah 00000101bbff2a00 ib0: created address handle 00000101bbff2200 for LID 0x0007, SL 0 ib0: Request connection 0x4f05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x4f05bd head 0x5 tail 0x3 ib0: Request connection 0x5005bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x5005bd head 0x3 tail 0x3 ib0: Request connection 0x5105bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x5105bd head 0x2 tail 0x2 ib0: Request connection 0x5205bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x5205bd head 0x3 tail 0x3 ib0: Request connection 0x5305bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x5305bd head 0x4 tail 0x4 ib0: Request connection 0x5405bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x5405bd head 0x4 tail 0x4 ib0: Request connection 0x5505bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x5505bd head 0x2 tail 0x2 ib0: Request connection 0x5605bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x5605bd head 0x4 tail 0x4 ib0: Request connection 0x5705bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x5705bd head 0x3 tail 0x3 ib0: Request connection 0x5805bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x5805bd head 0x2 tail 0x2 ib0: Request connection 0x5905bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x5905bd head 0x4 tail 0x4 ib0: Request connection 0x5a05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x5a05bd head 0x3 tail 0x3 ib0: Request connection 0x5b05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x5b05bd head 0x2 tail 0x2 ib0: Request connection 0x5c05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x5c05bd head 0x4 tail 0x4 ib0: Request connection 0x5d05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x5d05bd head 0x3 tail 0x3 ib0: Request connection 0x5e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x5e05bd head 0x2 tail 0x2 ib0: Request connection 0x5f05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x5f05bd head 0x6 tail 0x6 ib0: Request connection 0x6005bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6005bd head 0x2 tail 0x2 ib0: Request connection 0x6105bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x6105bd head 0x4 tail 0x4 ib0: Request connection 0x6205bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x6205bd head 0x3 tail 0x3 ib0: Request connection 0x6305bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6305bd head 0x2 tail 0x2 ib0: Request connection 0x6405bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x6405bd head 0x6 tail 0x6 ib0: Request connection 0x6505bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6505bd head 0x2 tail 0x2 ib0: Request connection 0x6605bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x6605bd head 0x4 tail 0x4 ib0: Request connection 0x6705bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x6705bd head 0x3 tail 0x3 ib0: Request connection 0x6805bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6805bd head 0x2 tail 0x2 ib0: Request connection 0x6905bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x6905bd head 0x6 tail 0x6 ib0: Request connection 0x6a05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6a05bd head 0x2 tail 0x2 ib0: Request connection 0x6b05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x6b05bd head 0x6 tail 0x6 ib0: Request connection 0x6c05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6c05bd head 0x2 tail 0x2 ib0: Request connection 0x6d05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x6d05bd head 0x4 tail 0x4 ib0: Request connection 0x6e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x6e05bd head 0x3 tail 0x3 ib0: Request connection 0x6f05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x6f05bd head 0x2 tail 0x2 ib0: Request connection 0x7005bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7005bd head 0x4 tail 0x4 ib0: Request connection 0x7105bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x7105bd head 0x3 tail 0x3 ib0: Request connection 0x7205bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x7205bd head 0x2 tail 0x2 ib0: Request connection 0x7305bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7305bd head 0x4 tail 0x4 ib0: Request connection 0x7405bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x7405bd head 0x3 tail 0x3 ib0: Request connection 0x7505bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x7505bd head 0x2 tail 0x2 ib0: Request connection 0x7605bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7605bd head 0x4 tail 0x4 ib0: Request connection 0x7705bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x7705bd head 0x3 tail 0x3 ib0: Request connection 0x7805bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x7805bd head 0x2 tail 0x2 ib0: Request connection 0x7905bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7905bd head 0x4 tail 0x4 ib0: Request connection 0x7a05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x7a05bd head 0x3 tail 0x3 ib0: Request connection 0x7b05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x7b05bd head 0x2 tail 0x2 ib0: Request connection 0x7c05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7c05bd head 0x4 tail 0x4 ib0: Request connection 0x7d05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x7d05bd head 0x3 tail 0x3 ib0: Request connection 0x7e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x7e05bd head 0x2 tail 0x2 ib0: Request connection 0x7f05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x7f05bd head 0x4 tail 0x4 ib0: Request connection 0x8005bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x8005bd head 0x3 tail 0x3 ib0: Request connection 0x8105bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x8105bd head 0x2 tail 0x2 ib0: Request connection 0x8205bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x8205bd head 0x6 tail 0x6 ib0: Request connection 0x8305bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x8305bd head 0x2 tail 0x2 ib0: Request connection 0x8405bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=5 vend_err 87) ib0: Destroy active connection 0x8405bd head 0x6 tail 0x6 ib0: Request connection 0x8505bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. ib0: failed cm send event (status=13, wrid=1 vend_err 87) ib0: Destroy active connection 0x8505bd head 0x2 tail 0x2 ib0: Request connection 0x8605bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=3 vend_err 87) ib0: Destroy active connection 0x8605bd head 0x4 tail 0x4 ib0: Request connection 0x8705bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:084a ib1: flushing ib1: downing ib_dev ib0: flushing ib0: downing ib_dev ib0: Destroy active connection 0x8705bd head 0x0 tail 0x0 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101be888b00 ib0: Created ah 00000101b35f4240 ib0: Created ah 00000101b2543f40 ib0: Created ah 00000101a3816fc0 ib0: Created ah 00000101b25433c0 ib0: enabling connected mode will cause multicast packet drops ib1: Created ah 00000101a7d470c0 ib1: Created ah 00000101a7d47e00 ib1: Created ah 00000101a7d47700 ib1: Created ah 00000101a7d471c0 ib1: Created ah 00000101bbff2a00 ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib0: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib0: Created ah 00000101baf75580 ib0: created address handle 00000101a3816bc0 for LID 0x0007, SL 0 ib0: Request connection 0x8805bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: REP received. 
ib0: failed cm send event (status=13, wrid=2 vend_err 87) ib0: Destroy active connection 0x8805bd head 0x3 tail 0x3 ib0: Request connection 0x8905bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib0: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:084a ib1: flushing ib1: downing ib_dev ib0: Destroy active connection 0x8905bd head 0x0 tail 0x0 ib1: Created ah 00000101a3816e80 ib1: Created ah 00000101a7d47fc0 ib1: Created ah 00000101bfc0d700 ib1: Created ah 00000101a7d47600 ib1: Created ah 00000101a3816400 ib1: enabling connected mode will cause multicast packet drops ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0849 MTU > 0 ib1: PathRec LID 0x0005 for GID fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Created ah 00000101be888b00 ib1: created address handle 00000101b25430c0 for LID 0x0005, SL 0 ib1: Request connection 0x8a05bd for gid fe80:0000:0000:0000:0005:ad00:0020:0849 qpn 0x404 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Destroy active connection 0x8a05bd head 0x0 tail 0x0 ib1: Created ah 00000101abe60e80 ib1: Created ah 00000101b2d5f540 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101b2543740 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: 
downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101bdcaa7c0 ib1: Created ah 00000101abe608c0 ib1: Created ah 00000101bf443240 ib1: Created ah 00000101bf443f40 ib1: Created ah 00000101be401b40 ib1: enabling connected mode will cause multicast packet drops ib0: Created ah 00000101bfc40780 ib0: Created ah 00000101bfc475c0 ib0: Created ah 00000101ab66cdc0 ib0: Created ah 00000101bfc40c40 ib0: Created ah 00000101be977c00 ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0849 MTU > 0 ib1: PathRec LID 0x0005 for GID fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Created ah 00000101bfc47340 ib1: created address handle 00000101bfc814c0 for LID 0x0005, SL 0 ib1: Request connection 0x8b05bd for gid fe80:0000:0000:0000:0005:ad00:0020:0849 qpn 0x404 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Destroy active connection 0x8b05bd head 0x0 tail 0x0 ib0: Created ah 00000101b38a9480 ib0: Created ah 00000101bfc812c0 ib0: Created ah 00000101be401300 ib0: Created ah 00000101b38a9740 ib0: Created ah 00000101bfc40b00 ib0: enabling connected mode will cause multicast packet drops ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101a3816680 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101b38a9f40 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: 
flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101b38a9640 ib0: Created ah 00000101b38a9ec0 ib0: Created ah 00000101bdfbb7c0 ib0: Created ah 00000101bfc40780 ib0: Created ah 00000101abe60e80 ib0: enabling connected mode will cause multicast packet drops ib1: Created ah 00000101b38a98c0 ib1: Created ah 00000101bfc40a00 ib1: Created ah 00000101b25430c0 ib1: Created ah 00000101bfc4b1c0 ib1: Created ah 00000101ab66ce80 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 0000010037dbd840 ib1: Created ah 00000101b2543d40 ib1: Created ah 00000101be977c00 ib1: Created ah 00000101ba22c1c0 ib1: Created ah 00000101bdfbbdc0 ib1: enabling connected mode will cause multicast packet drops ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101beb8b600 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101bff30c00 ib1: Created ah 00000101aa5e0c40 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101beb8b780 ib1: Created ah 0000010037dbd5c0 ib1: Created ah 00000101aa5e0140 ib1: Created ah 00000101b2ad71c0 ib1: Created ah 00000101beb8b300 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101ab19f940 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port 
state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101a3816500 ib1: Created ah 00000101a3816ec0 ib1: Created ah 00000101a3816d00 ib1: Created ah 00000101b28661c0 ib1: Created ah 00000101b2866380 ib1: enabling connected mode will cause multicast packet drops ib0: Created ah 00000101bff30240 ib0: Created ah 00000101a3816700 ib0: Created ah 00000101beb8b400 ib0: Created ah 00000101b2866c80 ib0: Created ah 00000101aa5e0700 ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0849 MTU > 0 ib1: PathRec LID 0x0005 for GID fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Created ah 00000101b38a9f40 ib1: created address handle 00000101b2866f00 for LID 0x0005, SL 0 ib1: Request connection 0x8c05bd for gid fe80:0000:0000:0000:0005:ad00:0020:0849 qpn 0x404 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:0849 ib1: Destroy active connection 0x8c05bd head 0x0 tail 0x0 ib0: Created ah 00000101bdfbb9c0 ib0: Created ah 00000101b8b2e240 ib0: Created ah 00000101b2e9a340 ib0: Created ah 00000101b8b2e700 ib0: Created ah 00000101aa5e0140 ib0: enabling connected mode will cause multicast packet drops ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101b3983340 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev 
ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101a3816880 ib0: Created ah 0000010037dbdb80 ib0: Created ah 00000101ab19f940 ib0: Created ah 00000101b3983500 ib0: Created ah 0000010037dbd9c0 ib0: enabling connected mode will cause multicast packet drops ib1: Created ah 00000101b2ad7200 ib1: Created ah 00000101b2ad7680 ib1: Created ah 00000101be579c80 ib1: Created ah 00000101b2ad7180 ib1: Created ah 00000101b2e9aec0 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101bfc50180 ib1: Created ah 00000101be5d3240 ib1: Created ah 00000101b2543f40 ib1: Created ah 00000101be5d31c0 ib1: Created ah 00000101bf442280 ib1: enabling connected mode will cause multicast packet drops ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101bfc85600 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101ba22c4c0 ib1: Created ah 00000101b3983a00 ib1: Created ah 00000101b38a9f80 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change 
event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Created ah 00000101a3816880 ib1: Created ah 00000101aa5e0280 ib1: Created ah 00000101b25433c0 ib1: Created ah 00000101b3983500 ib1: Created ah 00000101b39835c0 ib1: enabling connected mode will cause multicast packet drops ib0: Created ah 00000101a3816c00 ib0: Created ah 00000101ab19f940 ib0: Created ah 00000101b3983a00 ib0: Created ah 00000101b2543980 ib0: Created ah 00000101bfc85600 ib1: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib1: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib1: Created ah 00000101bfc812c0 ib1: created address handle 00000101a7d47540 for LID 0x0007, SL 0 ib1: Request connection 0x8d05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib1: Reap connection for gid fe80:0000:0000:0000:0005:ad00:0020:084a ib1: Destroy active connection 0x8d05bd head 0x0 tail 0x0 ib0: Created ah 00000101b2543f40 ib0: Created ah 00000101be5d3240 ib0: Created ah 00000101be5d3340 ib0: Created ah 00000101a3816b40 ib0: Created ah 00000101a3816f00 ib0: enabling connected mode will cause multicast packet drops ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101be977540 ib0: Created ah 00000101a3816580 ib0: Created ah 00000101ba22c4c0 ib0: Created ah 00000101a7d47540 ib0: Created ah 00000101be4d2a40 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101a3816e80 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101a3816680 ib0: Created ah 00000101aa5e0280 ib0: Created ah 00000101b2543980 ib0: Created ah 00000101b3983a00 
ib0: Created ah 00000101ab19f940 ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Port state change event ib1: Port state change event ib0: flushing ib0: downing ib_dev ib1: flushing ib1: downing ib_dev ib0: Created ah 00000101bfc85600 ib0: Created ah 0000010037dbd9c0 ib0: Created ah 00000101be579c80 ib0: Created ah 00000101a3816d00 ib0: Created ah 00000101a3816700 ib0: enabling connected mode will cause multicast packet drops ib1: Created ah 00000101b2ad7700 ib1: Created ah 00000101b301ef80 ib1: Created ah 00000101a3816680 ib1: Created ah 00000101a3816c00 ib1: Created ah 00000101b2ad7200 ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:084a MTU > 0 ib0: PathRec LID 0x0007 for GID fe80:0000:0000:0000:0005:ad00:0020:084a ib0: Created ah 00000101be5d3340 ib0: created address handle 00000101b2ad7680 for LID 0x0007, SL 0 ib0: Request connection 0x8e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 ib0: CM error 0. 
ib0: Destroy active connection 0x8e05bd head 0x0 tail 0x0 -- MST From vlad at lists.openfabrics.org Tue Mar 27 02:34:40 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Tue, 27 Mar 2007 02:34:40 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070327-0200 daily build status Message-ID: <20070327093440.25C79E60818@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed 
on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Failed: From mst at dev.mellanox.co.il Tue Mar 27 03:02:56 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Mar 2007 12:02:56 +0200 Subject: [ofa-general] [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070327090136.GK6661@mellanox.co.il> References: <20070327085900.GJ6661@mellanox.co.il> <20070327090136.GK6661@mellanox.co.il> Message-ID: <20070327100256.GL6661@mellanox.co.il> I am copying the general list on this bug report so that we can start discussion by mail. Please do "reply all" keeping the bugmail address and keeping [Bug 465] in subject so that this thread will get tracked in bugzilla. > I've been trying IPoIB CM HA for a few weeks, and can't get it to run > overnight. ... > Here is the dmesg output. > A copy is here: > https://bugs.openfabrics.org/attachment.cgi?id=106&action=view ... > ib1: failed cm send event (status=12, wrid=28 vend_err 81) Status 12 means the remote side is not sending acks, or has destroyed the QP. > ib0: failed cm send event (status=13, wrid=2 vend_err 87) Status 13 means the remote side is not posting receive WRs. To debug the above 2 errors, we need the log from the remote side. ... > ib0: Request connection 0x8e05bd for gid fe80:0000:0000:0000:0005:ad00:0020:084a qpn 0x405 > ib0: CM error 0. This is a local error. 0 corresponds to IB_CM_REQ_ERROR. Is it possible that you were removing the device or module when this happened? If yes, then it's not a problem. If no, I attach a patch that will print out the actual error that triggered this event.
It cannot fix anything, but if you run with this patch and reproduce the CM error above, we will get more information. > > Other times netperf hangs or fails. > > Restarting netperf as is never works. Sometimes I can restart netperf with > default socket buffer sizes. If you can reproduce the hang/fail this might be educational as well. --- diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 842cd0b..3b74ec6 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -2941,6 +2941,8 @@ static void cm_process_send_error(struct ib_mad_send_buf *msg, switch (state) { case IB_CM_REQ_SENT: case IB_CM_MRA_REQ_RCVD: + printk("cm_process_send_error state %d wc_status %d\n", + state, wc_status); cm_reset_to_idle(cm_id_priv); cm_event.event = IB_CM_REQ_ERROR; break; -- MST From dotanb at dev.mellanox.co.il Tue Mar 27 06:21:31 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 27 Mar 2007 15:21:31 +0200 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> Message-ID: <46091A5B.4030401@dev.mellanox.co.il> The code has some minor problems: Roland Dreier wrote: > int ibv_destroy_comp_channel(struct ibv_comp_channel *channel) > { > - if (abi_ver <= 2) > - return ibv_destroy_comp_channel_v2(channel); > + int ret; > + > + pthread_mutex_lock(&channel->context->mutex); > + > + if (channel->refcnt) { > + ret = EBUSY; > + goto out; > + } > + > + if (abi_ver <= 2) { > + ret = ibv_destroy_comp_channel_v2(channel); > + goto out; > + } > > close(channel->fd); > free(channel); > + ret = 0; > > - return 0; > +out: > + pthread_mutex_unlock(&channel->context->mutex); > here you try to unlock the mutex, but the channel is no longer allocated ...
> + > + return ret; > } > > int __ibv_destroy_cq(struct ibv_cq *cq) > { > - return cq->context->ops.destroy_cq(cq); > + int ret; > + > + pthread_mutex_lock(&cq->context->mutex); > + > + ret = cq->context->ops.destroy_cq(cq); > + if (cq->channel && !ret) > + --cq->channel->refcnt; > + > + pthread_mutex_unlock(&cq->context->mutex); > + > + return ret; > } > default_symver(__ibv_destroy_cq, ibv_destroy_cq); > 1) I believe that cq->context->ops.destroy_cq(cq) frees the memory that was allocated for the CQ, so the refcnt decrement and the mutex_unlock that follow it access freed memory. 2) Optimization suggestion: the mutex should be locked/unlocked only if a channel is being used. Thanks for finding the time to implement the patch. Dotan From changquing.tang at hp.com Tue Mar 27 06:53:36 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 27 Mar 2007 14:53:36 +0100 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count tocompletion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403DD79B6@G3W0634.americas.hpqcorp.net> Since you changed the size of structure 'struct ibv_cq', does that mean code compiled with OFED 1.1 cannot work with OFED 1.2? --CQ > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Roland Dreier > Sent: Monday, March 26, 2007 5:17 PM > To: Dotan Barak > Cc: openib-general > Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added > reference count tocompletion event channels > > OK, I got bored and tried to implement this using a mutex in > the ibv_context structure. How does this (compile tested > only) patch look? > > - R.
> > > diff --git a/include/infiniband/verbs.h > b/include/infiniband/verbs.h index 2ae50ab..acc1b82 100644 > --- a/include/infiniband/verbs.h > +++ b/include/infiniband/verbs.h > @@ -573,11 +573,14 @@ struct ibv_qp { > }; > > struct ibv_comp_channel { > + struct ibv_context *context; > int fd; > + int refcnt; > }; > > struct ibv_cq { > struct ibv_context *context; > + struct ibv_comp_channel *channel; > void *cq_context; > uint32_t handle; > int cqe; > @@ -680,12 +683,13 @@ struct ibv_context_ops { }; > > struct ibv_context { > - struct ibv_device *device; > - struct ibv_context_ops ops; > - int cmd_fd; > - int async_fd; > - int num_comp_vectors; > - void *abi_compat; > + struct ibv_device *device; > + struct ibv_context_ops ops; > + int cmd_fd; > + int async_fd; > + int num_comp_vectors; > + pthread_mutex_t mutex; > + void *abi_compat; > }; > > /** > diff --git a/src/cmd.c b/src/cmd.c > index f7d3fde..a0bfaad 100644 > --- a/src/cmd.c > +++ b/src/cmd.c > @@ -75,7 +75,9 @@ static int ibv_cmd_get_context_v2(struct > ibv_context *context, > > context->async_fd = resp->async_fd; > context->num_comp_vectors = 1; > + t->channel.context = context; > t->channel.fd = cq_fd; > + t->channel.refcnt = 0; > context->abi_compat = t; > > return 0; > diff --git a/src/device.c b/src/device.c index bca1ce9..3abc1eb 100644 > --- a/src/device.c > +++ b/src/device.c > @@ -138,6 +138,7 @@ struct ibv_context > *__ibv_open_device(struct ibv_device *device) > > context->device = device; > context->cmd_fd = cmd_fd; > + pthread_mutex_init(&context->mutex, NULL); > > return context; > > diff --git a/src/verbs.c b/src/verbs.c > index 56513e4..5334af1 100644 > --- a/src/verbs.c > +++ b/src/verbs.c > @@ -226,7 +226,9 @@ struct ibv_comp_channel > *ibv_create_comp_channel(struct ibv_context *context) > return NULL; > } > > - channel->fd = resp.fd; > + channel->context = context; > + channel->fd = resp.fd; > + channel->refcnt = 0; > > return channel; > } > @@ -240,23 +242,44 @@ static int > 
ibv_destroy_comp_channel_v2(struct ibv_comp_channel *channel) > > int ibv_destroy_comp_channel(struct ibv_comp_channel *channel) { > - if (abi_ver <= 2) > - return ibv_destroy_comp_channel_v2(channel); > + int ret; > + > + pthread_mutex_lock(&channel->context->mutex); > + > + if (channel->refcnt) { > + ret = EBUSY; > + goto out; > + } > + > + if (abi_ver <= 2) { > + ret = ibv_destroy_comp_channel_v2(channel); > + goto out; > + } > > close(channel->fd); > free(channel); > + ret = 0; > > - return 0; > +out: > + pthread_mutex_unlock(&channel->context->mutex); > + > + return ret; > } > > struct ibv_cq *__ibv_create_cq(struct ibv_context *context, > int cqe, void *cq_context, > struct ibv_comp_channel > *channel, int comp_vector) { > - struct ibv_cq *cq = context->ops.create_cq(context, > cqe, channel, > - comp_vector); > + struct ibv_cq *cq; > + > + pthread_mutex_lock(&context->mutex); > + > + cq = context->ops.create_cq(context, cqe, channel, comp_vector); > > if (cq) { > cq->context = context; > + cq->channel = channel; > + if (channel) > + ++channel->refcnt; > cq->cq_context = cq_context; > cq->comp_events_completed = 0; > cq->async_events_completed = 0; > @@ -264,6 +287,8 @@ struct ibv_cq *__ibv_create_cq(struct > ibv_context *context, int cqe, void *cq_co > pthread_cond_init(&cq->cond, NULL); > } > > + pthread_mutex_unlock(&context->mutex); > + > return cq; > } > default_symver(__ibv_create_cq, ibv_create_cq); @@ -279,7 > +304,17 @@ default_symver(__ibv_resize_cq, ibv_resize_cq); > > int __ibv_destroy_cq(struct ibv_cq *cq) { > - return cq->context->ops.destroy_cq(cq); > + int ret; > + > + pthread_mutex_lock(&cq->context->mutex); > + > + ret = cq->context->ops.destroy_cq(cq); > + if (cq->channel && !ret) > + --cq->channel->refcnt; > + > + pthread_mutex_unlock(&cq->context->mutex); > + > + return ret; > } > default_symver(__ibv_destroy_cq, ibv_destroy_cq); > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > 
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Tue Mar 27 06:55:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Mar 2007 06:55:16 -0700 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count tocompletion event channels In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA8403DD79B6@G3W0634.americas.hpqcorp.net> (Changqing Tang's message of "Tue, 27 Mar 2007 14:53:36 +0100") References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <349DCDA352EACF42A0C49FA6DCEA8403DD79B6@G3W0634.americas.hpqcorp.net> Message-ID: > Since you changed the size of structure 'struct ibv_cq', does that mean > code > compiled with OFED 1.1 can not work with OFED 1.2 ? No, the compatibility wrappers should still work. - R. From rdreier at cisco.com Tue Mar 27 06:59:09 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Mar 2007 06:59:09 -0700 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: <46091A5B.4030401@dev.mellanox.co.il> (Dotan Barak's message of "Tue, 27 Mar 2007 15:21:31 +0200") References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <46091A5B.4030401@dev.mellanox.co.il> Message-ID: Good points... 
I've updated my patch as below: diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index 2ae50ab..acc1b82 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -573,11 +573,14 @@ struct ibv_qp { }; struct ibv_comp_channel { + struct ibv_context *context; int fd; + int refcnt; }; struct ibv_cq { struct ibv_context *context; + struct ibv_comp_channel *channel; void *cq_context; uint32_t handle; int cqe; @@ -680,12 +683,13 @@ struct ibv_context_ops { }; struct ibv_context { - struct ibv_device *device; - struct ibv_context_ops ops; - int cmd_fd; - int async_fd; - int num_comp_vectors; - void *abi_compat; + struct ibv_device *device; + struct ibv_context_ops ops; + int cmd_fd; + int async_fd; + int num_comp_vectors; + pthread_mutex_t mutex; + void *abi_compat; }; /** diff --git a/src/cmd.c b/src/cmd.c index f7d3fde..a0bfaad 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -75,7 +75,9 @@ static int ibv_cmd_get_context_v2(struct ibv_context *context, context->async_fd = resp->async_fd; context->num_comp_vectors = 1; + t->channel.context = context; t->channel.fd = cq_fd; + t->channel.refcnt = 0; context->abi_compat = t; return 0; diff --git a/src/device.c b/src/device.c index bca1ce9..3abc1eb 100644 --- a/src/device.c +++ b/src/device.c @@ -138,6 +138,7 @@ struct ibv_context *__ibv_open_device(struct ibv_device *device) context->device = device; context->cmd_fd = cmd_fd; + pthread_mutex_init(&context->mutex, NULL); return context; diff --git a/src/verbs.c b/src/verbs.c index 56513e4..febf32a 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -226,7 +226,9 @@ struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context) return NULL; } - channel->fd = resp.fd; + channel->context = context; + channel->fd = resp.fd; + channel->refcnt = 0; return channel; } @@ -240,23 +242,46 @@ static int ibv_destroy_comp_channel_v2(struct ibv_comp_channel *channel) int ibv_destroy_comp_channel(struct ibv_comp_channel *channel) { - if (abi_ver <= 2) 
- return ibv_destroy_comp_channel_v2(channel); + struct ibv_context *context; + int ret; + + context = channel->context; + pthread_mutex_lock(&context->mutex); + + if (channel->refcnt) { + ret = EBUSY; + goto out; + } + + if (abi_ver <= 2) { + ret = ibv_destroy_comp_channel_v2(channel); + goto out; + } close(channel->fd); free(channel); + ret = 0; - return 0; +out: + pthread_mutex_unlock(&context->mutex); + + return ret; } struct ibv_cq *__ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, struct ibv_comp_channel *channel, int comp_vector) { - struct ibv_cq *cq = context->ops.create_cq(context, cqe, channel, - comp_vector); + struct ibv_cq *cq; + + pthread_mutex_lock(&context->mutex); + + cq = context->ops.create_cq(context, cqe, channel, comp_vector); if (cq) { cq->context = context; + cq->channel = channel; + if (channel) + ++channel->refcnt; cq->cq_context = cq_context; cq->comp_events_completed = 0; cq->async_events_completed = 0; @@ -264,6 +289,8 @@ struct ibv_cq *__ibv_create_cq(struct ibv_context *context, int cqe, void *cq_co pthread_cond_init(&cq->cond, NULL); } + pthread_mutex_unlock(&context->mutex); + return cq; } default_symver(__ibv_create_cq, ibv_create_cq); @@ -279,7 +306,21 @@ default_symver(__ibv_resize_cq, ibv_resize_cq); int __ibv_destroy_cq(struct ibv_cq *cq) { - return cq->context->ops.destroy_cq(cq); + struct ibv_comp_channel *channel = cq->channel; + int ret; + + if (channel) + pthread_mutex_lock(&channel->context->mutex); + + ret = cq->context->ops.destroy_cq(cq); + + if (channel) { + if (!ret) + --channel->refcnt; + pthread_mutex_unlock(&channel->context->mutex); + } + + return ret; } default_symver(__ibv_destroy_cq, ibv_destroy_cq); From mst at dev.mellanox.co.il Tue Mar 27 07:26:53 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 27 Mar 2007 16:26:53 +0200 Subject: [ofa-general] mthca: command timeout issue Message-ID: <20070327142653.GD19817@mellanox.co.il> Roland, command token is currently only updated on command event. This means that on command timeout, the same token will be reused for new command, which results in a mess. -- MST From mst at dev.mellanox.co.il Tue Mar 27 07:56:01 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Mar 2007 16:56:01 +0200 Subject: [ofa-general] [PATCH] IB/mthca: change token on command timeout In-Reply-To: <20070327142653.GD19817@mellanox.co.il> References: <20070327142653.GD19817@mellanox.co.il> Message-ID: <20070327145601.GF19817@mellanox.co.il> Command token is currently only updated on command event.
This means that on command timeout, the same token will be reused for new command, which results in a mess if the timed out command *is* eventually completed. Signed-off-by: Michael S. Tsirkin --- Untested yet, but looks fine, doesn't it? diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 7131446..26c42a1 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -355,9 +355,6 @@ void mthca_cmd_event(struct mthca_dev *dev, context->result = 0; context->status = status; context->out_param = out_param; - - context->token += dev->cmd.token_mask + 1; - complete(&context->done); } @@ -379,6 +376,7 @@ static int mthca_cmd_wait(struct mthca_dev *dev, spin_lock(&dev->cmd.context_lock); BUG_ON(dev->cmd.free_head < 0); context = &dev->cmd.context[dev->cmd.free_head]; + context->token += dev->cmd.token_mask + 1; dev->cmd.free_head = context->next; spin_unlock(&dev->cmd.context_lock); -- MST From amip at dev.mellanox.co.il Tue Mar 27 08:23:26 2007 From: amip at dev.mellanox.co.il (Ami Perlmutter) Date: Tue, 27 Mar 2007 17:23:26 +0200 Subject: [ofa-general] madeye kernel oops Message-ID: <1175009006.14461.0.camel@Ami-desktop> Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP: [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 PGD 73387067 PUD 72844067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa ib_mad ib_core Pid: 8917, comm: rmmod Not tainted 2.6.20 #1 RIP: 0010:[] [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 RSP: 0000:ffff810071ee1e08 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0 R13: 0000000000000002 R14: 
0000000000000880 R15: 0000000000503010 FS: 00002b3d6689fb00(0000) GS:ffffffff80702000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0 Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task ffff8100781aeee0) Stack: ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28 ffffffff80538b43 ffff810071ee1ea8 ffffffff80538ea2 ffffffff80690880 ffff810071ee1e78 000000000000000f 0000000000000020 0000000000000002 ffff8100766c33c0 Call Trace: [] __cond_resched+0x1c/0x44 [] cond_resched+0x2e/0x39 [] wait_for_completion+0x1a/0xd0 [] :ib_madeye:madeye_remove_one+0x56/0x88 [] :ib_core:ib_unregister_client+0x40/0xe2 [] sys_delete_module+0x1b4/0x1e5 [] add_uevent_var+0x40/0xe3 [] sys_munmap+0x4b/0x58 [] system_call+0x7e/0x83 Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48 RIP [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 RSP CR2: 0000000000000038 From dotanb at dev.mellanox.co.il Tue Mar 27 08:30:29 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 27 Mar 2007 17:30:29 +0200 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count to completion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com> <46091A5B.4030401@dev.mellanox.co.il> Message-ID: <46093895.4060507@dev.mellanox.co.il> Roland Dreier wrote: > Good points... I've updated my patch as below: > I reviewed the patch and it looks fine. 
I applied it and executed some sanity tests and they all passed Thanks Dotan From halr at voltaire.com Tue Mar 27 09:42:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Mar 2007 11:42:21 -0500 Subject: [ofa-general] [PATCH] OpenSM/osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not found Message-ID: <1175013740.4372.73340.camel@hal.voltaire.com> OpenSM/osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not found properly Signed-off-by: Hal Rosenstock diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c index 41153fc..002821b 100644 --- a/osm/opensm/osm_sm_state_mgr.c +++ b/osm/opensm/osm_sm_state_mgr.c @@ -231,6 +231,11 @@ __osm_sm_state_mgr_send_master_sm_info_r */ p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl, p_sm_mgr->master_guid ); + if( p_port == + ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) ) + { + p_port = NULL; + } } else { From dotanb at dev.mellanox.co.il Tue Mar 27 09:39:31 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 27 Mar 2007 18:39:31 +0200 Subject: [ofa-general] [PATCH] libibverbs-manpages updates Message-ID: <1175013571.653.1.camel@mtldesk014.lab.mtl.com> Spelling mistakes were fixed. All PKey / QKey instances were replaced with P_Key / Q_Key. Added non-blocking mode examples to ibv_get_async_event and ibv_get_cq_event. Name of variables were fixed in ibv_get_cq_event first example. Signed-off-by: Dotan Barak --- diff --git a/man/ibv_get_async_event.3 b/man/ibv_get_async_event.3 index bb6d068..bbddf7c 100644 --- a/man/ibv_get_async_event.3 +++ b/man/ibv_get_async_event.3 @@ -110,6 +110,46 @@ is a blocking function. If multiple threads call this function simultaneously, then when an async event occurs, only one thread will receive it, and it is not possible to predict which thread will receive it. 
+.SH "EXAMPLES" +The following code example demonstrates one possible way to work with async events in non-blocking mode. +It performs the following steps: +.PP +1. Set the async events queue work mode to be non-blocked +2. Poll the queue until it has an async event +3. Get the async event and ack it +.PP +.nf +/* change the blocking mode of the async event queue */ +flags = fcntl(ctx->async_fd, F_GETFL); +rc = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK); +if (rc < 0) { + fprintf(stderr, "Failed to change file descriptor of async event queue\en"); + return 1; +} + + +/* poll the queue until it has an event and sleep ms_timeout milliseconds between any iteration */ +my_pollfd.fd = ctx->async_fd; +my_pollfd.events = POLLIN; +my_pollfd.revents = 0; + +do { + rc = poll(&my_pollfd, 1, ms_timeout); +} while (rc == 0); +if (rc < 0) { + fprintf(stderr, "poll failed\en"); + return 1; +} +/* Wait for the async event */ +if (ibv_get_async_event(ctx, &async_event)) { + fprintf(stderr, "Failed to get async_event\en"); + return 1; +} + +/* Ack the event */ +ibv_ack_async_event(&async_event); + +.fi .SH "SEE ALSO" .BR ibv_open_device (3) .SH "AUTHORS" diff --git a/man/ibv_get_cq_event.3 b/man/ibv_get_cq_event.3 index c48f2a1..c0c346a 100644 --- a/man/ibv_get_cq_event.3 +++ b/man/ibv_get_cq_event.3 @@ -15,9 +15,9 @@ ibv_get_cq_event, ibv_ack_cq_events \- get and acknowledge completion queue (CQ) .SH "DESCRIPTION" .B ibv_get_cq_event() -get the next event from the completion event channel +wait for the next completion event in the completion event channel .I channel\fR. -Fill the arguemnt +Fill the argument .I cq with the CQ that got the event, and its context in .I cq_context\fR. @@ -39,18 +39,21 @@ All completion events that .B ibv_get_cq_event() returns must be acknowledged using .B ibv_ack_cq_events()\fR. 
-To avoid races, destroying a CQ will wait for all completion events to be acknowledged; this guarntees a one-to-one +To avoid races, destroying a CQ will wait for all completion events to be acknowledged; this guarantees a one-to-one correspondence between acks and successful gets. - +.PP +Calling +.B ibv_ack_cq_events() +is expensive and it is advised to minimize the number of calls to this verb by acking several completion events in one time. .SH "EXAMPLES" -The following code example demonstrates one possible way to work with completion events. It performs the following steps: +1) The following code example demonstrates one possible way to work with completion events. It performs the following steps: .PP Stage I: Preparation 1. Creates a CQ 2. Requests for notification upon a new (first) completion event .PP Stage II: Completion Handling Routine -3. Wait for the completion event +3. Wait for the completion event and ack it 4. Request for notification upon the next completion event 5. Empty the CQ .PP @@ -74,13 +77,16 @@ if (ibv_req_notify_cq(cq, 0)) { \&. .PP /* Wait for the completion event */ -if (ibv_get_cq_event(ctx->channel, &ev_cq, &ev_ctx)) { +if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) { fprintf(stderr, "Failed to get cq_event\en"); return 1; } + +/* Ack the event */ +ibv_ack_cq_events(ev_cq, 1); .PP /* Request notification upon the next completion event */ -if (ibv_req_notify_cq(ctx->cq, 0)) { +if (ibv_req_notify_cq(cq, 0)) { fprintf(stderr, "Couldn't request CQ notification\en"); return 1; } @@ -100,6 +106,47 @@ do { } while (ne); .fi +2) The following code example demonstrates one possible way to work with completion events in non-blocking mode. +It performs the following steps: +.PP +1. Set the completion event channel to be non-blocked +2. Poll the channel until there it has a completion event +3. 
Get the completion event and ack it +.PP +.nf +/* change the blocking mode of the completion channel */ +flags = fcntl(channel->fd, F_GETFL); +rc = fcntl(channel->fd, F_SETFL, flags | O_NONBLOCK); +if (rc < 0) { + fprintf(stderr, "Failed to change file descriptor of completion event channel\en"); + return 1; +} + + +/* poll the channel until it has an event and sleep ms_timeout milliseconds between any iteration */ +my_pollfd.fd = channel->fd; +my_pollfd.events = POLLIN; +my_pollfd.revents = 0; + +do { + rc = poll(&my_pollfd, 1, ms_timeout); +} while (rc == 0); +if (rc < 0) { + fprintf(stderr, "poll failed\en"); + return 1; +} +ev_cq = cq; + +/* Wait for the completion event */ +if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) { + fprintf(stderr, "Failed to get cq_event\en"); + return 1; +} + +/* Ack the event */ +ibv_ack_cq_events(ev_cq, 1); + +.fi .SH "CONFORMING TO" InfiniBand Architecture Specification, Release 1.2. diff --git a/man/ibv_modify_qp.3 b/man/ibv_modify_qp.3 index 264baf3..a870744 100644 --- a/man/ibv_modify_qp.3 +++ b/man/ibv_modify_qp.3 @@ -27,7 +27,7 @@ enum ibv_qp_state qp_state; /* Move the QP to this state */ enum ibv_qp_state cur_qp_state; /* Assume this is the current QP state */ enum ibv_mtu path_mtu; /* Path MTU (valid only for RC/UC QPs) */ enum ibv_mig_state path_mig_state; /* Path migration state (valid if HCA supports APM) */ -uint32_t qkey; /* QKey for the QP (valid only for UD QPs) */ +uint32_t qkey; /* Q_Key for the QP (valid only for UD QPs) */ uint32_t rq_psn; /* PSN for receive queue (valid only for RC/UC QPs) */ uint32_t sq_psn; /* PSN for send queue (valid only for RC/UC QPs) */ uint32_t dest_qp_num; /* Destination QP number (valid only for RC/UC QPs) */ @@ -35,8 +35,8 @@ int qp_access_flags; /* Mask of enabled remote access struct ibv_qp_cap cap; /* QP capabilities (valid if HCA supports QP resizing) */ struct ibv_ah_attr ah_attr; /* Primary path address vector (valid only for RC/UC QPs) */ struct ibv_ah_attr alt_ah_attr; /* 
Alternate path address vector (valid only for RC/UC QPs) */ -uint16_t pkey_index; /* Primary PKey index */ -uint16_t alt_pkey_index; /* Alternate PKey index */ +uint16_t pkey_index; /* Primary P_Key index */ +uint16_t alt_pkey_index; /* Alternate P_Key index */ uint8_t en_sqd_async_notify; /* Enable SQD.drained async notification (Valid only if qp_state is SQD) */ uint8_t sq_draining; /* Is the QP draining? Irrelevant for ibv_modify_qp() */ uint8_t max_rd_atomic; /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */ diff --git a/man/ibv_modify_srq.3 b/man/ibv_modify_srq.3 index 9118994..01375c9 100644 --- a/man/ibv_modify_srq.3 +++ b/man/ibv_modify_srq.3 @@ -48,7 +48,7 @@ If any of the modify attributes is invalid, none of the attributes will be modif .PP Not all devices support resizing SRQs. To check if a device supports it, check if the .B IBV_DEVICE_SRQ_RESIZE -bit is set in the device capabilties flags. +bit is set in the device capabilities flags. 
.PP Modifying the srq_limit arms the SRQ to produce an .B IBV_EVENT_SRQ_LIMIT_REACHED diff --git a/man/ibv_query_port.3 b/man/ibv_query_port.3 index a04ebb7..fd61eb9 100644 --- a/man/ibv_query_port.3 +++ b/man/ibv_query_port.3 @@ -31,8 +31,8 @@ enum ibv_mtu active_mtu; /* Actual MTU */ int gid_tbl_len; /* Length of source GID table */ uint32_t port_cap_flags; /* Port capabilities */ uint32_t max_msg_sz; /* Maximum message size */ -uint32_t bad_pkey_cntr; /* Bad PKey counter */ -uint32_t qkey_viol_cntr; /* QKey violation counter */ +uint32_t bad_pkey_cntr; /* Bad P_Key counter */ +uint32_t qkey_viol_cntr; /* Q_Key violation counter */ uint16_t pkey_tbl_len; /* Length of partition table */ uint16_t lid; /* Base port LID */ uint16_t sm_lid; /* SM LID */ diff --git a/man/ibv_query_qp.3 b/man/ibv_query_qp.3 index 43f9767..fd1f41d 100644 --- a/man/ibv_query_qp.3 +++ b/man/ibv_query_qp.3 @@ -32,7 +32,7 @@ enum ibv_qp_state qp_state; /* Current QP state */ enum ibv_qp_state cur_qp_state; /* Current QP state - irrelevant for ibv_query_qp */ enum ibv_mtu path_mtu; /* Path MTU (valid only for RC/UC QPs) */ enum ibv_mig_state path_mig_state; /* Path migration state (valid if HCA supports APM) */ -uint32_t qkey; /* QKey of the QP (valid only for UD QPs) */ +uint32_t qkey; /* Q_Key of the QP (valid only for UD QPs) */ uint32_t rq_psn; /* PSN for receive queue (valid only for RC/UC QPs) */ uint32_t sq_psn; /* PSN for send queue (valid only for RC/UC QPs) */ uint32_t dest_qp_num; /* Destination QP number (valid only for RC/UC QPs) */ @@ -40,8 +40,8 @@ int qp_access_flags; /* Mask of enabled remote access op struct ibv_qp_cap cap; /* QP capabilities */ struct ibv_ah_attr ah_attr; /* Primary path address vector (valid only for RC/UC QPs) */ struct ibv_ah_attr alt_ah_attr; /* Alternate path address vector (valid only for RC/UC QPs) */ -uint16_t pkey_index; /* Primary PKey index */ -uint16_t alt_pkey_index; /* Alternate PKey index */ +uint16_t pkey_index; /* Primary P_Key index */ 
+uint16_t alt_pkey_index; /* Alternate P_Key index */ uint8_t en_sqd_async_notify; /* Enable SQD.drained async notification - irrelevant for ibv_query_qp */ uint8_t sq_draining; /* Is the QP draining? (Valid only if qp_state is SQD) */ uint8_t max_rd_atomic; /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */ From tziporet at mellanox.co.il Tue Mar 27 09:31:28 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 27 Mar 2007 18:31:28 +0200 Subject: [ofa-general] OFED 1.2 March 26 meeting summary Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0E137@mtlexch01.mtl.com> This is the OFED 1.2 March 26 meeting summary about OFED 1.2 RC1 readiness: 1. Cut date for RC1 is Friday March 30 2. RC1 should be ready on Monday April 2. (If critical bugs will not be solved RC1 will be delayed) 3. Check in of any sources should be done now only against open bugs in bugzilla. 4. MVAPICH - need to open a branch for OFED 1.2 5. Release date is delayed in a week to April 25 (since RC1 is a week after the plan) 6. Compilation warnings: each owner must review all warnings of his/her code and make sure no bugs are hiding. For OFED 1.3 we should try to significantly reduce the amount of compilation warnings. We also reviewed open bugs and their priority. 
Bugzilla was updated accordingly and this is the list of major bugs:

bug_id  bug_severity  assigned_to                    short_short_desc
474     blocker       ishai at mellanox.co.il        OFED srp_daemon keeps reading targets with Cisco FC GW
456     critical      arlin.r.davis at intel.com     dapltest won't compile on SLES10 IA64
420     critical      monil at voltaire.com          PKey table reordering caused by SM failover stops ipoib traffic
431     critical      mst at mellanox.co.il          IPoIB CM locks up server on SLES10/RHEL4 ppc64
465     critical      mst at mellanox.co.il          IPoIB CM HA fails after several hours of failovers
489     critical      pasha at mellanox.co.il        OFED install.sh now blocks during mvapich compilation
495     critical      vlad at mellanox.co.il         executing modprobe -r ib_mthca causes kernel oops
436     major         arlin.r.davis at intel.com     Intel MPI and HP MPI DDR bandwidth dropped after OFED 1.2 alpha
450     major         bugzilla at openib.org         IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3
406     major         eitan at mellanox.co.il        "double free" abort in ibdaigui
459     major         monis at voltaire.com          support ib-bonding on RHEL4U4/RHEL5, put kernel name in RPM name, and clean up better
438     major         rolandd at cisco.com           OFED SRP does not work with DDN IB storage large LUNs
464     major         rolandd at cisco.com           release libibverbs-1.1 final before OFED 1.2

Tziporet
From myopenib at gmail.com Tue Mar 27 10:00:21 2007 From: myopenib at gmail.com (Moni Levy) Date: Tue, 27 Mar 2007 19:00:21 +0200 Subject: [ofa-general] Re: [ewg] Re: bugs to fix for OFED 1.2 RC1 In-Reply-To: <20070322172245.GB17532@mellanox.co.il> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> Message-ID: <46094DA5.8000601@gmail.com> On 3/22/07, Michael S. Tsirkin wrote: > > I would like to add these two to the list: > > > > IPoIB passes async events to an > > 413 nor P3 All mst at mellanox.co.il NEW unrelated devices. > > > > This is not a problem.
> > > 420 cri P3 All monil at voltaire.com NEW PKey table reordering caused by > > SM failover stops ipoib t... > > Please re-post the latest patch on openib-general. > I'd like Roland's feedback. Here is the updated patch - v4 This issue was found during partitioning & SM fail over testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds concurrent with OFED restart. Changes from v1: * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike * fixed a bug in device extraction from the work struct * removed some warnings in case they are caused due to missing PKEY as this seems like a valid flow now. Changes from v2: * less/fixed debug prints - (MST remark) * flush_restart_qp stuff renamed to just restart_qp (MST remark) * the patch now depends on Roland's "IPoIB: Only handle async events for one port" Changed from v3 * added a flush_scheduled_work call before we restart the QP in order to ensure that the pkey table we read from the cache is updated * more debug print fixes SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. 
Signed-off-by: Moni Levy --- drivers/infiniband/ulp/ipoib/ipoib.h | 4 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 51 ++++++++++++++++++++----- drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- 5 files changed, 59 insertions(+), 19 deletions(-) Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-01 14:11:43.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-27 16:30:38.556190253 +0200 @@ -205,6 +205,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); int ipoib_ib_dev_up(struct net_device *dev); int ipoib_ib_dev_down(struct net_device *dev, int flush); -int ipoib_ib_dev_stop(struct net_device *dev); +int ipoib_ib_dev_stop(struct net_device *dev, int flush); int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-01 14:11:43.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-27 18:31:14.267676087 +0200 @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device ret = ipoib_init_qp(dev); if (ret) { - 
ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); + if (ret != -ENOENT) + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); return -1; } ret = ipoib_ib_post_receives(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } ret = ipoib_cm_dev_open(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } @@ -459,7 +460,7 @@ int ipoib_ib_dev_up(struct net_device *d ipoib_pkey_dev_check_presence(dev); if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) { - ipoib_dbg(priv, "PKEY is not assigned.\n"); + ipoib_dbg(priv, "pkey is not assigned.\n"); return 0; } @@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi return pending; } -int ipoib_ib_dev_stop(struct net_device *dev) +int ipoib_ib_dev_stop(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -581,7 +582,8 @@ timeout: /* Wait for all AHs to be reaped */ set_bit(IPOIB_STOP_REAPER, &priv->flags); cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); begin = jiffies; @@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) { - struct ipoib_dev_priv *cpriv, *priv = - container_of(work, struct ipoib_dev_priv, flush_task); + struct ipoib_dev_priv *cpriv; struct net_device *dev = priv->dev; - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { + /* + * ipoib_ib_dev_stop() below may not find the PKey and leave the + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp + * flag on is Ok. 
+ */ + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); return; } @@ -642,6 +648,13 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_ib_dev_down(dev, 0); + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) + ipoib_ib_dev_stop(dev, 0); + ipoib_ib_dev_open(dev); + } + /* * The device could have been brought down between the start and when * we get here, don't bring it back up if it's not configured up @@ -655,11 +668,31 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(cpriv, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 0); +} + +void ipoib_ib_dev_restart_qp(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, restart_qp_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); + /* Ensures the pkey table we read from the cache is updated properly */ + flush_scheduled_work(); + __ipoib_ib_dev_flush(priv, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-01 14:11:43.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-27 16:30:38.587184737 +0200 @@ -107,7 +107,7 @@ 
int ipoib_open(struct net_device *dev) return -EINVAL; if (ipoib_ib_dev_up(dev)) { - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -EINVAL; } @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device flush_workqueue(ipoib_workqueue); ipoib_ib_dev_down(dev, 1); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { struct ipoib_dev_priv *cpriv; @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-01 14:11:43.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-27 18:28:52.944950494 +0200 @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mcmember.mgid); if (ret < 0) { - ipoib_warn(priv, "couldn't attach QP to multicast group " - IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); + if (ret != -ENXIO) /* No pkey found */ + ipoib_warn(priv, "couldn't attach QP to multicast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); return ret; @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s status = ipoib_mcast_join_finish(mcast, &multicast->rec); if (status) { - if (mcast->logcount++ < 20) + if (mcast->logcount++ < 20 && status != -ENXIO) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", 
IPOIB_GID_ARG(mcast->mcmember.mgid), status); @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); - } else { + } else if (status != -ENXIO) { ipoib_warn(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-01 14:39:46.000000000 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-27 18:21:47.949410861 +0200 @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ret = -ENXIO; + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); goto out; } + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* set correct QKey for QP */ @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if ((record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler record->element.port_num == priv->port) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE && + record->element.port_num == priv->port) { + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); + queue_work(ipoib_workqueue, &priv->restart_qp_task); } } -- Moni > > -- > MST > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From 
ggrundstrom at NetEffect.com Tue Mar 27 10:10:08 2007 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Tue, 27 Mar 2007 12:10:08 -0500 Subject: [ofa-general] ofa server account Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2> Who should I contact that can create logins for git trees on the OFA server? I'd like to put some git trees on the server for the NetEffect iWARP kernel driver and userspace lib. Thanks, Glenn. From jsquyres at cisco.com Tue Mar 27 10:13:00 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 27 Mar 2007 13:13:00 -0400 Subject: [ofa-general] ofa server account In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2> Message-ID: Michael Lee is the current sysadmin, but he's being phased out. Jeff Becker has graciously volunteered to phase in as the new sysadmin. Both are CC'ed on this e-mail; they'll get back to you on what information they need from you to create an account. On Mar 27, 2007, at 1:10 PM, Glenn Grundstrom wrote: > Who should I contact that can create logins for git trees on the > OFA server? I'd like to put some git trees on the server for the > NetEffect iWARP kernel driver and userspace lib. > > Thanks, > Glenn.
> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Cisco Systems From rdreier at cisco.com Tue Mar 27 10:28:56 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Mar 2007 10:28:56 -0700 Subject: [ofa-general] Re: [PATCH] libibverbs-manpages updates In-Reply-To: <1175013571.653.1.camel@mtldesk014.lab.mtl.com> (Dotan Barak's message of "Tue, 27 Mar 2007 18:39:31 +0200") References: <1175013571.653.1.camel@mtldesk014.lab.mtl.com> Message-ID: thanks! applied. From rdreier at cisco.com Tue Mar 27 11:57:27 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 27 Mar 2007 11:57:27 -0700 Subject: [ofa-general] Re: [PATCH] IB/mthca: change token on command timeout In-Reply-To: <20070327145601.GF19817@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 27 Mar 2007 16:56:01 +0200") References: <20070327142653.GD19817@mellanox.co.il> <20070327145601.GF19817@mellanox.co.il> Message-ID: > Command token is currently only updated on command > event. This means that on command timeout, the same token > will be reused for new command, which results in a mess > if the timed out command *is* eventually completed. OK, I guess this is a theoretical problem -- but with our current 60 second timeout, is there really much chance of this? And do we have much chance of recovering from a command completing after we gave up on it, given that we probably free the mailbox (meaning the HCA can scribble on memory that's now been reused for something else, etc)? I guess I could queue this for 2.6.22 since it probably doesn't hurt, but I don't see any reason to put it in 2.6.21. - R. 
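
[Illustration, not part of the original message.] The token reuse discussed in this thread can be shown with a small userspace model. This is only a sketch under stated assumptions, not the mthca driver code: the pool size, the struct layout and all function names below are hypothetical. It assumes that the low bits of a token select a context slot, that the remaining bits act as a generation counter, and that a completion is ignored when its token does not match the slot's current token. With the token bumped only on a completion event, a command that timed out leaves its token unchanged, so its late completion is accepted on behalf of the next command using the same slot. With the token bumped when the slot is handed out, as in the patch, the late completion no longer matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CONTEXTS 4u                  /* hypothetical pool size, power of two */
#define TOKEN_MASK   (NUM_CONTEXTS - 1u) /* low bits of a token select the slot */

/* Hypothetical, simplified stand-in for a per-command context. */
struct cmd_context {
    uint16_t token;      /* slot index in the low bits, generation above */
    bool     completed;  /* set when a completion event is accepted */
};

static struct cmd_context pool[NUM_CONTEXTS];

static void pool_init(void)
{
    for (unsigned i = 0; i < NUM_CONTEXTS; i++) {
        pool[i].token = (uint16_t)i;
        pool[i].completed = false;
    }
}

/* Hand out a slot for a new command and return the token posted with it.
 * bump_on_post models the patched behaviour (new token per allocation). */
static uint16_t post_command(unsigned slot, bool bump_on_post)
{
    assert(slot < NUM_CONTEXTS);
    struct cmd_context *ctx = &pool[slot];

    if (bump_on_post)
        ctx->token = (uint16_t)(ctx->token + TOKEN_MASK + 1u);
    ctx->completed = false;
    return ctx->token;
}

/* Deliver a completion event. Returns true if it was accepted.
 * bump_on_event models the unpatched behaviour (new token per event). */
static bool command_event(uint16_t token, bool bump_on_event)
{
    struct cmd_context *ctx = &pool[token & TOKEN_MASK];

    if (token != ctx->token)
        return false;    /* stale completion of a command we gave up on */

    ctx->completed = true;
    if (bump_on_event)
        ctx->token = (uint16_t)(ctx->token + TOKEN_MASK + 1u);
    return true;
}

/* Command A times out, its slot is reused for command B, and only then
 * does A's completion arrive. Returns true if B is wrongly marked done. */
static bool late_completion_confuses_next_command(bool patched)
{
    pool_init();
    uint16_t token_a = post_command(0, patched); /* A: no event, times out */
    (void)post_command(0, patched);              /* B reuses slot 0 */
    (void)command_event(token_a, !patched);      /* A completes late */
    return pool[0].completed;
}
```

Calling late_completion_confuses_next_command(false) models the behaviour before the patch and late_completion_confuses_next_command(true) the behaviour after it. The model says nothing about the mailbox lifetime concern raised above, which is a separate problem.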
From sean.hefty at intel.com Tue Mar 27 12:03:31 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Mar 2007 12:03:31 -0700 Subject: [ofa-general] madeye kernel oops In-Reply-To: <1175009006.14461.0.camel@Ami-desktop> Message-ID: <000301c770a2$9d1fac60$73248686@amr.corp.intel.com> How easily can you reproduce this? I'm assuming that this is with OFED 1.2 on 2.6.20, correct? Can you describe what you were doing when this crash occurred? Thanks, Sean >Unable to handle kernel NULL pointer dereference at 0000000000000038 >RIP: > [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 >PGD 73387067 PUD 72844067 PMD 0 >Oops: 0000 [1] SMP >CPU 0 >Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm >ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa >ib_mad ib_core >Pid: 8917, comm: rmmod Not tainted 2.6.20 #1 >RIP: 0010:[] >[] :ib_mad:ib_unregister_mad_agent+0x11/0x480 >RSP: 0000:ffff810071ee1e08 EFLAGS: 00010292 >RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f >RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000 >RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003 >R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0 >R13: 0000000000000002 R14: 0000000000000880 R15: 0000000000503010 >FS: 00002b3d6689fb00(0000) GS:ffffffff80702000(0000) >knlGS:0000000000000000 >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0 >Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task >ffff8100781aeee0) >Stack: ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28 >ffffffff80538b43 > ffff810071ee1ea8 ffffffff80538ea2 ffffffff80690880 ffff810071ee1e78 > 000000000000000f 0000000000000020 0000000000000002 ffff8100766c33c0 >Call Trace: > [] __cond_resched+0x1c/0x44 > [] cond_resched+0x2e/0x39 > [] wait_for_completion+0x1a/0xd0 > [] :ib_madeye:madeye_remove_one+0x56/0x88 > [] :ib_core:ib_unregister_client+0x40/0xe2 > [] 
sys_delete_module+0x1b4/0x1e5 > [] add_uevent_var+0x40/0xe3 > [] sys_munmap+0x4b/0x58 > [] system_call+0x7e/0x83 > > >Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48 >RIP [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > RSP >CR2: 0000000000000038 From sweitzen at cisco.com Tue Mar 27 12:07:35 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 27 Mar 2007 12:07:35 -0700 Subject: [ofa-general] OFED 1.2 March 26 meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9A0E137@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9A0E137@mtlexch01.mtl.com> Message-ID: I can confirm bug 456 (dapltest won't compile on SLES10 IA64) is fixed. Scott ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Tuesday, March 27, 2007 9:31 AM To: ewg at lists.openfabrics.org Cc: general at lists.openfabrics.org Subject: [ofa-general] OFED 1.2 March 26 meeting summary This is the OFED 1.2 March 26 meeting summary about OFED 1.2 RC1 readiness: 1. Cut date for RC1 is Friday March 30 2. RC1 should be ready on Monday April 2. (If critical bugs will not be solved RC1 will be delayed) 3. Check in of any sources should be done now only against open bugs in bugzilla. 4. MVAPICH - need to open a branch for OFED 1.2 5. Release date is delayed in a week to April 25 (since RC1 is a week after the plan) 6. Compilation warnings: each owner must review all warnings of his/her code and make sure no bugs are hiding. For OFED 1.3 we should try to significantly reduce the amount of compilation warnings. We also reviewed open bugs and their priority. 
Bugzilla was updated accordingly and this is the list of major bugs:

bug_id  bug_severity  assigned_to                 short_short_desc
474     blocker       ishai at mellanox.co.il     OFED srp_daemon keeps reading targets with Cisco FC GW
456     critical      arlin.r.davis at intel.com  dapltest won't compile on SLES10 IA64
420     critical      monil at voltaire.com       PKey table reordering caused by SM failover stops ipoib traffic
431     critical      mst at mellanox.co.il       IPoIB CM locks up server on SLES10/RHEL4 ppc64
465     critical      mst at mellanox.co.il       IPoIB CM HA fails after several hours of failovers
489     critical      pasha at mellanox.co.il     OFED install.sh now blocks during mvapich compilation
495     critical      vlad at mellanox.co.il      executing modprobe -r ib_mthca causes kernel oops
436     major         arlin.r.davis at intel.com  Intel MPI and HP MPI DDR bandwidth dropped after OFED 1.2 alpha
450     major         bugzilla at openib.org      IPoIB BW drop (measured with iperf) with mtu=1500 on x86 RH4UP3
406     major         eitan at mellanox.co.il     "double free" abort in ibdaigui
459     major         monis at voltaire.com       support ib-bonding on RHEL4U4/RHEL5, put kernel name in RPM name, and clean up better
438     major         rolandd at cisco.com        OFED SRP does not work with DDN IB storage large LUNs
464     major         rolandd at cisco.com        release libibverbs-1.1 final before OFED 1.2

Tziporet
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mst at dev.mellanox.co.il Tue Mar 27 13:37:01 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Mar 2007 22:37:01 +0200
Subject: [ofa-general] Re: [PATCH] IB/mthca: change token on command timeout
In-Reply-To:
References: <20070327142653.GD19817@mellanox.co.il> <20070327145601.GF19817@mellanox.co.il>
Message-ID: <20070327203701.GB28347@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: [PATCH] IB/mthca: change token on command timeout
>
> > Command token is currently only updated on command
> > event.
This means that on command timeout, the same token
> > will be reused for new command, which results in a mess
> > if the timed out command *is* eventually completed.
>
> OK, I guess this is a theoretical problem -- but with our current 60
> second timeout, is there really much chance of this?
>
> And do we have much chance of recovering from a command completing
> after we gave up on it, given that we probably free the mailbox
> (meaning the HCA can scribble on memory that's now been reused for
> something else, etc)?
>
> I guess I could queue this for 2.6.22 since it probably doesn't hurt,
> but I don't see any reason to put it in 2.6.21.

I agree.

--
MST

From mst at dev.mellanox.co.il Tue Mar 27 13:52:13 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Mar 2007 22:52:13 +0200
Subject: [ofa-general] pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1)
In-Reply-To: <46094DA5.8000601@gmail.com>
References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com>
Message-ID: <20070327205213.GD28347@mellanox.co.il>

> Changed from v3
> * added a flush_scheduled_work call before we restart the QP in order
> to ensure that the pkey table we read from the cache is updated

+void ipoib_ib_dev_restart_qp(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv =
+		container_of(work, struct ipoib_dev_priv, restart_qp_task);
+	/* We only restart the QP in case of pkey change event */
+	ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name);
+	/* Ensures the pkey table we read from the cache is updated properly */
+	flush_scheduled_work();
+	__ipoib_ib_dev_flush(priv, 1);
+}
+

I think doing flush_scheduled_work from inside the ipoib workqueue
can trigger deadlocks - which deadlocks the workqueue was created to avoid,
in the first place. Look at the comment in ipoib_main.c where the WQ is created.
And, I don't think that depending on the fact that the cache uses
a default schedule queue internally is such a good idea.

How about simply requeueing the work again if the cache query failed?

--
MST

From mst at dev.mellanox.co.il Tue Mar 27 13:57:38 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Mar 2007 22:57:38 +0200
Subject: [ofa-general] Re: ofa server account
In-Reply-To:
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2>
Message-ID: <20070327205738.GE28347@mellanox.co.il>

> Quoting Jeff Squyres :
> Subject: Re: ofa server account
>
> Michael Lee is the current sysadmin, but he's being phased out. Jeff
> Becker has graciously volunteered to phase in as the new sysadmin.
>
> Both are CC'ed on this e-mail;

Are they? All I see is:
Cc: OpenFabrics General

--
MST

From jsquyres at cisco.com Tue Mar 27 14:07:27 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 27 Mar 2007 17:07:27 -0400
Subject: [ofa-general] Re: ofa server account
In-Reply-To: <20070327205738.GE28347@mellanox.co.il>
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2> <20070327205738.GE28347@mellanox.co.il>
Message-ID:

How very interesting -- yes, they definitely were (there's already
been some off-list traffic about it). The listserver must have
stripped them off for some reason. [shrug]

On Mar 27, 2007, at 4:57 PM, Michael S. Tsirkin wrote:

>> Quoting Jeff Squyres :
>> Subject: Re: ofa server account
>>
>> Michael Lee is the current sysadmin, but he's being phased out. Jeff
>> Becker has graciously volunteered to phase in as the new sysadmin.
>>
>> Both are CC'ed on this e-mail;
>
> Are they?
All I see is:
>
> Cc: OpenFabrics General
>
> --
> MST

--
Jeff Squyres
Cisco Systems

From sweitzen at cisco.com Tue Mar 27 14:20:54 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 27 Mar 2007 14:20:54 -0700
Subject: [ofa-general] [PATCH] SRP: add orig_dgid to sysfs
In-Reply-To: <4607F947.2010205@dev.mellanox.co.il>
References: <4607F947.2010205@dev.mellanox.co.il>
Message-ID:

Ishai,

I talked to Roland, and he did not object to this patch going into OFED
1.2, so please add it.

Roland also said he did not have time right now to look at it for
kernel.org. Let's you and I get it working, then we can resubmit to
Roland for kernel.org.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of ishai
> Sent: Monday, March 26, 2007 9:48 AM
> To: Roland Dreier (rdreier)
> Cc: general at lists.openfabrics.org
> Subject: [ofa-general] [PATCH] SRP: add orig_dgid to sysfs
>
> Adding orig_dgid file to /sys/class/scsi_host. This file will
> present the
> value of dgid that was "written" to
> /sys/class/infiniband_srp/.../add_target
> This is useful when there is a dgid redirection by the CM.
> > Signed-off-by: Ishai Rabinovitz > > Index: gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.c > =================================================================== > --- > gen2_devel_kernel.orig/drivers/infiniband/ulp/srp/ib_srp.c > 2007-03-26 10:47:34.000000000 +0200 > +++ gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.c > 2007-03-26 17:35:54.000000000 +0200 > @@ -1102,6 +1102,7 @@ > target->path.dlid = cpi->redirect_lid; > target->path.pkey = cpi->redirect_pkey; > cm_id->remote_cm_qpn = > be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; > + memcpy(target->orig_dgid, target->path.dgid.raw, 16); > memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); > > target->status = target->path.dlid ? > @@ -1116,6 +1117,8 @@ > * reject reason code 25 when they mean 24 > * (port redirect). > */ > + memcpy(target->orig_dgid, > + target->path.dgid.raw, 16); > memcpy(target->path.dgid.raw, > event->param.rej_rcvd.ari, 16); > > @@ -1449,6 +1452,24 @@ > return sprintf(buf, "0x%04x\n", be16_to_cpu(target->path.pkey)); > } > > +static ssize_t show_orig_dgid(struct class_device *cdev, char *buf) > +{ > + struct srp_target_port *target = > host_to_target(class_to_shost(cdev)); > + > + if (target->state == SRP_TARGET_DEAD || > + target->state == SRP_TARGET_REMOVED) > + return -ENODEV; > + > + return sprintf(buf, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", > + be16_to_cpu(((__be16 *) target->orig_dgid)[0]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[1]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[2]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[3]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[4]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[5]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[6]), > + be16_to_cpu(((__be16 *) target->orig_dgid)[7])); > +} > static ssize_t show_dgid(struct class_device *cdev, char *buf) > { > struct srp_target_port *target = > host_to_target(class_to_shost(cdev)); > @@ -1498,6 +1519,7 @@ > static CLASS_DEVICE_ATTR(service_id, 
S_IRUGO, > show_service_id, NULL); > static CLASS_DEVICE_ATTR(pkey, S_IRUGO, > show_pkey, NULL); > static CLASS_DEVICE_ATTR(dgid, S_IRUGO, > show_dgid, NULL); > +static CLASS_DEVICE_ATTR(orig_dgid, S_IRUGO, > show_orig_dgid, NULL); > static CLASS_DEVICE_ATTR(zero_req_lim, S_IRUGO, > show_zero_req_lim, NULL); > static CLASS_DEVICE_ATTR(local_ib_port, S_IRUGO, > show_local_ib_port, NULL); > static CLASS_DEVICE_ATTR(local_ib_device, S_IRUGO, > show_local_ib_device, NULL); > @@ -1508,6 +1530,7 @@ > &class_device_attr_service_id, > &class_device_attr_pkey, > &class_device_attr_dgid, > + &class_device_attr_orig_dgid, > &class_device_attr_zero_req_lim, > &class_device_attr_local_ib_port, > &class_device_attr_local_ib_device, > @@ -1796,6 +1819,7 @@ > (int) be16_to_cpu(*(__be16 *) > &target->path.dgid.raw[12]), > (int) be16_to_cpu(*(__be16 *) > &target->path.dgid.raw[14])); > > + memcpy(target->orig_dgid, target->path.dgid.raw, 16); > ret = srp_create_target_ib(target); > if (ret) > goto err; > Index: gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.h > =================================================================== > --- > gen2_devel_kernel.orig/drivers/infiniband/ulp/srp/ib_srp.h > 2007-03-25 16:07:20.000000000 +0200 > +++ gen2_devel_kernel/drivers/infiniband/ulp/srp/ib_srp.h > 2007-03-26 17:33:52.000000000 +0200 > @@ -129,6 +129,7 @@ > unsigned int scsi_id; > > struct ib_sa_path_rec path; > + u8 orig_dgid[16]; > struct ib_sa_query *path_query; > int path_query_id; > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general >
From halr at voltaire.com Tue Mar 27 17:45:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Mar 2007 19:45:48 -0500 Subject: [ofa-general] [PATCH] IB/core/user_mad.c: Add support for issmdisabled Message-ID: <1175042747.4372.104218.camel@hal.voltaire.com> IB/core/user_mad.c: Add support for issmdisabled Signed-off-by: Hal Rosenstock diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index c069ebe..9ef2603 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -1,6 +1,6 @@ /* * Copyright (c) 2004 Topspin Communications. All rights reserved. - * Copyright (c) 2005 Voltaire, Inc.
All rights reserved. + * Copyright (c) 2005-2007 Voltaire, Inc. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -31,7 +31,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: user_mad.c 5596 2006-03-03 01:00:07Z sean.hefty $ */ #include @@ -92,6 +91,8 @@ struct ib_umad_port { struct cdev *sm_dev; struct class_device *sm_class_dev; + struct cdev *smdis_dev; + struct class_device *smdis_class_dev; struct semaphore sm_sem; struct rw_semaphore mutex; @@ -135,7 +136,7 @@ static const dev_t base_dev = MKDEV(IB_U static DEFINE_SPINLOCK(port_lock); static struct ib_umad_port *umad_port[IB_UMAD_MAX_PORTS]; -static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 2); +static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 3); static void ib_umad_add_one(struct ib_device *device); static void ib_umad_remove_one(struct ib_device *device); @@ -852,6 +853,77 @@ static const struct file_operations umad .release = ib_umad_sm_close }; +static int ib_umad_smdis_open(struct inode *inode, struct file *filp) +{ + struct ib_umad_port *port; + struct ib_port_modify props = { + .set_port_cap_mask = IB_PORT_SM_DISABLED + }; + int ret; + + spin_lock(&port_lock); + port = umad_port[iminor(inode) - IB_UMAD_MINOR_BASE - 2*IB_UMAD_MAX_PORTS]; + if (port) + kref_get(&port->umad_dev->ref); + spin_unlock(&port_lock); + + if (!port) + return -ENXIO; + + if (filp->f_flags & O_NONBLOCK) { + if (down_trylock(&port->sm_sem)) { + ret = -EAGAIN; + goto fail; + } + } else { + if (down_interruptible(&port->sm_sem)) { + ret = -ERESTARTSYS; + goto fail; + } + + } + + ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); + if (ret) { + up(&port->sm_sem); + goto fail; + } + + filp->private_data = port; + + return 0; + +fail: + kref_put(&port->umad_dev->ref, ib_umad_release_dev); + return ret; +} + +static int ib_umad_smdis_close(struct inode *inode, struct file 
*filp) +{ + struct ib_umad_port *port = filp->private_data; + struct ib_port_modify props = { + .clr_port_cap_mask = IB_PORT_SM_DISABLED + }; + int ret = 0; + + down_write(&port->mutex); + if (port->ib_dev) + ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); + up_write(&port->mutex); + + up(&port->sm_sem); + + kref_put(&port->umad_dev->ref, ib_umad_release_dev); + + return ret; +} + +static const struct file_operations umad_smdis_fops = { + .owner = THIS_MODULE, + .open = ib_umad_smdis_open, + .release = ib_umad_smdis_close +}; + static struct ib_client umad_client = { .name = "umad", .add = ib_umad_add_one, @@ -947,12 +1019,41 @@ static int ib_umad_init_port(struct ib_d if (class_device_create_file(port->sm_class_dev, &class_device_attr_port)) goto err_sm_class; + port->smdis_dev = cdev_alloc(); + if (!port->smdis_dev) + goto err_sm_class; + port->smdis_dev->owner = THIS_MODULE; + port->smdis_dev->ops = &umad_smdis_fops; + kobject_set_name(&port->smdis_dev->kobj, "issmdisabled%d", port->dev_num); + if (cdev_add(port->smdis_dev, base_dev + port->dev_num + 2*IB_UMAD_MAX_PORTS, 1)) + goto err_smdis_cdev; + + port->smdis_class_dev = class_device_create(umad_class, NULL, port->smdis_dev->dev, + device->dma_device, + "issmdisabled%d", + port->dev_num); + if (IS_ERR(port->smdis_class_dev)) + goto err_smdis_cdev; + + class_set_devdata(port->smdis_class_dev, port); + + if (class_device_create_file(port->smdis_class_dev, &class_device_attr_ibdev)) + goto err_smdis_class; + if (class_device_create_file(port->smdis_class_dev, &class_device_attr_port)) + goto err_smdis_class; + spin_lock(&port_lock); umad_port[port->dev_num] = port; spin_unlock(&port_lock); return 0; +err_smdis_class: + class_device_destroy(umad_class, port->smdis_dev->dev); + +err_smdis_cdev: + cdev_del(port->smdis_dev); + err_sm_class: class_device_destroy(umad_class, port->sm_dev->dev); @@ -979,9 +1080,11 @@ static void ib_umad_kill_port(struct ib_ class_device_destroy(umad_class, 
port->dev->dev); class_device_destroy(umad_class, port->sm_dev->dev); + class_device_destroy(umad_class, port->smdis_dev->dev); cdev_del(port->dev); cdev_del(port->sm_dev); + cdev_del(port->smdis_dev); spin_lock(&port_lock); umad_port[port->dev_num] = NULL;
From halr at voltaire.com Tue Mar 27 18:21:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Mar 2007 20:21:58 -0500 Subject: [ofa-general] [PATCH] OpenSM/libvendor: Add support for issmdisabled in OpenIB vendor layer Message-ID: <1175044917.4372.106398.camel@hal.voltaire.com> OpenSM/libvendor: Add support for issmdisabled in OpenIB vendor layer Signed-off-by: Hal Rosenstock diff --git a/osm/include/vendor/osm_vendor_ibumad.h b/osm/include/vendor/osm_vendor_ibumad.h index d05f4c6..ad269bd 100644 --- a/osm/include/vendor/osm_vendor_ibumad.h +++ b/osm/include/vendor/osm_vendor_ibumad.h @@ -172,6 +172,7 @@ typedef struct _osm_vendor int umad_port_id; void *receiver; int issmfd; + int issmdisabledfd; } osm_vendor_t; #define OSM_BIND_INVALID_HANDLE 0 diff --git a/osm/libvendor/ChangeLog b/osm/libvendor/ChangeLog index 0dd31e0..da241a6 100644 --- a/osm/libvendor/ChangeLog +++ b/osm/libvendor/ChangeLog @@ -1,3 +1,7 @@ +2007-03-27 Hal Rosenstock + + * osm_vendor_ibumad.(h c): Add support for issmdisabled + 2007-03-13 Hal Rosenstock * osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c index e2e1226..360b787 100644 --- a/osm/libvendor/osm_vendor_ibumad.c +++ b/osm/libvendor/osm_vendor_ibumad.c @@ -454,6 +454,7 @@ osm_vendor_init( pthread_mutex_init(&p_vend->match_tbl_mutex, NULL); p_vend->umad_port_id = -1; p_vend->issmfd = -1; + p_vend->issmdisabledfd = -1; /* * Open our instance of UMAD.
@@ -1179,12 +1180,20 @@ osm_vendor_set_sm( { osm_umad_bind_info_t *p_bind = (osm_umad_bind_info_t *)h_bind; osm_vendor_t *p_vend = p_bind->p_vend; - char issmstring[24]; + char string[32]; OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - sprintf(issmstring, "/dev/infiniband/issm%d", p_vend->umad_port_id); + sprintf(string, "/dev/infiniband/issm%d", p_vend->umad_port_id); if (TRUE == is_sm_val) { - p_vend->issmfd = open(issmstring, O_NONBLOCK); + if (p_vend->issmdisabledfd != -1) { + if (0 != close(p_vend->issmdisabledfd)) + osm_log(p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_set_sm: ERR 5433: " + "clearing IS_SMdisabled capability" + " mask failed: errno %d\n", errno); + } + p_vend->issmdisabledfd = -1; + p_vend->issmfd = open(string, O_NONBLOCK); if (p_vend->issmfd < 0) { osm_log(p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_set_sm: ERR 5431: " @@ -1193,13 +1202,23 @@ osm_vendor_set_sm( p_vend->issmfd = -1; } } else { - if (p_vend->issmfd != -1) + if (p_vend->issmfd != -1) { if (0 != close(p_vend->issmfd)) osm_log(p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_set_sm: ERR 5432: " "clearing IS_SM capability" " mask failed: errno %d\n", errno); + } p_vend->issmfd = -1; + sprintf(string, "/dev/infiniband/issmdisabled%d", p_vend->umad_port_id); + p_vend->issmdisabledfd = open(string, O_NONBLOCK); + if (p_vend->issmdisabledfd < 0) { + osm_log(p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_set_sm: ERR 5434: " + "setting IS_SMdisabled capability" + " mask failed; errno %d\n", errno); + p_vend->issmdisabledfd = -1; + } } OSM_LOG_EXIT( p_vend->p_log ); } From halr at voltaire.com Tue Mar 27 18:22:11 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Mar 2007 20:22:11 -0500 Subject: [ofa-general] [PATCH 1/2] OpenSM: Add support for making SM inactive Message-ID: <1175044921.4372.106400.camel@hal.voltaire.com> OpenSM: Add support for making SM inactive Signed-off-by: Hal Rosenstock diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 
5d1b023..ade73ac 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -283,6 +283,7 @@ typedef struct _osm_subn_opt boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; boolean_t daemon; + boolean_t sm_inactive; osm_qos_options_t qos_options; osm_qos_options_t qos_ca_options; osm_qos_options_t qos_sw0_options; @@ -464,6 +465,9 @@ typedef struct _osm_subn_opt * daemon * OpenSM will run in daemon mode. * +* sm_inactive +* OpenSM will start with SM in not active state. +* * qos_options * Default set of QoS options * diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c index 566b927..4a14ee0 100644 --- a/osm/opensm/osm_port_info_rcv.c +++ b/osm/opensm/osm_port_info_rcv.c @@ -74,7 +74,8 @@ static void __osm_pi_rcv_set_sm( IN const osm_pi_rcv_t* const p_rcv, - IN osm_physp_t* const p_physp ) + IN osm_physp_t* const p_physp, + IN boolean_t const is_smdis ) { osm_bind_handle_t h_bind; osm_dr_path_t *p_dr_path; @@ -85,15 +86,27 @@ __osm_pi_rcv_set_sm( { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pi_rcv_set_sm: " - "Setting 'IS_SM' bit in port attributes\n" ); + "Setting '%s' bit in port attributes\n", + is_smdis ? "SM_DISAB" : "IS_SM"); } p_dr_path = osm_physp_get_dr_path_ptr( p_physp ); h_bind = osm_dr_path_get_bind_handle( p_dr_path ); - /* - The 'IS_SM' bit isn't already set, so set it. - */ - osm_vendor_set_sm( h_bind, TRUE ); + + if (is_smdis) + { + /* + The 'SM_DISAB' bit isn't already set, so set it. + */ + osm_vendor_set_sm( h_bind, FALSE ); + } + else + { + /* + The 'IS_SM' bit isn't already set, so set it. 
+ */ + osm_vendor_set_sm( h_bind, TRUE ); + } OSM_LOG_EXIT( p_rcv->p_log ); } @@ -112,6 +125,7 @@ __osm_pi_rcv_process_endport( uint8_t rate, mtu; cl_qmap_t* p_sm_tbl; osm_remote_sm_t* p_sm; + boolean_t is_smdis; OSM_LOG_ENTER( p_rcv->p_log, __osm_pi_rcv_process_endport ); @@ -148,15 +162,17 @@ __osm_pi_rcv_process_endport( if( port_guid == p_rcv->p_subn->sm_port_guid ) { + is_smdis = (p_rcv->p_subn->sm_state == IB_SMINFO_STATE_NOTACTIVE); /* We received the PortInfo for our own port. */ - if( !(p_pi->capability_mask & IB_PORT_CAP_IS_SM ) ) + if( (!is_smdis && !(p_pi->capability_mask & IB_PORT_CAP_IS_SM ) ) || + ( is_smdis && !(p_pi->capability_mask & IB_PORT_CAP_SM_DISAB ) ) ) { /* - Set the IS_SM bit to indicate our port hosts an SM. + Set the IS_SM or SM_DISAB bit to indicate our port hosts an SM. */ - __osm_pi_rcv_set_sm( p_rcv, p_physp ); + __osm_pi_rcv_set_sm( p_rcv, p_physp, is_smdis ); } } else diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c index 61492b7..fc68f7e 100644 --- a/osm/opensm/osm_sm_state_mgr.c +++ b/osm/opensm/osm_sm_state_mgr.c @@ -449,8 +449,17 @@ osm_sm_state_mgr_init( p_sm_mgr->p_subn = p_subn; p_sm_mgr->p_state_mgr = p_state_mgr; - /* init the state of the SM to init */ - p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_INIT; + if (p_subn->opt.sm_inactive) + { + /* init the state of the SM to not active */ + p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_NOTACTIVE; + __osm_sm_state_mgr_notactive_msg( p_sm_mgr ); + } + else + { + /* init the state of the SM to init */ + p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_INIT; + } status = cl_spinlock_init( &p_sm_mgr->state_lock ); if( status != CL_SUCCESS ) diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index f3450d1..326b642 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -461,6 +461,7 @@ osm_subn_set_default_opt( p_opt->log_flags = 0; p_opt->honor_guid2lid_file = FALSE; p_opt->daemon = FALSE; + p_opt->sm_inactive = FALSE; 
p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir)) @@ -1056,6 +1057,10 @@ osm_subn_parse_conf_file( "daemon", p_key, p_val, &p_opts->daemon); + __osm_subn_opts_unpack_boolean( + "sm_inactive", + p_key, p_val, &p_opts->sm_inactive); + subn_parse_qos_options("qos", p_key, p_val, &p_opts->qos_options); @@ -1291,8 +1296,11 @@ osm_subn_write_conf_file( opts_file, "#\n# MISC OPTIONS\n#\n" "# Daemon mode\n" - "daemon %s\n\n", - p_opts->daemon ? "TRUE" : "FALSE" + "daemon %s\n\n" + "# SM Inactive\n" + "sm_inactive %s\n\n", + p_opts->daemon ? "TRUE" : "FALSE", + p_opts->sm_inactive ? "TRUE" : "FALSE" ); fprintf( From halr at voltaire.com Tue Mar 27 18:22:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Mar 2007 20:22:15 -0500 Subject: [ofa-general] [PATCH 2/2] OpenSM: Add support for making SM inactive Message-ID: <1175044930.4372.106402.camel@hal.voltaire.com> OpenSM: Add support for making SM inactive Signed-off-by: Hal Rosenstock diff --git a/osm/man/opensm.8 b/osm/man/opensm.8 index 4d1e45b..5e8a1bd 100644 --- a/osm/man/opensm.8 +++ b/osm/man/opensm.8 @@ -5,7 +5,7 @@ opensm \- InfiniBand subnet manager and .SH SYNOPSIS .B opensm -[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-routing_engine ] [\-M | \-lid_matrix_file ] [\-U | \-ucast_file ] [\-S | \-\-sadb_file ] [\-a(dd_guid_file) ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console [off | local | socket]] [\-console-port ] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] 
+[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-routing_engine ] [\-M | \-lid_matrix_file ] [\-U | \-ucast_file ] [\-S | \-\-sadb_file ] [\-a(dd_guid_file) ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console [off | local | socket]] [\-console-port ] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-I | \-inactive] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] .SH DESCRIPTION .PP @@ -185,6 +185,9 @@ By default, the SM will exit on these er \fB\-B\fR, \fB\-\-daemon\fR Run in daemon mode - OpenSM will run in the background. .TP +\fB\-I\fR, \fB\-\-inactive\fR +Start SM in inactive rather than normal init SM state. +.TP \fB\-v\fR, \fB\-\-verbose\fR This option increases the log verbosity level. The -v option may be specified multiple times diff --git a/osm/opensm/main.c b/osm/opensm/main.c index 77ca343..a3f892b 100644 --- a/osm/opensm/main.c +++ b/osm/opensm/main.c @@ -270,6 +270,9 @@ show_usage(void) printf( "-B\n" "--daemon\n" " Run in daemon mode - OpenSM will run in the background.\n\n"); + printf("-I\n" + "--inactive\n" + " Start SM in inactive rather than normal init SM state.\n\n"); printf( "-v\n" "--verbose\n" " This option increases the log verbosity level.\n" @@ -582,7 +585,7 @@ main( boolean_t cache_options = FALSE; char *ignore_guids_file_name = NULL; uint32_t val; - const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBQvVhorcyx"; + const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyx"; /* In the array below, the 2nd parameter specifies the number @@ -627,6 +630,7 @@ main( { "console-port", 1, NULL, 'C'}, #endif { "daemon", 0, NULL, 'B'}, + { "inactive", 0, NULL, 'I'}, { NULL, 0, NULL, 0 } /* Required at the end of the array */ }; @@ -895,7 +899,12 @@ main( case 'B': opt.daemon = TRUE; - printf (" Daemon 
mode.\n"); + printf (" Daemon mode\n"); + break; + + case 'I': + opt.sm_inactive = TRUE; + printf(" SM started in inactive state\n"); break; case 'h':
URL: From mst at dev.mellanox.co.il Tue Mar 27 22:31:59 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 07:31:59 +0200 Subject: [ofa-general] Re: [PATCH] IB/core/user_mad.c: Add support for issmdisabled In-Reply-To: <1175042747.4372.104218.camel@hal.voltaire.com> References: <1175042747.4372.104218.camel@hal.voltaire.com> Message-ID: <20070328053159.GB32306@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: [PATCH] IB/core/user_mad.c: Add support for issmdisabled > > IB/core/user_mad.c: Add support for issmdisabled > > Signed-off-by: Hal Rosenstock I imagine this is related to the "SM inactive" support you posted separately. Is that right? How is this used by openSM? -- MST From sweitzen at cisco.com Tue Mar 27 23:34:15 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 27 Mar 2007 23:34:15 -0700 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070327100256.GL6661@mellanox.co.il> References: <20070327085900.GJ6661@mellanox.co.il><20070327090136.GK6661@mellanox.co.il> <20070327100256.GL6661@mellanox.co.il> Message-ID: I wasn't trying to remove any modules. I'll try to get you more info, but can you try to reproduce it there? This would be a good test for Mellanox, the developers of this feature, to run regularly. Scott From mst at dev.mellanox.co.il Wed Mar 28 00:05:14 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 09:05:14 +0200 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070327100256.GL6661@mellanox.co.il> Message-ID: <20070328070514.GA8649@mellanox.co.il> > This would be a good test for Mellanox, the developers of this feature, > to run regularly. The difficulty is that without managed switch we can't shut down specific ports easily. I'll try to think of something. 
-- MST From amip at dev.mellanox.co.il Wed Mar 28 00:15:38 2007 From: amip at dev.mellanox.co.il (Ami Perlmutter) Date: Wed, 28 Mar 2007 09:15:38 +0200 Subject: [ofa-general] madeye kernel oops In-Reply-To: <000301c770a2$9d1fac60$73248686@amr.corp.intel.com> References: <000301c770a2$9d1fac60$73248686@amr.corp.intel.com> Message-ID: <1175066138.14461.2.camel@Ami-desktop> On Tue, 2007-03-27 at 12:03 -0700, Sean Hefty wrote: > How easily can you reproduce this? I'm assuming that this is with OFED 1.2 on > 2.6.20, correct? yes > Can you describe what you were doing when this crash occurred? opensm was running on the other computer running SDP programs > Thanks, > Sean > > >Unable to handle kernel NULL pointer dereference at 0000000000000038 > >RIP: > > [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > >PGD 73387067 PUD 72844067 PMD 0 > >Oops: 0000 [1] SMP > >CPU 0 > >Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm > >ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa > >ib_mad ib_core > >Pid: 8917, comm: rmmod Not tainted 2.6.20 #1 > >RIP: 0010:[] > >[] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > >RSP: 0000:ffff810071ee1e08 EFLAGS: 00010292 > >RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f > >RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000 > >RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003 > >R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0 > >R13: 0000000000000002 R14: 0000000000000880 R15: 0000000000503010 > >FS: 00002b3d6689fb00(0000) GS:ffffffff80702000(0000) > >knlGS:0000000000000000 > >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0 > >Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task > >ffff8100781aeee0) > >Stack: ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28 > >ffffffff80538b43 > > ffff810071ee1ea8 ffffffff80538ea2 ffffffff80690880 ffff810071ee1e78 > > 000000000000000f 
0000000000000020 0000000000000002 ffff8100766c33c0 > >Call Trace: > > [] __cond_resched+0x1c/0x44 > > [] cond_resched+0x2e/0x39 > > [] wait_for_completion+0x1a/0xd0 > > [] :ib_madeye:madeye_remove_one+0x56/0x88 > > [] :ib_core:ib_unregister_client+0x40/0xe2 > > [] sys_delete_module+0x1b4/0x1e5 > > [] add_uevent_var+0x40/0xe3 > > [] sys_munmap+0x4b/0x58 > > [] system_call+0x7e/0x83 > > > > > >Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48 > >RIP [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > RSP > >CR2: 0000000000000038 From sweitzen at cisco.com Wed Mar 28 00:22:03 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 28 Mar 2007 00:22:03 -0700 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070328070514.GA8649@mellanox.co.il> References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> Message-ID: There's no way to shut down an IB switch port with opensm or any OFED diags? Yuck... Scott > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] > Sent: Wednesday, March 28, 2007 12:05 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Michael S. Tsirkin; general at lists.openfabrics.org; Roland > Dreier; bugmail at lists.openfabrics.org > Subject: Re: [Bug 465] IPoIB CM HA fails after several hours > of failures > > > This would be a good test for Mellanox, the developers of > this feature, > > to run regularly. > > The difficulty is that without managed switch we can't shut down > specific ports easily. I'll try to think of something. 
> > -- > MST > From mst at dev.mellanox.co.il Wed Mar 28 01:36:37 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 10:36:37 +0200 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070328070514.GA8649@mellanox.co.il> Message-ID: <20070328083637.GA11695@mellanox.co.il> > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] > > Sent: Wednesday, March 28, 2007 12:05 AM > > To: Scott Weitzenkamp (sweitzen) > > Cc: Michael S.
Tsirkin; general at lists.openfabrics.org; Roland > > Dreier; bugmail at lists.openfabrics.org > > Subject: Re: [Bug 465] IPoIB CM HA fails after several hours > > of failures > > > > > This would be a good test for Mellanox, the developers of > > this feature, > > > to run regularly. > > > > The difficulty is that without managed switch we can't shut down > > specific ports easily. I'll try to think of something. > > Quoting Scott Weitzenkamp (sweitzen) : > Subject: RE: [Bug 465] IPoIB CM HA fails after several hours of failures > > There's no way to shut down an IB switch port with opensm or any OFED > diags? Yuck... > > Scott Maybe something can be done with the opensm console. But I don't know where to find documentation for it. -- MST From monil at voltaire.com Wed Mar 28 02:00:57 2007 From: monil at voltaire.com (Moni Levy) Date: Wed, 28 Mar 2007 11:00:57 +0200 Subject: [ofa-general] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1) In-Reply-To: <20070327205213.GD28347@mellanox.co.il> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> Message-ID: <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> On 3/27/07, Michael S. 
Tsirkin wrote: > > Changed from v3 > > * added a flush_scheduled_work call before we restart the QP in order > > to ensure that the pkey table we read from the cache is updated > > > +void ipoib_ib_dev_restart_qp(struct work_struct *work) > +{ > + struct ipoib_dev_priv *priv = > + container_of(work, struct ipoib_dev_priv, restart_qp_task); > + /* We only restart the QP in case of pkey change event */ > + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); > + /* Ensures the pkey table we read from the cache is updated properly */ > + flush_scheduled_work(); > + __ipoib_ib_dev_flush(priv, 1); > +} > + > > I think doing flush_scheduled_work from inside the ipoib workqueue > can trigger deadlocks - which deadlocks the workqueue was > created to avoid, in the first place. Look at the comment > in ipoib_main.c where the WQ is created. /* * We create our own workqueue mainly because we want to be * able to flush it when devices are being removed. We can't * use schedule_work()/flush_scheduled_work() because both * unregister_netdev() and linkwatch_event take the rtnl lock, * so flush_scheduled_work() can deadlock during device * removal. */ I read that few times and I understand it as : ipoib workqueue was added because if the default system workqueue was used unregister_netdev() and linkwatch_event() would deadlock. Am I wrong at assuming that after we added the ipoib workqueue we can call flush_scheduled_work ? > > And, I don't think that depending on the fact that the cache > uses a default schedule queue internally is such a good idea. You're definitely right. The coherency enforcement should be inside the cache implementation. We can move the call to flush_scheduled_work() inside the ib_find_cached_pkey and ib_get_cached_pkey. What do you think ? > > How about simply requeueing the work again if the cache query failed? The problem is that it does not fail. It returns a non coherent result, which just don't reflect the pkey table change. 
-- Moni > > -- > MST > From mst at dev.mellanox.co.il Wed Mar 28 02:33:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 11:33:45 +0200 Subject: [ofa-general] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1) In-Reply-To: <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> Message-ID: <20070328093345.GD11695@mellanox.co.il> > Quoting Moni Levy : > Subject: Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1) > > On 3/27/07, Michael S. Tsirkin wrote: > >> Changed from v3 > >> * added a flush_scheduled_work call before we restart the QP in > >order > >> to ensure that the pkey table we read from the cache is updated > > > > > >+void ipoib_ib_dev_restart_qp(struct work_struct *work) > >+{ > >+ struct ipoib_dev_priv *priv = > >+ container_of(work, struct ipoib_dev_priv, restart_qp_task); > >+ /* We only restart the QP in case of pkey change event */ > >+ ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", > >priv->dev->name); > >+ /* Ensures the pkey table we read from the cache is updated > >properly */ > >+ flush_scheduled_work(); > >+ __ipoib_ib_dev_flush(priv, 1); > >+} > >+ > > > >I think doing flush_scheduled_work from inside the ipoib workqueue > >can trigger deadlocks - which deadlocks the workqueue was > >created to avoid, in the first place. Look at the comment > >in ipoib_main.c where the WQ is created. > > /* > * We create our own workqueue mainly because we want to be > * able to flush it when devices are being removed. We can't > * use schedule_work()/flush_scheduled_work() because both > * unregister_netdev() and linkwatch_event take the rtnl lock, > * so flush_scheduled_work() can deadlock during device > * removal. 
> */ > > > I read that few times and I understand it as : ipoib workqueue was > added because if the default system workqueue was used > unregister_netdev() and linkwatch_event() would deadlock. Yes. What you are doing is blocking the ipoib workqueue until the system workqueue is flushed. So now flushing the ipoib workqueue would deadlock. > Am I wrong > at assuming that after we added the ipoib workqueue we can call > flush_scheduled_work ? Yes :) It's not enough to add the workqueue - you must also use it. > > > >And, I don't think that depending on the fact that the cache > >uses a default schedule queue internally is such a good idea. > > You're definitely right. The coherency enforcement should be inside > the cache implementation. > > We can move the call to > flush_scheduled_work() inside the ib_find_cached_pkey and > ib_get_cached_pkey. What do you think ? I think the current rule is that it is legal to call these in atomic context, so we can't block there. Further, since with your patch ib_find_cached_pkey is called from the ipoib workqueue, the deadlock would still remain, I think. > > > >How about simply requeueing the work again if the cache query failed? > > The problem is that it does not fail. It returns a non coherent > result, which just don't reflect the pkey table change. I looked at cache.c and you are right. Maybe we should either 1. report events after the cache has been updated, or 2. make cache queries error out (EBUSY?) if the cache has not been updated yet. Option 1 requires core changes; option 2 requires ULP changes. I would be inclined to go for 2. Roland?
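For reference, option 2 above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: struct pkey_cache, cache_mark_stale(), cache_refresh(), cache_find_pkey() and restart_qp_work() are hypothetical names, not the cache.c API, and locking is left out entirely. The point is that the query never blocks (so it stays legal in atomic context) and the consumer requeues its work on -EBUSY instead of acting on a stale table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_TABLE_LEN 4

/* Hypothetical stand-in for a per-port pkey cache; not the real layout. */
struct pkey_cache {
	bool     update_pending;        /* set on the event, cleared by refresh */
	uint16_t table[PKEY_TABLE_LEN];
	int      len;
};

/* Event arrives: mark the cache stale before any consumer is told. */
static void cache_mark_stale(struct pkey_cache *c)
{
	c->update_pending = true;
}

/* Deferred refresh; in a kernel this would run from the cache's work item. */
static void cache_refresh(struct pkey_cache *c, const uint16_t *hw, int len)
{
	int i;

	for (i = 0; i < len && i < PKEY_TABLE_LEN; i++)
		c->table[i] = hw[i];
	c->len = i;
	c->update_pending = false;
}

/* Non-blocking query: -EBUSY while stale, -ENOENT on a miss, 0 on a hit. */
static int cache_find_pkey(const struct pkey_cache *c, uint16_t pkey,
			   uint16_t *index)
{
	int i;

	if (c->update_pending)
		return -EBUSY;

	for (i = 0; i < c->len; i++) {
		/* compare ignoring the top (membership) bit */
		if ((c->table[i] & 0x7fff) == (pkey & 0x7fff)) {
			*index = (uint16_t)i;
			return 0;
		}
	}
	return -ENOENT;
}

/*
 * Consumer work item. Returns true when the caller should requeue the
 * work and try again later, rather than flushing another workqueue.
 */
static bool restart_qp_work(const struct pkey_cache *c, uint16_t pkey,
			    uint16_t *index)
{
	return cache_find_pkey(c, pkey, index) == -EBUSY;
}
```

With this shape the ULP side only needs to handle one extra return code, and nothing in the consumer's work item waits on the system workqueue.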
-- MST From vlad at lists.openfabrics.org Wed Mar 28 02:35:39 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Wed, 28 Mar 2007 02:35:39 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070328-0200 daily build status Message-ID: <20070328093540.37C22E6081B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ppc64 
with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: From kliteyn at dev.mellanox.co.il Wed Mar 28 05:05:13 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 28 Mar 2007 14:05:13 +0200 Subject: [ofa-general] [PATCH] OpenSM/osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not found In-Reply-To: <1175013740.4372.73340.camel@hal.voltaire.com> References: <1175013740.4372.73340.camel@hal.voltaire.com> Message-ID: <460A59F9.4000109@dev.mellanox.co.il> Hal Rosenstock wrote: > OpenSM/osm_sm_state_mgr.c: In > __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not > found properly > > Signed-off-by: Hal Rosenstock > > diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c > index 41153fc..002821b 100644 > --- a/osm/opensm/osm_sm_state_mgr.c > +++ b/osm/opensm/osm_sm_state_mgr.c > @@ -231,6 +231,11 @@ __osm_sm_state_mgr_send_master_sm_info_r > */ > p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl, > p_sm_mgr->master_guid ); > + if( p_port == > + ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) ) > + { > + p_port = NULL; > + } Good catch. Just curious - did you find it simply by code review, or did you actually see a case where there was no port object for the master_guid?
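The pattern behind this hunk, sketched in plain C: a lookup that signals a miss with an end sentinel rather than NULL, so a caller that only tests for NULL would not notice the miss. The tiny struct map and the map_*() helpers below are made up for illustration and are not complib's cl_qmap; only the compare-against-end idea is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature map; a miss is reported via an end sentinel. */
struct item {
	uint64_t key;
};

struct map {
	struct item items[4];
	size_t      count;
	struct item end;	/* sentinel returned when the key is absent */
};

static struct item *map_end(struct map *m)
{
	return &m->end;
}

/* Returns the matching item, or map_end(m) (never NULL) on a miss. */
static struct item *map_get(struct map *m, uint64_t key)
{
	size_t i;

	for (i = 0; i < m->count; i++)
		if (m->items[i].key == key)
			return &m->items[i];
	return map_end(m);
}

/* What the hunk does inline: turn the sentinel into NULL for the caller. */
static struct item *map_get_or_null(struct map *m, uint64_t key)
{
	struct item *p = map_get(m, key);

	return (p == map_end(m)) ? NULL : p;
}
```

A small wrapper like map_get_or_null() keeps the sentinel comparison in one place, which may be easier to audit than repeating the check at every lookup site.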
--Yevgeny > } > else > { > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From monisonlists at gmail.com Wed Mar 28 05:01:57 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Wed, 28 Mar 2007 14:01:57 +0200 Subject: [ofa-general] [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB Message-ID: <460A5935.7080104@gmail.com> Hi, The previous version of the patch changed ipoib_neigh_destructor to take the pointer to neigh outside the lock. This might be a risk so I wrote a new version of this patch so the change would affect only when bonding is used. Please see below... ------------------------------------------------------------------------------------- IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set. 
Signed-off-by: Moni Shoua Signed-off-by: Or Gerlitz --- ipoib.h | 4 +++- ipoib_main.c | 17 +++++++++++++++-- ipoib_multicast.c | 2 +- 3 files changed, 19 insertions(+), 4 deletions(-) Index: ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- ofed_1_2.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-28 09:42:44.000000000 +0200 +++ ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-28 10:10:28.519867061 +0200 @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_head list; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- ofed_1_2.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-28 09:42:44.000000000 +0200 +++ ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-28 13:11:21.939306498 +0200 @@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -773,6 +773,16 @@ static void ipoib_neigh_destructor(struc unsigned long flags; struct ipoib_ah *ah = NULL; + if (n->dev->flags & IFF_MASTER) { + /* n->dev is not an IPoIB device and we have to take priv from elsewhere */ + neigh = *to_ipoib_neigh(n); + if (neigh){ + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n", + n->dev->name); + } else 
+ return; + } ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n", IPOIB_QPN(n->ha), @@ -794,7 +804,9 @@ static void ipoib_neigh_destructor(struc ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) + { struct ipoib_neigh *neigh; @@ -803,6 +815,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour; + neigh->dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); Index: ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- ofed_1_2.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-28 09:42:44.000000000 +0200 +++ ofed_1_2/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-28 10:10:28.521866685 +0200 @@ -774,7 +774,7 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (neigh) { kref_get(&mcast->ah->ref); From mst at dev.mellanox.co.il Wed Mar 28 05:15:19 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 14:15:19 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <460A5935.7080104@gmail.com> References: <460A5935.7080104@gmail.com> Message-ID: <20070328121519.GI11695@mellanox.co.il> > Quoting Moni Shoua : > Subject: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB > > Hi, > > The previous version of the patch changed ipoib_neigh_destructor to take > the pointer to neigh outside the lock. This might be a risk so I wrote a > new version of this patch so the change would affect only when bonding is used. > Please see below... 
This looks pretty safe for OFED 1.2 insofar as this won't affect someone not using bonding. Long term, we still need to find a proper solution for whenever this bonding related code goes GA/upstream. -- MST From mst at dev.mellanox.co.il Wed Mar 28 05:48:10 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 14:48:10 +0200 Subject: [ofa-general] opensm console -what is it? Message-ID: <20070328124810.GJ11695@mellanox.co.il> I see this (man opensm): -console [off | local | socket] This option brings up the OpenSM console (default off). Note that the socket option will only be available if OpenSM --enable-console-socket. -console-port Specify an alternate telnet port for the socket console (default 10000). Note that this option only appears if OpenSM was built with --enable-console-socket. But what is it? Let's try: # /usr/local/ofed/bin/opensm -console local ------------------------------------------------- OpenSM Rev:openib-3.0.8 Command Line Arguments: Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-3.0.8 Using default GUID 0x2c9020020ee11 OpenSM $ Entering MASTER state SUBNET UP Error parsing command line: OpenSM $ help Supported commands and syntax: help [] quit loglevel [] priority [] resweep [heavy|light] status [loop] logflush -- flush the osm.log file OpenSM $ quit OpenSM $ Error parsing command line: OpenSM $ exit exit : Command not found Supported commands and syntax: help [] quit loglevel [] priority [] resweep [heavy|light] status [loop] logflush -- flush the osm.log file OpenSM $ Hmm .. how does one exit?
-- MST From monil at voltaire.com Wed Mar 28 05:48:52 2007 From: monil at voltaire.com (Moni Levy) Date: Wed, 28 Mar 2007 14:48:52 +0200 Subject: [ofa-general] Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1) In-Reply-To: <20070328093345.GD11695@mellanox.co.il> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> <20070328093345.GD11695@mellanox.co.il> Message-ID: <6a122cc00703280548h3d0da818i14e7619afd9efb74@mail.gmail.com> On 3/28/07, Michael S. Tsirkin wrote: > > Quoting Moni Levy : > > Subject: Re: pkey change handling patch (was Re: bugs to fix for OFED 1.2 RC1) > > > > On 3/27/07, Michael S. Tsirkin wrote: > > >> Changed from v3 > > >> * added a flush_scheduled_work call before we restart the QP in > > >order > > >> to ensure that the pkey table we read from the cache is updated > > > > > > > > >+void ipoib_ib_dev_restart_qp(struct work_struct *work) > > >+{ > > >+ struct ipoib_dev_priv *priv = > > >+ container_of(work, struct ipoib_dev_priv, restart_qp_task); > > >+ /* We only restart the QP in case of pkey change event */ > > >+ ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", > > >priv->dev->name); > > >+ /* Ensures the pkey table we read from the cache is updated > > >properly */ > > >+ flush_scheduled_work(); > > >+ __ipoib_ib_dev_flush(priv, 1); > > >+} > > >+ > > > > > >I think doing flush_scheduled_work from inside the ipoib workqueue > > >can trigger deadlocks - which deadlocks the workqueue was > > >created to avoid, in the first place. Look at the comment > > >in ipoib_main.c where the WQ is created. > > > > /* > > * We create our own workqueue mainly because we want to be > > * able to flush it when devices are being removed. 
We can't > > * use schedule_work()/flush_scheduled_work() because both > > * unregister_netdev() and linkwatch_event take the rtnl lock, > > * so flush_scheduled_work() can deadlock during device > > * removal. > > */ > > > > > > I read that few times and I understand it as : ipoib workqueue was > > added because if the default system workqueue was used > > unregister_netdev() and linkwatch_event() would deadlock. > > Yes. What you are doing is blocking ipoib workqueue until > system workqueue is flushed. So now flushing the ipoib workqueue > would deadlock. > > > Am I wrong > > at assuming that after we added the ipoib workqueue we can call > > flush_scheduled_work ? > > Yes :) > It's not enough to add the workqueue - you must also use it. > > > > > > >And, I don't think that depending on the fact that the cache > > >uses a default schedule queue internally is such a good idea. > > > > You're definitely right. The coherency enforcement should be inside > > the cache implementation. > > > > We can move the call to > > flush_scheduled_work() inside the ib_find_cached_pkey and > > ib_get_cached_pkey. What do you think ? > > I think the current rule is that it is legal to call these > in atomic context so we can't block there. Right > > Further, since with your patch ib_find_cached_pkey is called from > ipoib workqueue, the deadlock would still stay, I think. Ok > > > > > > >How about simply requeueing the work again if the cache query failed? > > > > The problem is that it does not fail. It returns a non coherent > > result, which just don't reflect the pkey table change. > > I looked at cache.c and you are right. Maybe we should either > 1. report events after cache has been updated This one should block if not ready right ? That won't work with the atomic context calling. > or > 2. make cache queries error out (EBUSY?) if cache hs not updated yet. > That sounds like a good idea. 
The problem is that we then should update all the consumers of these two calls with the additional return code handling, although it sounds like not a very big change. -- Moni > Option 1 requires core changes, option 2 - ULP changes > > I would be inclined to go for 2. > > Roland? > > -- > MST > From mst at dev.mellanox.co.il Wed Mar 28 06:01:27 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 15:01:27 +0200 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070328070514.GA8649@mellanox.co.il> Message-ID: <20070328130127.GK11695@mellanox.co.il> > Quoting Scott Weitzenkamp (sweitzen) : > Subject: RE: [Bug 465] IPoIB CM HA fails after several hours of failures > > There's no way to shut down an IB switch port with opensm or any OFED > diags? Yuck... It seems ibportstate will do the job. OK, I'll try to script something. -- MST From sashak at voltaire.com Wed Mar 28 07:09:23 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 28 Mar 2007 16:09:23 +0200 Subject: [ofa-general] Re: opensm console -what is it? In-Reply-To: <20070328124810.GJ11695@mellanox.co.il> References: <20070328124810.GJ11695@mellanox.co.il> Message-ID: <1175090963.11973.14.camel@localhost> On Wed, 2007-03-28 at 14:48 +0200, Michael S. Tsirkin wrote: > I see this (man opensm): > > -console [off | local | socket] > This option brings up the OpenSM console (default off). Note that the > socket option will only be available if OpenSM --enable-console-socket. > > -console-port > Specify an alternate telnet port for the socket console (default 10000). > Note that this option only appears if OpenSM was built with --enable-con- > sole-socket. > > But what is it? This is simple console interface for OpenSM. 
> Let's try: > > # /usr/local/ofed/bin/opensm -console local > ------------------------------------------------- > OpenSM Rev:openib-3.0.8 > Command Line Arguments: > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-3.0.8 > > Using default GUID 0x2c9020020ee11 > OpenSM $ Entering MASTER state > > SUBNET UP > > > Error parsing command line: > > OpenSM $ help > Supported commands and syntax: > help [] > quit > loglevel [] > priority [] > resweep [heavy|light] > status [loop] > logflush -- flush the osm.log file > OpenSM $ quit > OpenSM $ > Error parsing command line: > > OpenSM $ exit > exit : Command not found > > Supported commands and syntax: > help [] > quit > loglevel [] > priority [] > resweep [heavy|light] > status [loop] > logflush -- flush the osm.log file > OpenSM $ > > Hmm .. how does one exit? You cannot exit from local console, 'quit' will work with remote console. If you want just to kill OpenSM you can use ^C. Sasha From bs at q-leap.de Wed Mar 28 06:46:13 2007 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 28 Mar 2007 15:46:13 +0200 Subject: [ofa-general] lustre problem Message-ID: <200703281546.13684.bs@q-leap.de> Hi, with 2.6.20.4 and lustre-1.4.9 we get an oops, see below. In principle it also could be a lustre problem, but with mellanox cards it just works fine. 
[ 195.339317] Lustre: Added LNI 192.168.41.106@o2ib [8/64]
[ 195.352336] Lustre: Added LNI 192.168.42.106@tcp [8/256]
[ 195.357796] Lustre: Accept secure, port 988
[ 195.412988] Lustre: Lustre Lite Client File System; info at clusterfs.com
[ 195.449596] Unable to handle kernel paging request at 000000007740b000 RIP:
[ 195.454249] [] __iowrite32_copy+0x2/0x8
[ 195.462306] PGD 11ac87067 PUD 0
[ 195.465648] Oops: 0000 [1] SMP

Entering kdb (current=0xffff81007755c100, pid 3191) on processor 3 Oops:
due to oops @ 0xffffffff803513d2
     r15 = 0x0000000000000005      r14 = 0x0000000000000168
     r13 = 0x000000007740b000      r12 = 0xffffc200001d601c
     rbp = 0xffff81007c083a60      rbx = 0x0000000000000059
     r11 = 0x0000000000000000      r10 = 0xffff810076bc4000
      r9 = 0xffff810076bc4000       r8 = 0xffff81007ccf2ec8
     rax = 0x0000000000000000      rcx = 0x0000000000000059
     rdx = 0x0000000000000059      rsi = 0x000000007740b000
     rdi = 0xffffc200001d601c orig_rax = 0xffffffffffffffff
     rip = 0xffffffff803513d2       cs = 0x0000000000000010
  eflags = 0x0000000000010206      rsp = 0xffff81007c0839f0
      ss = 0x0000000000000000    &regs = 0xffff81007c083958
[3]kdb> bt
Stack traceback for pid 3191
0xffff81007755c100 3191 19 1 3 R 0xffff81007755c3c0 *ib_cm/3
rsp                rip                Function (args)
0xffff81007c0839d8 0xffffffff803513d2 __iowrite32_copy+0x2
0xffff81007c083a08 0xffffffff88066161 [ib_ipath]ipath_verbs_send+0x10b
0xffff81007c083a68 0xffffffff88061205 [ib_ipath]ipath_do_ruc_send+0x707
0xffff81007c083af8 0xffffffff88061619 [ib_ipath]ipath_post_ruc_send+0x1fd
0xffff81007c083b58 0xffffffff88065c39 [ib_ipath]ipath_post_send+0x70
0xffff81007c083b88 0xffffffff88284685 [ko2iblnd]kiblnd_check_sends+0x5c0
0xffff81007c083b98 0xffffffff8046e3af _spin_unlock+0x9
0xffff81007c083bf8 0xffffffff882873af [ko2iblnd]kiblnd_connreq_done+0x3d2
0xffff81007c083c28 0xffffffff8826b96d [ib_cm]ib_send_cm_rtu+0xec
0xffff81007c083c78 0xffffffff882886e9 [ko2iblnd]kiblnd_check_connreply+0x318
0xffff81007c083cd8 0xffffffff88289537 [ko2iblnd]kiblnd_cm_callback+0xb02
0xffff81007c083d38
0xffffffff88274c01 [rdma_cm]cma_ib_handler+0x18a
0xffff81007c083da8 0xffffffff8826c7da [ib_cm]cm_process_work+0x5c
0xffff81007c083dd8 0xffffffff8826de19 [ib_cm]cm_work_handler+0xad7
0xffff81007c083e28 0xffffffff8826d342 [ib_cm]cm_work_handler
0xffff81007c083e38 0xffffffff80238bc9 run_workqueue+0xb1
0xffff81007c083e58 0xffffffff80238c71 worker_thread
0xffff81007c083e68 0xffffffff8023bed0 keventd_create_kthread
0xffff81007c083e78 0xffffffff80238d97 worker_thread+0x126

In ipath_verbs_send() (ipath_verbs.c) the problem is the address in
ss->sge.vaddr.

The problem seems to be in the goto loop of ipath_do_ruc_send()
(ipath_ruc.c). The first time through, qp->s_hdrwords is zero, so it
doesn't call

	if (qp->s_hdrwords != 0) {
		...
		ipath_verbs_send()
		...
	}

Then both of the following ifs are false as well:

	if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE &&
	    (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0) {
		printk ("Sending.\n");
		bth2 = qp->s_ack_psn++ & IPATH_PSN_MASK;
	} else if (!((qp->ibqp.qp_type == IB_QPT_RC) ?
		     ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2) :
		     ipath_make_uc_req(qp, ohdr, pmtu, &bth0, &bth2))) {
		...
	}

So it increases qp->s_hdrwords, and after the "goto again"
ipath_verbs_send() is called and it crashes.

Any help solving the problem is appreciated.

Thanks in advance,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

From monty at lampreynetworks.com Wed Mar 28 06:55:40 2007
From: monty at lampreynetworks.com (John LaMontagne)
Date: Wed, 28 Mar 2007 09:55:40 -0400
Subject: [ofa-general] RE: ofa interop test plan
In-Reply-To: <1174923205.10117.10.camel@stevo-desktop>
References: <1174923205.10117.10.camel@stevo-desktop>
Message-ID: <000c01c77140$c8819690$5984c3b0$@com>

I will get you an answer to this. I need to defer to Rupert Dance. He
will return an answer to you. Sorry for the confusion.
Monty

-----Original Message-----
From: Steve Wise [mailto:swise at opengridcomputing.com]
Sent: Monday, March 26, 2007 11:33 AM
To: monty at lampreynetworks.com
Cc: ewg at lists.openfabrics.org; General at lists.openfabrics.org
Subject: ofa interop test plan

Monty,

The IOL info page at
http://www.iol.unh.edu/services/testing/ofa/events/Invitation_2007-04_OFA.php
states that the test plan is on the openfabrics site at
www.openfabrics.org/downloads.htm, but I don't see it there.

Can you point me at the iWARP-specific test plan?

Thanks,
Steve.

From dledford at redhat.com Wed Mar 28 07:25:18 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 28 Mar 2007 10:25:18 -0400
Subject: [ofa-general] ofa server account
In-Reply-To:
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06CF79A2@venom2>
Message-ID: <1175091919.3973.133.camel@athlon-x2.xsintricity.com>

On Tue, 2007-03-27 at 13:13 -0400, Jeff Squyres wrote:
> Michael Lee is the current sysadmin, but he's being phased out. Jeff
> Becker has graciously volunteered to phase in as the new sysadmin.
>
> Both are CC'ed on this e-mail; they'll get back to you on what
> information they need from you to create an account.

I was going to privately email those guys and request the same thing
(but for different reasons). But, as you later pointed out, the list
server stripped their addresses. Care to forward me to them as well,
Jeff?

>
> On Mar 27, 2007, at 1:10 PM, Glenn Grundstrom wrote:
>
> > Who should I contact that can create logins for git trees on the
> > OFA server? I'd like to put some git trees on the server for the
> > NetEffect iWARP kernel driver and userspace lib.
> >
> > Thanks,
> > Glenn.
> > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > > openib-general > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rsdance at freedomcycle.net Wed Mar 28 07:27:00 2007 From: rsdance at freedomcycle.net (Rupert Dance) Date: Wed, 28 Mar 2007 10:27:00 -0400 Subject: [ofa-general] RE: ofa interop test plan In-Reply-To: <000c01c77140$c8819690$5984c3b0$@com> References: <1174923205.10117.10.camel@stevo-desktop> <000c01c77140$c8819690$5984c3b0$@com> Message-ID: <000001c77145$26c9a5e0$800101df@annapurna> Hi Steve, The OFA-IWG Interoperability Test Plan is undergoing a final review and update but I have attached the most current version. This is also available on the UNH-IOL Website: http://www.iol.unh.edu/services/testing/ofa/index.php. Just click on the Test Suite Icon. The OFA is also updating their website and the plan will be available there shortly. Thanks Thank you, Rupert Dance Lamprey Networks 58 Dover Road Durham, NH 03824 Phone: 603-868-8411 Fax: 603-868-6411 -----Original Message----- From: John LaMontagne [mailto:monty at lampreynetworks.com] Sent: Wednesday, March 28, 2007 9:56 AM To: 'Steve Wise'; 'Rupert Dance' Cc: ewg at lists.openfabrics.org; General at lists.openfabrics.org Subject: RE: ofa interop test plan I will get you an answer to this. I need to defer to Rupert Dance. He will return an answer to you. Sorry for the confusion. 
Monty

-----Original Message-----
From: Steve Wise [mailto:swise at opengridcomputing.com]
Sent: Monday, March 26, 2007 11:33 AM
To: monty at lampreynetworks.com
Cc: ewg at lists.openfabrics.org; General at lists.openfabrics.org
Subject: ofa interop test plan

Monty,

The IOL info page at
http://www.iol.unh.edu/services/testing/ofa/events/Invitation_2007-04_OFA.php
states that the test plan is on the openfabrics site at
www.openfabrics.org/downloads.htm, but I don't see it there.

Can you point me at the iWARP-specific test plan?

Thanks,
Steve.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFA-IWG Interoperability Test Plan-v1.05.pdf
Type: application/pdf
Size: 978531 bytes
Desc: not available
URL:

From weiny2 at llnl.gov Wed Mar 28 09:23:30 2007
From: weiny2 at llnl.gov (Ira Weiny)
Date: Wed, 28 Mar 2007 09:23:30 -0700
Subject: [ofa-general] Re: opensm console -what is it?
In-Reply-To: <1175090963.11973.14.camel@localhost>
References: <20070328124810.GJ11695@mellanox.co.il>
	<1175090963.11973.14.camel@localhost>
Message-ID: <20070328092330.6230029a.weiny2@llnl.gov>

On Wed, 28 Mar 2007 16:09:23 +0200
Sasha Khapyorsky wrote:

> On Wed, 2007-03-28 at 14:48 +0200, Michael S. Tsirkin wrote:
> >
> > Hmm .. how does one exit?
>
> You cannot exit from local console, 'quit' will work with remote
> console. If you want just to kill OpenSM you can use ^C.
>
> Sasha

Sasha is correct. Here at LLNL we are making extensive use of the socket
console to be able to change log levels and other parameters after opensm
has been started as a daemon.

The local console was kept as a feature for developers, to make working
with that interface easier. (And it was there before the socket was added,
so it did not cost much.) I believe all the other commands should work as
advertised in local mode.

Here is a patch to the help menu of the quit command which should make
things clearer.
Ira -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 0001-specify-quit-is-not-valid-in-local-mode.txt URL: From monisonlists at gmail.com Wed Mar 28 09:24:05 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Wed, 28 Mar 2007 18:24:05 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070328121519.GI11695@mellanox.co.il> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> Message-ID: <460A96A5.1050307@gmail.com> Michael S. Tsirkin wrote: >> Quoting Moni Shoua : >> Subject: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB >> >> Hi, >> >> The previous version of the patch changed ipoib_neigh_destructor to take >> the pointer to neigh outside the lock. This might be a risk so I wrote a >> new version of this patch so the change would affect only when bonding is used. >> Please see below... > > This looks pretty safe for OFED 1.2 insofar as this won't affect > someone not using bonding. > > Long term, we still need to find a proper solution for whenever > this bonding related code goes GA/upstream. > Thanks Michael. Vlad, Can you add this patch to kernel_patches/fixes please? Please note that this patch breaks one of the backport patches: ipoib_8111_to_2_6_16.patch. I adapted the broken patch so it can be applied after the bonding patch. I tested that it works for RH4UP3. You can take it from below. 
Index: ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- ofa_kernel-1.2.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-28 17:52:59.000000000 +0200 +++ ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-28 17:53:41.000000000 +0200 @@ -218,6 +218,7 @@ struct ipoib_neigh { struct neighbour *neighbour; struct net_device *dev; + struct list_head all_neigh_list; struct list_head list; }; Index: ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- ofa_kernel-1.2.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-28 17:52:59.000000000 +0200 +++ ofa_kernel-1.2/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-28 18:07:50.000000000 +0200 @@ -85,6 +85,9 @@ struct workqueue_struct *ipoib_workqueue struct ib_sa_client ipoib_sa_client; +static DEFINE_SPINLOCK(ipoib_all_neigh_list_lock); +static LIST_HEAD(ipoib_all_neigh_list); + static void ipoib_add_one(struct ib_device *device); static void ipoib_remove_one(struct ib_device *device); @@ -783,6 +786,18 @@ static void ipoib_neigh_destructor(struc } else return; } + + struct ipoib_neigh *tn, *nn = NULL; + spin_lock(&ipoib_all_neigh_list_lock); + list_for_each_entry(tn, &ipoib_all_neigh_list, all_neigh_list) + if (tn->neighbour == n) { + nn = tn; + break; + } + spin_unlock(&ipoib_all_neigh_list_lock); + if (!nn) + return; + ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n", IPOIB_QPN(n->ha), @@ -819,6 +834,11 @@ struct ipoib_neigh *ipoib_neigh_alloc(st *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); + spin_lock(&ipoib_all_neigh_list_lock); + list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list); + neigh->neighbour->ops->destructor = ipoib_neigh_destructor; + spin_unlock(&ipoib_all_neigh_list_lock); + return neigh; } @@ -826,6 +846,17 @@ void ipoib_neigh_free(struct net_device { struct ipoib_dev_priv *priv = 
netdev_priv(dev); struct sk_buff *skb; + struct ipoib_neigh *nn; + spin_lock(&ipoib_all_neigh_list_lock); + list_del(&neigh->all_neigh_list); + list_for_each_entry(nn, &ipoib_all_neigh_list, all_neigh_list) + if (nn->neighbour->ops == neigh->neighbour->ops) + goto found; + + neigh->neighbour->ops->destructor = NULL; +found: + spin_unlock(&ipoib_all_neigh_list_lock); + *to_ipoib_neigh(neigh->neighbour) = NULL; while ((skb = __skb_dequeue(&neigh->queue))) { ++priv->stats.tx_dropped; @@ -836,8 +867,6 @@ void ipoib_neigh_free(struct net_device static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) { - parms->neigh_destructor = ipoib_neigh_destructor; - return 0; } From rdreier at cisco.com Wed Mar 28 10:25:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Mar 2007 10:25:24 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a few small post-rc5 fixes. I still have one more fix to send that I am still reviewing -- I should be able to send it Thursday. Erez Zilber (1): IB/iser: Handle aborting a command after it is sent Michael S. 
Tsirkin (1): IB/mthca: Fix thinko in init_mr_table() Steve Wise (1): RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp() drivers/infiniband/hw/cxgb3/cxio_hal.c | 12 ++++++++---- drivers/infiniband/hw/mthca/mthca_mr.c | 4 ++-- drivers/infiniband/ulp/iser/iser_initiator.c | 17 +++++++++-------- 3 files changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 818cf1a..f5e9aee 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -498,9 +498,9 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) u64 sge_cmd, ctx0, ctx1; u64 base_addr; struct t3_modify_qp_wr *wqe; - struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); - + struct sk_buff *skb; + skb = alloc_skb(sizeof(*wqe), GFP_KERNEL); if (!skb) { PDBG("%s alloc_skb failed\n", __FUNCTION__); return -ENOMEM; @@ -508,7 +508,7 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) err = cxio_hal_init_ctrl_cq(rdev_p); if (err) { PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err); - return err; + goto err; } rdev_p->ctrl_qp.workq = dma_alloc_coherent( &(rdev_p->rnic_info.pdev->dev), @@ -518,7 +518,8 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) GFP_KERNEL); if (!rdev_p->ctrl_qp.workq) { PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__); - return -ENOMEM; + err = -ENOMEM; + goto err; } pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping, rdev_p->ctrl_qp.dma_addr); @@ -556,6 +557,9 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p) rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2); skb->priority = CPL_PRIORITY_CONTROL; return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); +err: + kfree_skb(skb); + return err; } static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p) diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index 8e4846b..fdb576d 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ 
b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -881,8 +881,8 @@ int mthca_init_mr_table(struct mthca_dev *dev)
 		}
 		mpts = mtts = 1 << i;
 	} else {
-		mpts = dev->limits.num_mtt_segs;
-		mtts = dev->limits.num_mpts;
+		mtts = dev->limits.num_mtt_segs;
+		mpts = dev->limits.num_mpts;
 	}
 
 	if (!mthca_is_memfree(dev) &&
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 89e3728..278fcbc 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -658,6 +658,7 @@ void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
 {
 	int deferred;
 	int is_rdma_aligned = 1;
+	struct iser_regd_buf *regd;
 
 	/* if we were reading, copy back to unaligned sglist,
 	 * anyway dma_unmap and free the copy
@@ -672,20 +673,20 @@ void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
 	}
 
 	if (iser_ctask->dir[ISER_DIR_IN]) {
-		deferred = iser_regd_buff_release
-			(&iser_ctask->rdma_regd[ISER_DIR_IN]);
+		regd = &iser_ctask->rdma_regd[ISER_DIR_IN];
+		deferred = iser_regd_buff_release(regd);
 		if (deferred) {
-			iser_err("References remain for BUF-IN rdma reg\n");
-			BUG();
+			iser_err("%d references remain for BUF-IN rdma reg\n",
+				 atomic_read(&regd->ref_count));
 		}
 	}
 
 	if (iser_ctask->dir[ISER_DIR_OUT]) {
-		deferred = iser_regd_buff_release
-			(&iser_ctask->rdma_regd[ISER_DIR_OUT]);
+		regd = &iser_ctask->rdma_regd[ISER_DIR_OUT];
+		deferred = iser_regd_buff_release(regd);
 		if (deferred) {
-			iser_err("References remain for BUF-OUT rdma reg\n");
-			BUG();
+			iser_err("%d references remain for BUF-OUT rdma reg\n",
+				 atomic_read(&regd->ref_count));
 		}
 	}
 }

From rdreier at cisco.com Wed Mar 28 10:28:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 28 Mar 2007 10:28:28 -0700
Subject: [ofa-general] Re: pkey change handling patch
In-Reply-To: <20070328093345.GD11695@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 28 Mar 2007 11:33:45 +0200")
References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com>
	<20070322172245.GB17532@mellanox.co.il>
	<46094DA5.8000601@gmail.com>
	<20070327205213.GD28347@mellanox.co.il>
	<6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com>
	<20070328093345.GD11695@mellanox.co.il>
Message-ID:

    Michael> I looked at cache.c and you are right. Maybe we should
    Michael> either 1. report events after cache has been updated or
    Michael> 2. make cache queries error out (EBUSY?) if cache has not
    Michael> been updated yet.

    Michael> Option 1 requires core changes, option 2 - ULP changes

    Michael> I would be inclined to go for 2. Roland?

Yes, I agree. How about ESTALE as an error code?

 - R.

From halr at voltaire.com Wed Mar 28 12:13:10 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Mar 2007 14:13:10 -0500
Subject: [ofa-general] Re: [PATCH] IB/core/user_mad.c: Add support for issmdisabled
In-Reply-To: <20070328053159.GB32306@mellanox.co.il>
References: <1175042747.4372.104218.camel@hal.voltaire.com>
	<20070328053159.GB32306@mellanox.co.il>
Message-ID: <1175109189.4379.2042.camel@hal.voltaire.com>

On Wed, 2007-03-28 at 00:31, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock :
> > Subject: [PATCH] IB/core/user_mad.c: Add support for issmdisabled
> >
> > IB/core/user_mad.c: Add support for issmdisabled
> >
> > Signed-off-by: Hal Rosenstock
>
> I imagine this is related to the "SM inactive" support
> you posted separately. Is that right?

Yes.

> How is this used by openSM?

Whenever the SM state machine goes to inactive, it sets the isSMdisabled
bit (and resets the IsSM bit); in the opposite direction it sets IsSM and
resets isSMdisabled.

The OpenSM vendor layer for OpenIB will try to set/reset isSMdisabled. If
that fails (no user_mad support for this), the IsSM bit is still properly
handled.
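
The bit handling described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the OpenSM vendor-layer code: `port_caps`, `set_issm_disabled` and `sm_update_caps` are hypothetical names, and the `umad_has_issmdisabled` flag merely simulates whether user_mad supports isSMdisabled.

```c
#include <stdbool.h>

/* Hypothetical model of the two capability bits discussed above. */
struct port_caps {
	bool is_sm;          /* IsSM */
	bool is_sm_disabled; /* isSMdisabled */
};

/*
 * Hypothetical stand-in for the vendor-layer call that sets or clears
 * isSMdisabled. Returns 0 on success, -1 when user_mad has no support
 * for the bit (simulated here by the umad_has_issmdisabled flag).
 */
int set_issm_disabled(struct port_caps *caps, bool value,
		      bool umad_has_issmdisabled)
{
	if (!umad_has_issmdisabled)
		return -1;
	caps->is_sm_disabled = value;
	return 0;
}

/*
 * Mirror an SM state change into the capability bits:
 * inactive -> IsSM cleared and isSMdisabled set, otherwise the reverse.
 * IsSM is updated first, so it stays correct even when the isSMdisabled
 * update fails. Returns the result of the isSMdisabled update.
 */
int sm_update_caps(struct port_caps *caps, bool sm_inactive,
		   bool umad_has_issmdisabled)
{
	caps->is_sm = !sm_inactive;
	return set_issm_disabled(caps, sm_inactive, umad_has_issmdisabled);
}
```

The point of the ordering is that a caller running on a kernel without the user_mad change sees a failure only for isSMdisabled, while IsSM keeps tracking the SM state as before.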
-- Hal From halr at voltaire.com Wed Mar 28 12:15:57 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 14:15:57 -0500 Subject: [ofa-general] madeye kernel oops In-Reply-To: <1175066138.14461.2.camel@Ami-desktop> References: <000301c770a2$9d1fac60$73248686@amr.corp.intel.com> <1175066138.14461.2.camel@Ami-desktop> Message-ID: <1175109356.4379.2208.camel@hal.voltaire.com> On Wed, 2007-03-28 at 02:15, Ami Perlmutter wrote: > On Tue, 2007-03-27 at 12:03 -0700, Sean Hefty wrote: > > How easily can you reproduce this? I'm assuming that this is with OFED 1.2 on > > 2.6.20, correct? > yes > > Can you describe what you were doing when this crash occurred? > opensm was running on the other computer > running SDP programs So the node which oops'd was only running madeye and some SDP data transfer ? Can you be more specific about the failure scenario ? What was going on on the node which failed ? It looks like you were removing madeye. Was this the first time ? Anything else going on ? Thanks. 
-- Hal > > Thanks, > > Sean > > > > >Unable to handle kernel NULL pointer dereference at 0000000000000038 > > >RIP: > > > [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > >PGD 73387067 PUD 72844067 PMD 0 > > >Oops: 0000 [1] SMP > > >CPU 0 > > >Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm > > >ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa > > >ib_mad ib_core > > >Pid: 8917, comm: rmmod Not tainted 2.6.20 #1 > > >RIP: 0010:[] > > >[] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > >RSP: 0000:ffff810071ee1e08 EFLAGS: 00010292 > > >RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f > > >RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000 > > >RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003 > > >R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0 > > >R13: 0000000000000002 R14: 0000000000000880 R15: 0000000000503010 > > >FS: 00002b3d6689fb00(0000) GS:ffffffff80702000(0000) > > >knlGS:0000000000000000 > > >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > >CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0 > > >Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task > > >ffff8100781aeee0) > > >Stack: ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28 > > >ffffffff80538b43 > > > ffff810071ee1ea8 ffffffff80538ea2 ffffffff80690880 ffff810071ee1e78 > > > 000000000000000f 0000000000000020 0000000000000002 ffff8100766c33c0 > > >Call Trace: > > > [] __cond_resched+0x1c/0x44 > > > [] cond_resched+0x2e/0x39 > > > [] wait_for_completion+0x1a/0xd0 > > > [] :ib_madeye:madeye_remove_one+0x56/0x88 > > > [] :ib_core:ib_unregister_client+0x40/0xe2 > > > [] sys_delete_module+0x1b4/0x1e5 > > > [] add_uevent_var+0x40/0xe3 > > > [] sys_munmap+0x4b/0x58 > > > [] system_call+0x7e/0x83 > > > > > > > > >Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48 > > >RIP [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > > RSP > > >CR2: 0000000000000038 > > 
_______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Wed Mar 28 12:16:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 14:16:55 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> Message-ID: <1175109415.4379.2292.camel@hal.voltaire.com> On Wed, 2007-03-28 at 02:22, Scott Weitzenkamp (sweitzen) wrote: > There's no way to shut down an IB switch port with opensm or any OFED > diags? Yuck... Not true; ibportstate can do this. -- Hal > > Scott > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] > > Sent: Wednesday, March 28, 2007 12:05 AM > > To: Scott Weitzenkamp (sweitzen) > > Cc: Michael S. Tsirkin; general at lists.openfabrics.org; Roland > > Dreier; bugmail at lists.openfabrics.org > > Subject: Re: [Bug 465] IPoIB CM HA fails after several hours > > of failures > > > > > This would be a good test for Mellanox, the developers of > > this feature, > > > to run regularly. > > > > The difficulty is that without managed switch we can't shut down > > specific ports easily. I'll try to think of something. 
> > > > -- > > MST > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Wed Mar 28 11:25:11 2007 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 28 Mar 2007 19:25:11 +0100 Subject: [ofa-general] Re: [PATCH V2 - libibverbs] Added reference count tocompletion event channels In-Reply-To: References: <1173693643.18284.1.camel@mtldesk014.lab.mtl.com><349DCDA352EACF42A0C49FA6DCEA8403DD79B6@G3W0634.americas.hpqcorp.net> Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403E66D2F@G3W0634.americas.hpqcorp.net> HI, Is it OK to destroy a completion queue while it has event queued, any memory leak risk ? Thanks. --CQ > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Tuesday, March 27, 2007 8:55 AM > To: Tang, Changqing > Cc: Dotan Barak; openib-general > Subject: Re: [ofa-general] Re: [PATCH V2 - libibverbs] Added > reference count tocompletion event channels > > > Since you changed the size of structure 'struct ibv_cq', > does that mean > code > compiled with OFED 1.1 can not work > with OFED 1.2 ? > > No, the compatibility wrappers should still work. > > - R. > From halr at voltaire.com Wed Mar 28 12:29:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 14:29:15 -0500 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070328083637.GA11695@mellanox.co.il> References: <20070328070514.GA8649@mellanox.co.il> <20070328083637.GA11695@mellanox.co.il> Message-ID: <1175109866.4379.2787.camel@hal.voltaire.com> On Wed, 2007-03-28 at 03:36, Michael S. Tsirkin wrote: > > > -----Original Message----- > > > From: Michael S. 
Tsirkin [mailto:mst at dev.mellanox.co.il] > > > Sent: Wednesday, March 28, 2007 12:05 AM > > > To: Scott Weitzenkamp (sweitzen) > > > Cc: Michael S. Tsirkin; general at lists.openfabrics.org; Roland > > > Dreier; bugmail at lists.openfabrics.org > > > Subject: Re: [Bug 465] IPoIB CM HA fails after several hours > > > of failures > > > > > > > This would be a good test for Mellanox, the developers of > > > this feature, > > > > to run regularly. > > > > > > The difficulty is that without managed switch we can't shut down > > > specific ports easily. I'll try to think of something. > > > > Quoting Scott Weitzenkamp (sweitzen) : > > Subject: RE: [Bug 465] IPoIB CM HA fails after several hours of failures > > > > There's no way to shut down an IB switch port with opensm or any OFED > > diags? Yuck... > > > > Scott > > Maybe something can be done with the opensm console. A command could be added for this in the console but there is a separate diag command which handles this. > But I don't know where to find documentation for it. The documentation is the help in the console. If there is a need, it could also be added as a section in the man page. 
-- Hal

From halr at voltaire.com Wed Mar 28 12:29:11 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Mar 2007 14:29:11 -0500
Subject: [ofa-general] [PATCH] OpenSM/osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not found
In-Reply-To: <460A59F9.4000109@dev.mellanox.co.il>
References: <1175013740.4372.73340.camel@hal.voltaire.com>
	<460A59F9.4000109@dev.mellanox.co.il>
Message-ID: <1175109802.4379.2703.camel@hal.voltaire.com>

On Wed, 2007-03-28 at 07:05, Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
> > OpenSM/osm_sm_state_mgr.c: In
> > __osm_sm_state_mgr_send_master_sm_info_req, handle master GUID port not
> > found properly
> >
> > Signed-off-by: Hal Rosenstock
> >
> > diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c
> > index 41153fc..002821b 100644
> > --- a/osm/opensm/osm_sm_state_mgr.c
> > +++ b/osm/opensm/osm_sm_state_mgr.c
> > @@ -231,6 +231,11 @@ __osm_sm_state_mgr_send_master_sm_info_r
> >       */
> >      p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl,
> >                                             p_sm_mgr->master_guid );
> > +    if( p_port ==
> > +        ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) )
> > +    {
> > +      p_port = NULL;
> > +    }
>
> Good catch.
> Just curious - did you find it simply by code review or did you
> actually see a case when there was no port object for the master_guid?

Yes, with my work on SM inactive.

-- Hal

> --Yevgeny
>
> >    }
> >    else
> >    {
> >
> >
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
>

From mst at dev.mellanox.co.il Wed Mar 28 11:34:37 2007
From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 28 Mar 2007 20:34:37 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <460A96A5.1050307@gmail.com> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> <460A96A5.1050307@gmail.com> Message-ID: <20070328183437.GB4253@mellanox.co.il> > Quoting Moni Shoua : > Subject: Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB > > Michael S. Tsirkin wrote: > >> Quoting Moni Shoua : > >> Subject: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB > >> > >> Hi, > >> > >> The previous version of the patch changed ipoib_neigh_destructor to take > >> the pointer to neigh outside the lock. This might be a risk so I wrote a > >> new version of this patch so the change would affect only when bonding is used. > >> Please see below... > > > > This looks pretty safe for OFED 1.2 insofar as this won't affect > > someone not using bonding. > > > > Long term, we still need to find a proper solution for whenever > > this bonding related code goes GA/upstream. > > > Thanks Michael. > > Vlad, > Can you add this patch to kernel_patches/fixes please? > Please note that this patch breaks one of the backport patches: ipoib_8111_to_2_6_16.patch. > I adapted the broken patch so it can be applied after the bonding patch. > I tested that it works for RH4UP3. You can take it from below. Moni, pls post a link to git tree Vlad can pull. -- MST From mst at dev.mellanox.co.il Wed Mar 28 11:44:11 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 20:44:11 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> Message-ID: <20070328184411.GC4253@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [GIT PULL] please pull infiniband.git > > > What about mthca QP reset issues? > > I'm still thinking about synchronizing with the completion EQ's irq. Let's discuss this? 
Can you formulate what's bothering you?

--
MST

From mst at dev.mellanox.co.il Wed Mar 28 12:12:23 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Mar 2007 21:12:23 +0200
Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures
In-Reply-To: <1175109415.4379.2292.camel@hal.voltaire.com>
References: <20070327100256.GL6661@mellanox.co.il>
	<20070328070514.GA8649@mellanox.co.il>
	<1175109415.4379.2292.camel@hal.voltaire.com>
Message-ID: <20070328191223.GF4253@mellanox.co.il>

> > There's no way to shut down an IB switch port with opensm or any OFED
> > diags? Yuck...
>
> Not true; ibportstate can do this.

I found that, yes.
However, to automate this fully I need to find the LID
of the switch that is connected to specific HCA ports.

I expect ibnetdiscover can do this, but was unable to grok
the output syntax. Is it documented somewhere?

Alternatively, can linkinfo be queried with saquery?

--
MST

From halr at voltaire.com Wed Mar 28 13:17:46 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Mar 2007 15:17:46 -0500
Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures
In-Reply-To: <20070328191223.GF4253@mellanox.co.il>
References: <20070327100256.GL6661@mellanox.co.il>
	<20070328070514.GA8649@mellanox.co.il>
	<1175109415.4379.2292.camel@hal.voltaire.com>
	<20070328191223.GF4253@mellanox.co.il>
Message-ID: <1175113065.4379.6150.camel@hal.voltaire.com>

On Wed, 2007-03-28 at 14:12, Michael S. Tsirkin wrote:
> > > There's no way to shut down an IB switch port with opensm or any OFED
> > > diags? Yuck...
> >
> > Not true; ibportstate can do this.
>
> I found that, yes.
> However, to automate this fully I need to find the lid
> of the switch that is connected to specific HCA ports.

So do you have the GUID or LID of the HCA port(s) in question?

> I expect ibnetdiscover can do this, but was unable to grok
> the output syntax.

I'll explain once I have the answer to the above question.
> Is it documented somewhere? In the man page but this may not be sufficient for your purposes. > Alternatively, can linkinfo be queried with saquery? Not currently. -- Hal From halr at voltaire.com Wed Mar 28 13:31:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 15:31:36 -0500 Subject: [ofa-general] Re: opensm console -what is it? In-Reply-To: <20070328092330.6230029a.weiny2@llnl.gov> References: <20070328124810.GJ11695@mellanox.co.il> <1175090963.11973.14.camel@localhost> <20070328092330.6230029a.weiny2@llnl.gov> Message-ID: <1175113895.4379.7045.camel@hal.voltaire.com> On Wed, 2007-03-28 at 11:23, Ira Weiny wrote: > On Wed, 28 Mar 2007 16:09:23 +0200 > Sasha Khapyorsky wrote: > > > On Wed, 2007-03-28 at 14:48 +0200, Michael S. Tsirkin wrote: > > > > > > > > > Hmm .. how does one exit? > > > > You cannot exit from local console, 'quit' will work with remote > > console. If you want just to kill OpenSM you can use ^C. > > > > Sasha > > Sasha is correct. Here at LLNL we are making extensive use of the socket > console to be able to change log levels and other parameters after opensm has > been run as a daemon. > > The local console was kept as a feature for developers to make working with > that interface easier. (And it was there before the socket was added so it did > not cost much.) I believe all the other commands should work as advertised in > the local mode. > > Here is a patch to the help menu of the quit command which should make things > clearer. Sure. 
How about also the following on top of your change:

OpenSM/osm_console.c: Indicate use ctl-c to quit in local mode

Signed-off-by: Hal Rosenstock

diff --git a/osm/opensm/osm_console.c b/osm/opensm/osm_console.c
index 4577ab7..5ce9c88 100644
--- a/osm/opensm/osm_console.c
+++ b/osm/opensm/osm_console.c
@@ -89,7 +89,7 @@ static void help_command(FILE *out, int
 static void help_quit(FILE *out, int detail)
 {
-	fprintf(out, "quit (not valid in local mode)\n");
+	fprintf(out, "quit (not valid in local mode; use ctl-c)\n");
 }

> Ira
>
>
> ______________________________________________________________________
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From mst at dev.mellanox.co.il Wed Mar 28 12:51:31 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Mar 2007 21:51:31 +0200
Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures
In-Reply-To: <1175109866.4379.2787.camel@hal.voltaire.com>
References: <20070328070514.GA8649@mellanox.co.il> <20070328083637.GA11695@mellanox.co.il> <1175109866.4379.2787.camel@hal.voltaire.com>
Message-ID: <20070328195131.GH4253@mellanox.co.il>

Hal, All, I saw you added some stuff to bugzilla.
Let's keep discussion on the list please - reply to my
mail, do not add text to bugzilla directly.
-- MST From halr at voltaire.com Wed Mar 28 14:00:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 16:00:20 -0500 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070328195131.GH4253@mellanox.co.il> References: <20070328070514.GA8649@mellanox.co.il> <20070328083637.GA11695@mellanox.co.il> <1175109866.4379.2787.camel@hal.voltaire.com> <20070328195131.GH4253@mellanox.co.il> Message-ID: <1175115619.4379.8830.camel@hal.voltaire.com> On Wed, 2007-03-28 at 14:51, Michael S. Tsirkin wrote: > Hal, All, I saw you added some stuff to bugzilla. > Let's keep discussion on the list please - reply to my > mail, do not add text to bugzilla directly. I didn't. I only reply'd all to the emails on the list. -- Hal From mst at dev.mellanox.co.il Wed Mar 28 13:09:06 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Mar 2007 22:09:06 +0200 Subject: [ofa-general] Re: pkey change handling patch In-Reply-To: References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> <20070328093345.GD11695@mellanox.co.il> Message-ID: <20070328200906.GJ4253@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: pkey change handling patch > > Michael> I looked at cache.c and you are right. Maybe we should > Michael> either 1. report events after cache has been updated or > Michael> 2. make cache queries error out (EBUSY?) if cache hs not > Michael> updated yet. > > Michael> Option 1 requires core changes, option 2 - ULP changes > > Michael> I would be inclined to go for 2. Roland? > > Yes, I agree. How about ESTALE as an error code? Looks OK. Moni, I think this should be a separate patch, and your pkey work on top of this. -- MST From mst at dev.mellanox.co.il Wed Mar 28 13:12:51 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 28 Mar 2007 22:12:51 +0200 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <1175113065.4379.6150.camel@hal.voltaire.com> References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> Message-ID: <20070328201251.GK4253@mellanox.co.il> > > > Not true; ibportstate can do this. > > > > I found that, yes. > > However, to automate this fully I need to find the lid > > of the switch that is connected to specific HCA ports. > > So do you have the GUID or LID or the HCA port(s) in question ? Yes, that's easy to get. > > I expect ibnetdiscover can do this, but was unable to grok > > the output syntax. > > I'll explain once I have the answer to the above question. > > > Is it documented somewhere? > > In the man page but this may not be sufficient for your purposes. > > > Alternatively, can linkinfo be queried with saquery? > > Not currently. -- MST From rdreier at cisco.com Wed Mar 28 13:18:38 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Mar 2007 13:18:38 -0700 Subject: [ofa-general] Re: pkey change handling patch In-Reply-To: <20070328200906.GJ4253@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 28 Mar 2007 22:09:06 +0200") References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> <20070328093345.GD11695@mellanox.co.il> <20070328200906.GJ4253@mellanox.co.il> Message-ID: > Moni, I think this should be a separate patch, > and your pkey work on top of this. Yes, I agree. - R. 
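The option agreed on in the pkey thread above (cache queries fail with ESTALE until the cache has caught up with a change event) can be sketched roughly as follows. This is a minimal userspace model only, not the kernel's cache.c code: struct pkey_cache_model and all model_* functions are hypothetical names made up for illustration, and the real patch may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_PKEYS 4

/* Hypothetical model of a per-port P_Key cache (not the kernel structure). */
struct pkey_cache_model {
    uint16_t table[MODEL_MAX_PKEYS];
    int len;    /* number of valid entries in table */
    int stale;  /* nonzero between a change event and the next refresh */
};

/* A P_Key change event arrived: cached values can no longer be trusted. */
void model_on_change_event(struct pkey_cache_model *c)
{
    c->stale = 1;
}

/* The cache has been re-read from the device: queries are valid again. */
void model_refresh(struct pkey_cache_model *c, const uint16_t *fresh, int n)
{
    int i;

    assert(n >= 0 && n <= MODEL_MAX_PKEYS);
    for (i = 0; i < n; i++)
        c->table[i] = fresh[i];
    c->len = n;
    c->stale = 0;
}

/* Query the cache; returns 0, -ESTALE (not refreshed yet) or -EINVAL. */
int model_get_cached_pkey(const struct pkey_cache_model *c, int index,
                          uint16_t *pkey)
{
    if (c->stale)
        return -ESTALE;
    if (index < 0 || index >= c->len)
        return -EINVAL;
    *pkey = c->table[index];
    return 0;
}
```

In this model a consumer that reacts to the event before the refresh sees -ESTALE and can retry later, instead of silently reading the old table. That is the ULP-side handling the thread refers to as option 2.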
From myopenib at gmail.com Wed Mar 28 13:23:52 2007 From: myopenib at gmail.com (Moni Levy) Date: Wed, 28 Mar 2007 22:23:52 +0200 Subject: [ofa-general] Re: pkey change handling patch In-Reply-To: <20070328200906.GJ4253@mellanox.co.il> References: <6a122cc00703220602s7cdad558ud73f72e39f812eaf@mail.gmail.com> <20070322172245.GB17532@mellanox.co.il> <46094DA5.8000601@gmail.com> <20070327205213.GD28347@mellanox.co.il> <6a122cc00703280200h33f384b9jae75592294a9cbd9@mail.gmail.com> <20070328093345.GD11695@mellanox.co.il> <20070328200906.GJ4253@mellanox.co.il> Message-ID: <460ACED8.20605@gmail.com> Michael S. Tsirkin wrote: >> Quoting Roland Dreier : >> Subject: Re: pkey change handling patch >> >> Michael> I looked at cache.c and you are right. Maybe we should >> Michael> either 1. report events after cache has been updated or >> Michael> 2. make cache queries error out (EBUSY?) if cache hs not >> Michael> updated yet. >> >> Michael> Option 1 requires core changes, option 2 - ULP changes >> >> Michael> I would be inclined to go for 2. Roland? >> >> Yes, I agree. How about ESTALE as an error code? >> > > Looks OK. > Moni, I think this should be a separate patch, > and your pkey work on top of this. > Ok, I'll try to close that tomorrow. --Moni From halr at voltaire.com Wed Mar 28 14:44:22 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Mar 2007 16:44:22 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070328201251.GK4253@mellanox.co.il> References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> Message-ID: <1175118260.4379.11551.camel@hal.voltaire.com> On Wed, 2007-03-28 at 15:12, Michael S. Tsirkin wrote: > > > > Not true; ibportstate can do this. > > > > > > I found that, yes. 
> > > However, to automate this fully I need to find the lid > > > of the switch that is connected to specific HCA ports. > > > > So do you have the GUID or LID or the HCA port(s) in question ? > > Yes, that's easy to get. > > > > I expect ibnetdiscover can do this, but was unable to grok > > > the output syntax. > > > > I'll explain once I have the answer to the above question. Search for "H-", where GUID in hex is the node GUID, in the output of ibnetdiscover. [1] to the right of it indicates it is port 1. So for example, Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0 [22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 It is listed under the switch it is attached to and in the right hand side is the LID of the switch which in this case is 6. -- Hal > > > Is it documented somewhere? > > > > In the man page but this may not be sufficient for your purposes. > > > > > Alternatively, can linkinfo be queried with saquery? > > > > Not currently. > > From rdreier at cisco.com Wed Mar 28 15:05:54 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Mar 2007 15:05:54 -0700 Subject: [ofa-general] [ANNOUNCE] libibverbs 1.1-rc1 released Message-ID: I just tagged the 1.1-rc1 release of libibverbs and pushed it out to my git tree on kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git (the name of the tag is libibverbs-1.1-rc1). I've also copied a tarball into my home directory on openfabrics.org, with sha1sum: f92b71268eacfffa858a05582b0ed71ce37e760c libibverbs-1.1-rc1.tar.gz I would appreciated it if someone with access could move this into the right directory to appear in This release is the beginning of a major release cycle for libibverbs, so full compatibility with earlier libibverbs 1.0 releases is not preserved. Low-level device drivers will need to be rebuilt to work with libibverbs 1.1. 
However, a versioned ABI is provided so that applications dynamically linked with libibverbs 1.0 should work. I don't know of any major source level incompatibilities that would prevent an application that compiles against libibverbs 1.0 from building and working with libibverbs 1.1. I believe that libibverbs 1.1 is quite stable, since prereleases of the 1.1 tree have been shipped in the OFED 1.2 prereleases without any significant bug reports, and no major changes have gone into the tree for quite some time. Therefore I think a realistic schedule would be libibverbs 1.1-rc2 in one week, and libibverbs 1.1 final one week after that (April 11). A git shortlog of the changes on the 1.1 branch is below: Dotan Barak (4): Fix some memory leaks in read_config() error path Add resource cleanup at end of pingpong tests Fix memory leak on ibv_fork_init() error path Man page updates Jack Morgenstein (2): Return sq_draining properly from query_qp Delete man3 symbolic links before creating them during install Jeff Squyres (1): Add README notes about Valgrind memcheck support Leonid Arsh (1): Add IBV_EVENT_CLIENT_REREGISTER to libibverbs Ralph Campbell (1): Add response handling to ibv_cmd_resize_cq() Roland Dreier (34): Branch a libibverbs-1.0 tree for maintenance Fix update to Debian policy 3.7.2 Fix minor memory leaks Fix ibv_get_device_list() to really NULL-terminate the array Fix libibverbs definition of mb() for sparc Make fork() work for verbs consumers Simplify Debian package version Fix formatting of pingpong man pages slightly Debian packaging improvements Fix alignment of work request structures Update libibverbs man pages so they don't refer to "OpenIB" Add node_type and transport_type members to struct ibv_device Add Valgrind annotations Fix up configure test for Update ChangeLogs to give credit for Valgrind annotations Add handling of --with-valgrind= Add rmb() and wmb() to Minor cleanups Fix previous sq_draining change so it actually builds Rewrite test for 
linker script to get rid of Makefile conditionals Fix rewritten test for linker script support Implement new method for finding and loading device-specific drivers Revert "Pass driver data through ibv_cmd_req_notify_cq()" Don't use d_type member of struct dirent Fix caching of --version-script check Add ABI compatibility for apps linked against libibverbs 1.0 Rename Debian package back to libibverbs1 Fix unset context breakage when a low-level driver does kernel bypass Update Debian changelog Start adding libibverbs manpages Add remaining libibverbs manpages Add low-level driver hooks for reregister MR and memory windows Reference count completion channels Roll libibverbs 1.1-rc1 release Sean Hefty (1): Add some helper functions to simplify using UD QPs Steve Wise (5): Add async_event callback function to struct ibv_context_ops Support provider response data in reg_mr command Pass driver data through ibv_cmd_req_notify_cq() Don't lose devices when multiple RDMA devices are present The ibv_cmd_* create functions need to set context From ftillier at windows.microsoft.com Wed Mar 28 16:56:31 2007 From: ftillier at windows.microsoft.com (Fab Tillier) Date: Wed, 28 Mar 2007 16:56:31 -0700 Subject: [ofa-general] Past conference presentation? Message-ID: There used to be a section on the OpenFabrics wesbsite where PDF files of presentations from past conferences were posted. I can't seem to find these anymore - can anyone point me to links, or are these gone? Thanks! -Fab
From mst at dev.mellanox.co.il Wed Mar 28 23:08:58 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 08:08:58 +0200 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <1175118260.4379.11551.camel@hal.voltaire.com> References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> <1175118260.4379.11551.camel@hal.voltaire.com> Message-ID: <20070329060858.GP4253@mellanox.co.il> > > > > I expect ibnetdiscover can do this, but was unable to grok > > > > the output syntax. > > > > > > I'll explain once I have the answer to the above question. > > Search for "H-", where GUID in hex is the node GUID, in the > output of ibnetdiscover. [1] to the right of it indicates it is port 1.
> > So for example, > Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0 > [22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 > > It is listed under the switch it is attached to and in the right hand > side is the LID of the switch which in this case is 6. And how do I know the switch port here? Hal, where does this syntax come from? Some legacy script? How about fixing this tool to provide a sane, tabulated output, with a top self-documenting header, and flags to select specific rows/colums? I envision something a la ps: type guid port remote_lid remote_port description Would such a patch be accepted? -- MST
From mlleinin at hpcn.ca.sandia.gov Wed Mar 28 23:44:17 2007 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Wed, 28 Mar 2007 23:44:17 -0700 Subject: [ofa-general] Past conference presentation? In-Reply-To: References: Message-ID: <1175150657.26696.369.camel@localhost> On Wed, 2007-03-28 at 16:56 -0700, Fab Tillier wrote: > There used to be a section on the OpenFabrics wesbsite where PDF files > of presentations from past conferences were posted. I can’t seem to > find these anymore – can anyone point me to links, or are these gone? I found http://www.openfabrics.org/conferences/conference.htm that lists the old conferences/workshops but the links are stale. Perhaps Jeff Becker can fix this, but I don't know his email. - Matt > > > > Thanks! > > -Fab -- Matt L. Leininger, Ph.D.
Principal Member of Technical Staff    V 925-294-4842
Scalable Computing R&D                 F 925-294-2400
Sandia National Laboratories           mlleini at sandia.gov
MS 9158, PO Box 969                    http://hpcn-www.ca.sandia.gov/~mlleinin
Livermore, CA 94551, USA

From Philippe.GREGOIRE at CEA.FR Thu Mar 29 00:09:48 2007
From: Philippe.GREGOIRE at CEA.FR (Philippe.GREGOIRE at CEA.FR)
Date: Thu, 29 Mar 2007 09:09:48 +0200
Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures
References: <20070327100256.GL6661@mellanox.co.il><20070328070514.GA8649@mellanox.co.il><1175109415.4379.2292.camel@hal.voltaire.com><20070328191223.GF4253@mellanox.co.il><1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il>
Message-ID:

Michael,

Tracing the route between the HCA port and the subnet manager gives the LID of the switch connected to this HCA port:

[root at cors127 ~]# ibstat
CA 'mthca0'
    CA type: MT23108
    Number of ports: 2
    Firmware version: 3.0.0
    Hardware version: a1
    Node GUID: 0x0008f10403962eb0
    System image GUID: 0x0008f10403962eb3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 26
        LMC: 1
        SM lid: 14
        Capability mask: 0x00110a68
        Port GUID: 0x0008f10403962eb1
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 2
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00110a68
        Port GUID: 0x0008f10403962eb2

[root at cors127 ~]# ibtracert 26 14
  From ca {0x0008f10403962eb0} portnum 1 lid 0x1a-0x1b "cors127 HCA-1"
  [1] -> switch port {0x0005ad000001a775}[2] lid 0x2-0x2 "Cisco Switch SFS7000"
  [24] -> switch port {0x0005ad0000001834}[5] lid 0x10-0x10 "Topspin Switch - U3"
  [3] -> switch port {0x0005ad0000001830}[1] lid 0xe-0xe "Topspin Switch - U1"
  To switch {0x0005ad0000001830} portnum 0 lid 0xe-0xe "Topspin Switch - U1"

[root at cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}'
0x2-0x2

The HCA port LID and its subnet manager LID are available under /sys/class/infiniband, so it is better to do:

[root at cors127 ~]# ibtracert $(&1 | awk '(NR==2) {sub(/-.*/, "", $7); print $7}'
0x2

PS: redirecting stderr to stdout is required because ibtracert prints all of its output on stderr.

Philippe

-------- Original message --------
From: general-bounces at lists.openfabrics.org on behalf of Michael S. Tsirkin
Date: Wed. 28/03/2007 22:12
To: Hal Rosenstock
Cc: Michael S. Tsirkin; general at lists.openfabrics.org; bugmail at lists.openfabrics.org
Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures

> > > Not true; ibportstate can do this.
> >
> > I found that, yes.
> > However, to automate this fully I need to find the lid
> > of the switch that is connected to specific HCA ports.
>
> So do you have the GUID or LID or the HCA port(s) in question ?

Yes, that's easy to get.

> > I expect ibnetdiscover can do this, but was unable to grok
> > the output syntax.
>
> I'll explain once I have the answer to the above question.
>
> > Is it documented somewhere?
>
> In the man page but this may not be sufficient for your purposes.
>
> > Alternatively, can linkinfo be queried with saquery?
>
> Not currently.

--
MST

From mst at dev.mellanox.co.il Thu Mar 29 00:15:15 2007
From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 29 Mar 2007 09:15:15 +0200 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures In-Reply-To: References: <20070328201251.GK4253@mellanox.co.il> Message-ID: <20070329071515.GS4253@mellanox.co.il> > Quoting Philippe.GREGOIRE at CEA.FR : > Subject: RE : [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures > > Michael > tracing route between HCA port and the subnet manager will give the lid of the > switch connected to this HCA port : Cool, thanks! > [root at cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}' > 0x2-0x2 > > HCA port lid and its subnet manager lid are available in /sys/infiniband, so > it 's better to do : > > [root at cors127 ~]# ibtracert $( sys/class/infiniband/mthca0/ports/1/sm_lid) 2>&1 | awk '(NR==2) {sub(/-.*/, "", > $7); print $7}' > 0x2 > > PS: redirection of stderr to stdout is required as ibtracert gives all info on > stderr. Why is that? Hal? I guess I'll file a bug for ibtracert on this. -- MST From amip at dev.mellanox.co.il Thu Mar 29 00:32:34 2007 From: amip at dev.mellanox.co.il (Ami Perlmutter) Date: Thu, 29 Mar 2007 09:32:34 +0200 Subject: [ofa-general] madeye kernel oops In-Reply-To: <1175109356.4379.2208.camel@hal.voltaire.com> References: <000301c770a2$9d1fac60$73248686@amr.corp.intel.com> <1175066138.14461.2.camel@Ami-desktop> <1175109356.4379.2208.camel@hal.voltaire.com> Message-ID: <1175153554.14461.10.camel@Ami-desktop> On Wed, 2007-03-28 at 14:15 -0500, Hal Rosenstock wrote: > On Wed, 2007-03-28 at 02:15, Ami Perlmutter wrote: > > On Tue, 2007-03-27 at 12:03 -0700, Sean Hefty wrote: > > > How easily can you reproduce this? I'm assuming that this is with OFED 1.2 on > > > 2.6.20, correct? > > yes > > > Can you describe what you were doing when this crash occurred? > > opensm was running on the other computer > > running SDP programs > > So the node which oops'd was only running madeye and some SDP data > transfer ? 
I was using madeye to debug mad loses in SDP connect. so other that CM mads there was no data being sent by SDP > Can you be more specific about the failure scenario ? What was going on > on the node which failed ? It looks like you were removing madeye. Was > this the first time ? Anything else going on ? when I tried to remove the module, the node was not running anything. opensm was running on the other machine. the oops happend when I tried to remove madeye in order to reset the driver. this oops happend more than once, but not every time I removed the module. > Thanks. > > -- Hal > > > > Thanks, > > > Sean > > > > > > >Unable to handle kernel NULL pointer dereference at 0000000000000038 > > > >RIP: > > > > [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > > >PGD 73387067 PUD 72844067 PMD 0 > > > >Oops: 0000 [1] SMP > > > >CPU 0 > > > >Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm > > > >ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa > > > >ib_mad ib_core > > > >Pid: 8917, comm: rmmod Not tainted 2.6.20 #1 > > > >RIP: 0010:[] > > > >[] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > > >RSP: 0000:ffff810071ee1e08 EFLAGS: 00010292 > > > >RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f > > > >RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000 > > > >RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003 > > > >R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0 > > > >R13: 0000000000000002 R14: 0000000000000880 R15: 0000000000503010 > > > >FS: 00002b3d6689fb00(0000) GS:ffffffff80702000(0000) > > > >knlGS:0000000000000000 > > > >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > >CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0 > > > >Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task > > > >ffff8100781aeee0) > > > >Stack: ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28 > > > >ffffffff80538b43 > > > > ffff810071ee1ea8 ffffffff80538ea2 
ffffffff80690880 ffff810071ee1e78 > > > > 000000000000000f 0000000000000020 0000000000000002 ffff8100766c33c0 > > > >Call Trace: > > > > [] __cond_resched+0x1c/0x44 > > > > [] cond_resched+0x2e/0x39 > > > > [] wait_for_completion+0x1a/0xd0 > > > > [] :ib_madeye:madeye_remove_one+0x56/0x88 > > > > [] :ib_core:ib_unregister_client+0x40/0xe2 > > > > [] sys_delete_module+0x1b4/0x1e5 > > > > [] add_uevent_var+0x40/0xe3 > > > > [] sys_munmap+0x4b/0x58 > > > > [] system_call+0x7e/0x83 > > > > > > > > > > > >Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48 > > > >RIP [] :ib_mad:ib_unregister_mad_agent+0x11/0x480 > > > > RSP > > > >CR2: 0000000000000038 > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From monisonlists at gmail.com Thu Mar 29 00:55:14 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Thu, 29 Mar 2007 09:55:14 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070328183437.GB4253@mellanox.co.il> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> <460A96A5.1050307@gmail.com> <20070328183437.GB4253@mellanox.co.il> Message-ID: <460B70E2.4020608@gmail.com> > > Moni, pls post a link to git tree Vlad can pull. > I put a copy of ofed_1_2 git under ~monis/scm/ofed_1_2 in OFA server Vlad, you can pull it from there. 
I added a new patch and updated an existing one (or actually some copies of the same) Here the commit log of what I did commit 95648bda7a4f935937e92a2084db9152103d300e Author: Moni Shoua Date: Thu Mar 29 00:49:31 2007 -0700 Fix broken patch after ipoib_dev_in_ipoib_neigh.patch was added to kernel fixes Signed-off-by: Moni Shoua commit 2fe6af6feb6361aae8821fbe83c804aedb49fd57 Author: Moni Shoua Date: Thu Mar 29 00:16:58 2007 -0700 This patch enables bonding to work with IPoIB devices as slaves. It prevents kernel crash when ipoib_neigh_destructor is called and n->dev is not an IPoIB device but bonding master. Signed-off-by: Moni Shoua From mst at dev.mellanox.co.il Thu Mar 29 01:02:28 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 10:02:28 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <460B70E2.4020608@gmail.com> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> <460A96A5.1050307@gmail.com> <20070328183437.GB4253@mellanox.co.il> <460B70E2.4020608@gmail.com> Message-ID: <20070329080228.GV4253@mellanox.co.il> Moni, could you just put the dev field somewhere else in structure, so that we don't have to touch backports? Quoting Moni Shoua : Subject: Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB > > Moni, pls post a link to git tree Vlad can pull. > I put a copy of ofed_1_2 git under ~monis/scm/ofed_1_2 in OFA server Vlad, you can pull it from there. I added a new patch and updated an existing one (or actually some copies of the same) Here the commit log of what I did commit 95648bda7a4f935937e92a2084db9152103d300e Author: Moni Shoua Date: Thu Mar 29 00:49:31 2007 -0700 Fix broken patch after ipoib_dev_in_ipoib_neigh.patch was added to kernel fixes Signed-off-by: Moni Shoua commit 2fe6af6feb6361aae8821fbe83c804aedb49fd57 Author: Moni Shoua Date: Thu Mar 29 00:16:58 2007 -0700 This patch enables bonding to work with IPoIB devices as slaves. 
It prevents kernel crash when ipoib_neigh_destructor is called and n->dev is not an IPoIB device but bonding master. Signed-off-by: Moni Shoua -- MST From monisonlists at gmail.com Thu Mar 29 01:17:59 2007 From: monisonlists at gmail.com (Moni Shoua) Date: Thu, 29 Mar 2007 10:17:59 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <20070329080228.GV4253@mellanox.co.il> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> <460A96A5.1050307@gmail.com> <20070328183437.GB4253@mellanox.co.il> <460B70E2.4020608@gmail.com> <20070329080228.GV4253@mellanox.co.il> Message-ID: <460B7637.4090500@gmail.com> Michael S. Tsirkin wrote: > Moni, could you just put the dev field somewhere else in structure, > so that we don't have to touch backports? > I would be glad to find a way to avoid touching the backports, but the conflict is not in the position of the dev field; it is in the code inside the destructor, and I don't see a way to avoid it. From vlad at dev.mellanox.co.il Thu Mar 29 01:56:56 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 29 Mar 2007 10:56:56 +0200 Subject: [ofa-general] Re: [RFC] [PATCH v4] IB/ipoib: Add bonding support to IPoIB In-Reply-To: <460B70E2.4020608@gmail.com> References: <460A5935.7080104@gmail.com> <20070328121519.GI11695@mellanox.co.il> <460A96A5.1050307@gmail.com> <20070328183437.GB4253@mellanox.co.il> <460B70E2.4020608@gmail.com> Message-ID: <1175158616.637.0.camel@vladsk-laptop> On Thu, 2007-03-29 at 09:55 +0200, Moni Shoua wrote: > > > > Moni, pls post a link to git tree Vlad can pull. > > > > I put a copy of ofed_1_2 git under ~monis/scm/ofed_1_2 in OFA server > > Vlad, you can pull it from there. Done. -- Vladimir Sokolovsky Mellanox Technologies Ltd.
From dotanb at dev.mellanox.co.il Thu Mar 29 02:29:41 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 29 Mar 2007 11:29:41 +0200 Subject: [ofa-general] Question about registering the [vdso] memory section in user level Message-ID: <460B8705.9030904@dev.mellanox.co.il> Hi. In our regression, there is a test case in which we register the last VMA of a process that has a read permission. Only in kernel 2.6.20-rc5 i get a failure. Here is the last line that i got from executing "cat /proc/1873/maps": In kernel 2.6.16.21-0.8-smp: 2b42d103f000-2b42d1116000 r--p 00000000 08:07 647899 /usr/lib/locale/en_US.utf8/LC_COLLATE 2b42d1116000-2b42d1117000 r--p 00000000 08:07 633217 /usr/lib/locale/en_US.utf8/LC_TIME 2b42d1117000-2b42d1118000 r--p 00000000 08:07 647880 /usr/lib/locale/en_US.utf8/LC_NUMERIC 2b42d1118000-2b42d114b000 r--p 00000000 08:07 647898 /usr/lib/locale/en_US.utf8/LC_CTYPE 2b42d114b000-2b42d114c000 rw-p 2b42d114b000 00:00 0 7fffda251000-7fffda267000 rw-p 7fffda251000 00:00 0 [stack] ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso] In kernel 2.6.20-rc5: 2ba22fd99000-2ba22fd9b000 rw-p 0000a000 08:03 309500 /lib64/libnss_files-2.3.4.so 2ba22fd9b000-2ba232be6000 r--p 00000000 08:03 68735 /usr/lib/locale/locale-archive 2ba232be6000-2ba232bec000 r--s 00000000 08:03 97940 /usr/lib64/gconv/gconv-modules.cache 2ba232bec000-2ba232bee000 rw-p 2ba232bec000 00:00 0 7fff7ae2d000-7fff7ae43000 rw-p 7fff7ae2d000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vdso] It seems that in kernel 2.6.20-rc5 the last VMA which has a read permission is the [vdso] section but when i try to register it i get a failure. Is it wrong to register this section? 
thanks Dotan From vlad at lists.openfabrics.org Thu Mar 29 02:37:13 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Thu, 29 Mar 2007 02:37:13 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070329-0200 daily build status Message-ID: <20070329093713.8BF96E60815@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.16 Passed on 
ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: Build failed on x86_64 with linux-2.6.16.43-0.3-smp Log: /home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:15: error: ‘struct net_device’ has no member named ‘xmit_lock’ /home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/drivers/infiniband/core/addr.c: At top level: /home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/drivers/infiniband/core] Error 2 make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check/drivers/infiniband] Error 2 make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070329-0200_linux-2.6.16.43-0.3-smp_x86_64_check] Error 2 make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.16.43-0.3-smp' make: *** [kernel] Error 2 ---------------------------------------------------------------------------------- From mst at dev.mellanox.co.il Thu Mar 29 02:40:23 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 29 Mar 2007 11:40:23 +0200 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070327100256.GL6661@mellanox.co.il> Message-ID: <20070329093713.GA4253@mellanox.co.il> > I'll try to get you more info, but can you try to reproduce it there? I've been running this testing for several hours now. Each time, once I stop cycling the ports down and up, the performance recovers nicely. -- MST From mst at dev.mellanox.co.il Thu Mar 29 02:47:01 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 11:47:01 +0200 Subject: [ofa-general] Re: Question about registering the [vdso] memory section in user level In-Reply-To: <460B8705.9030904@dev.mellanox.co.il> References: <460B8705.9030904@dev.mellanox.co.il> Message-ID: <20070329094700.GB4253@mellanox.co.il> > Quoting Dotan Barak : > Subject: Question about registering the [vdso] memory section in user level > > Hi. > > In our regression, there is a test case in which we register the last > VMA of a process that has a read permission. > Only in kernel 2.6.20-rc5 i get a failure. 
> > Here is the last line that i got from executing "cat /proc/1873/maps": > > > In kernel 2.6.16.21-0.8-smp: > 2b42d103f000-2b42d1116000 r--p 00000000 08:07 647899 > /usr/lib/locale/en_US.utf8/LC_COLLATE > 2b42d1116000-2b42d1117000 r--p 00000000 08:07 633217 > /usr/lib/locale/en_US.utf8/LC_TIME > 2b42d1117000-2b42d1118000 r--p 00000000 08:07 647880 > /usr/lib/locale/en_US.utf8/LC_NUMERIC > 2b42d1118000-2b42d114b000 r--p 00000000 08:07 647898 > /usr/lib/locale/en_US.utf8/LC_CTYPE > 2b42d114b000-2b42d114c000 rw-p 2b42d114b000 00:00 0 > 7fffda251000-7fffda267000 rw-p 7fffda251000 00:00 0 > [stack] > ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 > [vdso] > > In kernel 2.6.20-rc5: > 2ba22fd99000-2ba22fd9b000 rw-p 0000a000 08:03 309500 > /lib64/libnss_files-2.3.4.so > 2ba22fd9b000-2ba232be6000 r--p 00000000 08:03 68735 > /usr/lib/locale/locale-archive > 2ba232be6000-2ba232bec000 r--s 00000000 08:03 97940 > /usr/lib64/gconv/gconv-modules.cache > 2ba232bec000-2ba232bee000 rw-p 2ba232bec000 00:00 0 > 7fff7ae2d000-7fff7ae43000 rw-p 7fff7ae2d000 00:00 0 > [stack] > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 > [vdso] > > It seems that in kernel 2.6.20-rc5 the last VMA which has a read > permission is the [vdso] section but when i try > to register it i get a failure. > > Is it wrong to register this section? Yes, you can't DMA to VDSO VMA I don't think. -- MST From mst at dev.mellanox.co.il Thu Mar 29 03:01:50 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 12:01:50 +0200 Subject: [ofa-general] opensm console wishlist: enable/disable ports Message-ID: <20070329100150.GC4253@mellanox.co.il> > > > There's no way to shut down an IB switch port with opensm or any OFED > > > diags? Yuck... > > > > > > Scott > > > > Maybe something can be done with the opensm console. > > A command could be added for this in the console but there is a separate > diag command which handles this. 
Taking this topic off the bugzilla thread for now. This really must be part of SM I think. I think this operation needs to perform set to port attributes, so doing this from a separate utility would only work with the most permissive policy which lets everyone get the mkey - which seems to be what OpenSM is currently using by default, but not necessarily the best thing for network security. Right? -- MST From halr at voltaire.com Thu Mar 29 04:34:54 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 06:34:54 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures In-Reply-To: References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> Message-ID: <1175168087.4379.53691.camel@hal.voltaire.com> On Thu, 2007-03-29 at 02:09, Philippe.GREGOIRE at CEA.FR wrote: > Michael > tracing route between HCA port and the subnet manager will give the > lid of the switch connected to this HCA port : > > [root at cors127 ~]# ibstat > CA 'mthca0' > CA type: MT23108 > Number of ports: 2 > Firmware version: 3.0.0 > Hardware version: a1 > Node GUID: 0x0008f10403962eb0 > System image GUID: 0x0008f10403962eb3 > Port 1: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 26 > LMC: 1 > SM lid: 14 > Capability mask: 0x00110a68 > Port GUID: 0x0008f10403962eb1 > Port 2: > State: Down > Physical state: Polling > Rate: 2 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x00110a68 > Port GUID: 0x0008f10403962eb2 > [root at cors127 ~]# ibtracert 26 14 > >From ca {0x0008f10403962eb0} portnum 1 lid 0x1a-0x1b "cors127 HCA-1" > [1] -> switch port {0x0005ad000001a775}[2] lid 0x2-0x2 "Cisco Switch > SFS7000" > [24] -> switch port {0x0005ad0000001834}[5] lid 0x10-0x10 "Topspin > Switch - U3" > [3] -> switch port {0x0005ad0000001830}[1] lid 
0xe-0xe "Topspin Switch > - U1" > To switch {0x0005ad0000001830} portnum 0 lid 0xe-0xe "Topspin Switch - > U1" > [root at cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}' > 0x2-0x2 > > HCA port lid and its subnet manager lid are available in > /sys/infiniband, so > it's better to do : > > [root at cors127 ~]# ibtracert > $(</sys/class/infiniband/mthca0/ports/1/lid) $(</sys/class/infiniband/mthca0/ports/1/sm_lid) 2>&1 | awk '(NR==2) > {sub(/-.*/, "", $7); print $7}' > 0x2 > > PS: redirection of stderr to stdout is required as ibtracert gives all > info on stderr. This was fixed recently so it depends on the version being used. -- Hal > Philippe > -------- Original message -------- > From: general-bounces at lists.openfabrics.org on behalf of Michael S. > Tsirkin > Date: Wed. 28/03/2007 22:12 > To: Hal Rosenstock > Cc: Michael S. Tsirkin; general at lists.openfabrics.org; > bugmail at lists.openfabrics.org > Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after > several hoursof failures > > > > > Not true; ibportstate can do this. > > > > > > I found that, yes. > > > However, to automate this fully I need to find the lid > > > of the switch that is connected to specific HCA ports. > > > > So do you have the GUID or LID or the HCA port(s) in question ? > > Yes, that's easy to get. > > > > I expect ibnetdiscover can do this, but was unable to grok > > > the output syntax. > > > > I'll explain once I have the answer to the above question. > > > > > Is it documented somewhere? > > > > In the man page but this may not be sufficient for your purposes. > > > > > Alternatively, can linkinfo be queried with saquery? > > > > Not currently. 
> > > > -- > MST > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > From halr at voltaire.com Thu Mar 29 04:35:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 06:35:18 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures In-Reply-To: <20070329071515.GS4253@mellanox.co.il> References: <20070328201251.GK4253@mellanox.co.il> <20070329071515.GS4253@mellanox.co.il> Message-ID: <1175168109.4379.53693.camel@hal.voltaire.com> On Thu, 2007-03-29 at 02:15, Michael S. Tsirkin wrote: > > Quoting Philippe.GREGOIRE at CEA.FR : > > Subject: RE : [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hoursof failures > > > > Michael > > tracing route between HCA port and the subnet manager will give the lid of the > > switch connected to this HCA port : > > Cool, thanks! > > > [root at cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}' > > 0x2-0x2 > > > > HCA port lid and its subnet manager lid are available in /sys/infiniband, so > > it's better to do : > > > > [root at cors127 ~]# ibtracert $(</sys/class/infiniband/mthca0/ports/1/lid) $(</sys/class/infiniband/mthca0/ports/1/sm_lid) 2>&1 | awk '(NR==2) {sub(/-.*/, "", > > $7); print $7}' > > 0x2 > > > > PS: redirection of stderr to stdout is required as ibtracert gives all info on > > stderr. > > Why is that? Hal? > I guess I'll file a bug for ibtracert on this. Already filed and resolved (bug #478). 
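For anyone following along without a fabric at hand, the awk extraction quoted above can be checked against the trace from Philippe's session. What follows is a minimal sketch, assuming ibtracert output in the format shown in that session; `first_hop_lid` and `sample_trace` are hypothetical helper names introduced here for illustration, not part of any OFED tool.

```shell
#!/bin/sh
# Hypothetical helper: read ibtracert-style output on stdin and print the
# base LID of the first hop (the switch the HCA port is cabled to).
first_hop_lid() {
    # Line 2 is the first hop; field 7 is the LID range, e.g. 0x2-0x2.
    # sub() strips everything from the dash onward, leaving the base LID.
    awk 'NR == 2 { sub(/-.*/, "", $7); print $7 }'
}

# Hypothetical helper: replay the trace quoted in the thread
# (HCA port LID 26 to SM LID 14) so the filter can be tried offline.
sample_trace() {
    cat <<'EOF'
From ca {0x0008f10403962eb0} portnum 1 lid 0x1a-0x1b "cors127 HCA-1"
[1] -> switch port {0x0005ad000001a775}[2] lid 0x2-0x2 "Cisco Switch SFS7000"
[24] -> switch port {0x0005ad0000001834}[5] lid 0x10-0x10 "Topspin Switch - U3"
[3] -> switch port {0x0005ad0000001830}[1] lid 0xe-0xe "Topspin Switch - U1"
To switch {0x0005ad0000001830} portnum 0 lid 0xe-0xe "Topspin Switch - U1"
EOF
}

sample_trace | first_hop_lid
```

On a live system the input would presumably come from ibtracert itself, for example `ibtracert 26 14 2>&1 | first_hop_lid`; the `2>&1` should only matter on versions that still print the trace to stderr.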
-- Hal From halr at voltaire.com Thu Mar 29 05:08:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 07:08:15 -0500 Subject: [ofa-general] Re: opensm console wishlist: enable/disable ports In-Reply-To: <20070329100150.GC4253@mellanox.co.il> References: <20070329100150.GC4253@mellanox.co.il> Message-ID: <1175170086.4379.55713.camel@hal.voltaire.com> On Thu, 2007-03-29 at 05:01, Michael S. Tsirkin wrote: > > > > There's no way to shut down an IB switch port with opensm or any OFED > > > > diags? Yuck... > > > > > > > > Scott > > > > > > Maybe something can be done with the opensm console. > > > > A command could be added for this in the console but there is a separate > > diag command which handles this. > > Taking this topic off the bugzilla thread for now. > > This really must be part of SM I think. > > I think this operation needs to perform set to port attributes, so > doing this from a separate utility would only work with > the most permissive policy which lets everyone get the mkey - > which seems to be what OpenSM is currently using by default, > but not necessarily the best thing for network security. > > Right? I think it depends on who needs to perform these operations. In a protected subnet, is it every user or the network administrator doing this? I can imagine a more sophisticated MKey strategy where that might not be sufficient but we are a ways from that world IMO. Also, if I recall correctly, you objected to the OpenSM console being enabled in the build by default on the basis of security concerns with remote access. Currently there are no "write" commands in the console; only "read" ones. Adding "write" commands will require this issue to be fixed first. There are ideas to fix this but it's not happening in the short term. I'm not averse to heading in this direction but there is more here than meets the "eye". -- Hal From mst at dev.mellanox.co.il Thu Mar 29 06:00:44 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 29 Mar 2007 15:00:44 +0200 Subject: [ofa-general] Re: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: References: <20070327100256.GL6661@mellanox.co.il> Message-ID: <20070329130044.GG4253@mellanox.co.il> Scott, can you please check that you first bring up port 2, then bring down port 1, and not the reverse? Otherwise you are leaving the system without any connectivity for extended periods of time and of course this affects TCP. -- MST From S.Linev at gsi.de Thu Mar 29 07:13:03 2007 From: S.Linev at gsi.de (Linev Sergei) Date: Thu, 29 Mar 2007 16:13:03 +0200 Subject: [ofa-general] compilation problem on ofed_1_2 Message-ID: <60E9D8CA1AC31048A237499BD73FF9AD01BC05@W2K3MAILSV.gsi.de> Hi, I take latest OFED 1.2 build (OFED-1.2-20070328-0625.tgz) and try to build on my node: Dual Opteron, SuSE 9.3, Kernel 2.6.19 with Real Time Preemt patch. Problem with vnic is still there: gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/.vnic_main.o.d -nostdinc -isystem /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.5/include -D__KERNEL__ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.19/include/ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -Iinclude -include include/linux/autoconf.h -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -O2 -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -fomit-frame-pointer -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3 -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(vnic_main)" -D"KBUILD_MODNAME=KBUILD_STR(ib_vnic)" -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/.tmp_vnic_main.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c: In function `vnic_allocate': /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:941: error: syntax error before '{' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:932: warning: unused variable `device' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c: At top level: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:942: warning: type defaults to `int' in declaration of `vnic_alloc_stats' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:942: warning: parameter names (without types) in function declaration /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:942: error: conflicting types for `vnic_alloc_stats' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_stats.h:367: error: previous declaration of `vnic_alloc_stats' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:942: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:943: error: syntax error before '->' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:945: warning: type defaults to `int' in declaration of `device' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:945: error: `vnic' undeclared here (not in a 
function) /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:945: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:947: error: syntax error before '->' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:947: warning: type defaults to `int' in declaration of `strcpy' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:947: warning: function declaration isn't a prototype /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:947: warning: conflicting types for built-in function `strcpy' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:947: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:949: warning: type defaults to `int' in declaration of `ether_setup' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:949: warning: parameter names (without types) in function declaration /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:949: error: conflicting types for `ether_setup' include/linux/netdevice.h:958: error: previous declaration of `ether_setup' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:949: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:951: error: syntax error before '->' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:963: error: syntax error before '&' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:963: warning: type defaults to `int' in declaration of `netpath_init' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:963: warning: function declaration isn't a prototype 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:963: error: conflicting types for `netpath_init' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_netpath.h:62: error: previous declaration of `netpath_init' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:963: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:964: error: syntax error before '&' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:964: warning: type defaults to `int' in declaration of `netpath_init' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:964: warning: function declaration isn't a prototype /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:964: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:966: error: syntax error before '->' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:968: error: syntax error before '&' token /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:968: warning: type defaults to `int' in declaration of `list_add_tail' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:968: warning: function declaration isn't a prototype /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:968: error: conflicting types for `list_add_tail' include/linux/list.h:85: error: previous declaration of `list_add_tail' /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:968: warning: data definition has no type or storage class /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:161: warning: `vnic_get_stats' defined but not used 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:175: warning: `vnic_open' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:190: warning: `vnic_stop' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:206: warning: `vnic_hard_start_xmit' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:235: warning: `vnic_tx_timeout' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:249: warning: `vnic_set_multicast_list' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:312: warning: `vnic_set_mac_address' defined but not used /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c:353: warning: `vnic_change_mtu' defined but not used make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.o] Error 1 make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/vnic] Error 2 make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2 make[1]: Leaving directory `/usr/src/packages/BUILD/kernel-2.6.19rt9smp' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.82634 (%install) Another problem with fddi.h seems to be fixed. Regards, Sergey > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Freitag, 23. März 2007 23:57 > To: Linev Sergei > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] compilation problem on ofed_1_2 > > > > Second, in file > ofa_kernel-1.2/drivers/infiniband/ulp/vnic/vnic_main.c, > > failed definition SPIN_LOCK_UNLOCKED. Seems to be, > "spinlock.h" include is > > missed in this file. 
> > this is a problem with the vnic code that needs to be cleaned up -- > SPIN_LOCK_UNLOCKED is not really supposed to be used in generic code. > Either spin_lock_init() or DEFINE_SPINLOCK() should be used instead. > > Thanks for the report. > From mplee at sandia.gov Thu Mar 29 07:31:28 2007 From: mplee at sandia.gov (Lee, Michael Paichi) Date: Thu, 29 Mar 2007 08:31:28 -0600 Subject: [ofa-general] Past conference presentation? References: <1175150657.26696.369.camel@localhost> Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F036694FE@ES22SNLNT.srn.sandia.gov> I think the PR guy, Jeffrey Scott (jeff at splitrockpr.com) may be working on this. He asked me for the location of the old conference presentations a few weeks ago so his web developer could write up a new page for them. Michael -----Original Message----- From: Matt Leininger [mailto:mlleinin at hpcn.ca.sandia.gov] Sent: Wed 3/28/2007 11:44 PM To: Fab Tillier Cc: general at lists.openfabrics.org; Lee, Michael Paichi; Johann George; Jeff Squyres (jsquyres) Subject: Re: [ofa-general] Past conference presentation? On Wed, 2007-03-28 at 16:56 -0700, Fab Tillier wrote: > There used to be a section on the OpenFabrics wesbsite where PDF files > of presentations from past conferences were posted. I can't seem to > find these anymore - can anyone point me to links, or are these gone? I found http://www.openfabrics.org/conferences/conference.htm that lists the old conferences/workshops but the links are stale. Perhaps Jeff Becker can fix this, but I don't know his email. - Matt > > > > Thanks! > > -Fab > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Matt L. Leininger, Ph.D. 
Principal Member of Technical Staff V 925-294-4842 Scalable Computing R&D F 925-294-2400 Sandia National Laboratories mlleini at sandia.gov MS 9158, PO Box 969 http://hpcn-www.ca.sandia.gov/~mlleinin Livermore, CA 94551, USA -------------- next part -------------- An HTML attachment was scrubbed... URL: From myopenib at gmail.com Thu Mar 29 07:36:51 2007 From: myopenib at gmail.com (Moni Levy) Date: Thu, 29 Mar 2007 16:36:51 +0200 Subject: [ofa-general] [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process Message-ID: <460BCF03.9020406@gmail.com> We discovered a race between calls of ib_get_cached_pkey & ib_find_cached_pkey triggered by a PKEY_CHANGE event and the update of the cache they read from. The new return code (-ESTALE) informs the callers to ib_get_cached_pkey and ib_find_cached_pkey that the ib_cache is in process of updating itself and that the call should be retried if an up to date information is needed. Signed-off-by: Moni Levy --- drivers/infiniband/core/cache.c | 11 +++++++++++ include/rdma/ib_verbs.h | 1 + 2 files changed, 12 insertions(+) Index: b/include/rdma/ib_verbs.h =================================================================== --- a/include/rdma/ib_verbs.h 2007-03-29 08:11:37.544251082 +0200 +++ b/include/rdma/ib_verbs.h 2007-03-29 08:11:58.606496898 +0200 @@ -849,6 +849,7 @@ struct ib_cache { struct ib_pkey_cache **pkey_cache; struct ib_gid_cache **gid_cache; u8 *lmc_cache; + atomic_t coherent; }; struct ib_dma_mapping_ops { Index: b/drivers/infiniband/core/cache.c =================================================================== --- a/drivers/infiniband/core/cache.c 2007-03-29 08:11:37.583244132 +0200 +++ b/drivers/infiniband/core/cache.c 2007-03-29 11:32:38.915910966 +0200 @@ -38,6 +38,7 @@ #include #include #include +#include #include @@ -141,6 +142,9 @@ int ib_get_cached_pkey(struct ib_device unsigned long flags; int ret = 0; + if (!atomic_read(&device->cache.coherent)) + return -ESTALE; + if (port_num < 
start_port(device) || port_num > end_port(device)) return -EINVAL; @@ -169,6 +173,9 @@ int ib_find_cached_pkey(struct ib_device int i; int ret = -ENOENT; + if (!atomic_read(&device->cache.coherent)) + return -ESTALE; + if (port_num < start_port(device) || port_num > end_port(device)) return -EINVAL; @@ -273,6 +280,8 @@ static void ib_cache_update(struct ib_de write_unlock_irq(&device->cache.lock); + atomic_set(&device->cache.coherent, 1); + kfree(old_pkey_cache); kfree(old_gid_cache); kfree(tprops); @@ -306,6 +315,7 @@ static void ib_cache_event(struct ib_eve event->event == IB_EVENT_CLIENT_REREGISTER) { work = kmalloc(sizeof *work, GFP_ATOMIC); if (work) { + atomic_set(&event->device->cache.coherent, 0); INIT_WORK(&work->work, ib_cache_task); work->device = event->device; work->port_num = event->element.port_num; @@ -319,6 +329,7 @@ static void ib_cache_setup_one(struct ib int p; rwlock_init(&device->cache.lock); + atomic_set(&device->cache.coherent, 0); device->cache.pkey_cache = kmalloc(sizeof *device->cache.pkey_cache * From myopenib at gmail.com Thu Mar 29 07:46:12 2007 From: myopenib at gmail.com (Moni Levy) Date: Thu, 29 Mar 2007 16:46:12 +0200 Subject: [ofa-general] [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering Message-ID: <460BD134.9010908@gmail.com> This issue was found during partitioning & SM fail over testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds concurrent with OFED restart. Please look at the "IB/cache: Add ib_cache report for cache in process" patch also. Changes from v1: * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike * fixed a bug in device extraction from the work struct * removed some warnings in case they are caused due to missing PKEY as this seems like a valid flow now. 
Changes from v2: * less/fixed debug prints - (MST remark) * flush_restart_qp stuff renamed to just restart_qp (MST remark) * the patch now depends on Roland's "IPoIB: Only handle async events for one port" Changed from v3: * We now reschedule that qp_restart_task in case the PKEY cache was not coherent. Applies over the "IB/cache: Add ib_cache report for cache in process" patch SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP. Signed-off-by: Moni Levy --- drivers/infiniband/ulp/ipoib/ipoib.h | 4 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 58 +++++++++++++++++++------ drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 ++-- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 7 ++- 5 files changed, 64 insertions(+), 21 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-29 08:12:09.129621280 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2007-03-29 11:32:58.861338222 +0200 @@ -205,6 +205,7 @@ struct ipoib_dev_priv { struct delayed_work pkey_task; struct delayed_work mcast_task; struct work_struct flush_task; + struct work_struct restart_qp_task; struct work_struct restart_task; struct delayed_work ah_reap_task; @@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc( int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_ib_dev_flush(struct work_struct *work); +void ipoib_ib_dev_restart_qp(struct work_struct *work); void ipoib_ib_dev_cleanup(struct net_device *dev); int ipoib_ib_dev_open(struct net_device *dev); int ipoib_ib_dev_up(struct 
net_device *dev); int ipoib_ib_dev_down(struct net_device *dev, int flush); -int ipoib_ib_dev_stop(struct net_device *dev); +int ipoib_ib_dev_stop(struct net_device *dev, int flush); int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port); void ipoib_dev_cleanup(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-29 08:12:09.147618072 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-03-29 11:52:58.867503247 +0200 @@ -415,21 +415,23 @@ int ipoib_ib_dev_open(struct net_device ret = ipoib_init_qp(dev); if (ret) { - ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); - return -1; + if (ret != -ENOENT && ret != -ESTALE) { + ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret); + } + return ret; } ret = ipoib_ib_post_receives(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } ret = ipoib_cm_dev_open(dev); if (ret) { ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -1; } @@ -459,7 +461,7 @@ int ipoib_ib_dev_up(struct net_device *d ipoib_pkey_dev_check_presence(dev); if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) { - ipoib_dbg(priv, "PKEY is not assigned.\n"); + ipoib_dbg(priv, "pkey is not assigned.\n"); return 0; } @@ -508,7 +510,7 @@ static int recvs_pending(struct net_devi return pending; } -int ipoib_ib_dev_stop(struct net_device *dev) +int ipoib_ib_dev_stop(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; @@ -581,7 +583,8 @@ timeout: /* Wait for all AHs to be reaped */ set_bit(IPOIB_STOP_REAPER, &priv->flags); cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); begin = jiffies; @@ -622,13 
+625,17 @@ int ipoib_ib_dev_init(struct net_device return 0; } -void ipoib_ib_dev_flush(struct work_struct *work) +static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp) { - struct ipoib_dev_priv *cpriv, *priv = - container_of(work, struct ipoib_dev_priv, flush_task); + struct ipoib_dev_priv *cpriv; struct net_device *dev = priv->dev; - if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) { + /* + * ipoib_ib_dev_stop() below may not find the PKey and leave the + * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp + * flag on is Ok. + */ + if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) { ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n"); return; } @@ -642,6 +649,15 @@ void ipoib_ib_dev_flush(struct work_stru ipoib_ib_dev_down(dev, 0); + if (restart_qp) { + ipoib_dbg(priv, "restarting the device QP\n"); + if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) + ipoib_ib_dev_stop(dev, 0); + /* The pkey cache was not coherent, so we should retry */ + if (ipoib_ib_dev_open(dev) == -ESTALE) + queue_work(ipoib_workqueue, &priv->restart_qp_task); + } + /* * The device could have been brought down between the start and when * we get here, don't bring it back up if it's not configured up @@ -655,11 +671,29 @@ void ipoib_ib_dev_flush(struct work_stru /* Flush any child interfaces too */ list_for_each_entry(cpriv, &priv->child_intfs, list) - ipoib_ib_dev_flush(&cpriv->flush_task); + __ipoib_ib_dev_flush(cpriv, restart_qp); mutex_unlock(&priv->vlan_mutex); } +void ipoib_ib_dev_flush(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, flush_task); + /* We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 0); +} + +void ipoib_ib_dev_restart_qp(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, restart_qp_task); + /*
We only restart the QP in case of pkey change event */ + ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", priv->dev->name); + __ipoib_ib_dev_flush(priv, 1); +} + void ipoib_ib_dev_cleanup(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-29 08:12:09.161615577 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-03-29 11:32:58.904330521 +0200 @@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev) return -EINVAL; if (ipoib_ib_dev_up(dev)) { - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); return -EINVAL; } @@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device flush_workqueue(ipoib_workqueue); ipoib_ib_dev_down(dev, 1); - ipoib_ib_dev_stop(dev); + ipoib_ib_dev_stop(dev, 1); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { struct ipoib_dev_priv *cpriv; @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic INIT_DELAYED_WORK(&priv->pkey_task, ipoib_pkey_poll); INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush); + INIT_WORK(&priv->restart_qp_task, ipoib_ib_dev_restart_qp); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); } Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-29 08:12:09.176612904 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-03-29 11:32:58.932325506 +0200 @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), &mcast->mcmember.mgid); if (ret < 0) { - ipoib_warn(priv, "couldn't attach QP to multicast group " - IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); + if (ret != 
-ENXIO) /* No pkey found */ + ipoib_warn(priv, "couldn't attach QP to multicast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); return ret; @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s status = ipoib_mcast_join_finish(mcast, &multicast->rec); if (status) { - if (mcast->logcount++ < 20) + if (mcast->logcount++ < 20 && status != -ENXIO) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); - } else { + } else if (status != -ENXIO) { ipoib_warn(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), Index: b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-29 08:12:09.210606845 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-03-29 11:32:58.956321208 +0200 @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) { clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ret = -ENXIO; + ipoib_dbg(priv, "pkey %X not found\n", priv->pkey); goto out; } + ipoib_dbg(priv, "pkey %X found at index %d\n", priv->pkey, pkey_index); set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* set correct QKey for QP */ @@ -260,7 +262,6 @@ void ipoib_event(struct ib_event_handler container_of(handler, struct ipoib_dev_priv, event_handler); if ((record->event == IB_EVENT_PORT_ERR || - record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || record->event == IB_EVENT_SM_CHANGE || @@ -268,5 +269,9 @@ void ipoib_event(struct ib_event_handler record->element.port_num == priv->port) { ipoib_dbg(priv, 
"Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); + } else if (record->event == IB_EVENT_PKEY_CHANGE && + record->element.port_num == priv->port) { + ipoib_dbg(priv, "pkey change event on port:%d\n", priv->port); + queue_work(ipoib_workqueue, &priv->restart_qp_task); } } From mst at dev.mellanox.co.il Thu Mar 29 07:56:40 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 16:56:40 +0200 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <460BCF03.9020406@gmail.com> References: <460BCF03.9020406@gmail.com> Message-ID: <20070329145640.GL4253@mellanox.co.il> > Quoting Moni Levy : > Subject: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process > > Content-Type: text/plain; charset=ISO-8859-1 > Content-Transfer-Encoding: 7bit > X-Spam: exempt > X-MAIL-FROM: > X-SOURCE-IP: [64.233.182.184] > Status: > X-OriginalArrivalTime: 29 Mar 2007 14:42:01.0018 (UTC) FILETIME=[6902C1A0:01C77210] > > We discovered a race between calls of ib_get_cached_pkey & ib_find_cached_pkey triggered by a PKEY_CHANGE event and the update of the cache they read from. > > The new return code (-ESTALE) informs the callers to ib_get_cached_pkey and ib_find_cached_pkey that the ib_cache is in process of updating itself and that the call should be retried if an up to date information is needed. > > Signed-off-by: Moni Levy OK, but we still need the code to make ULPs retry failed cache queries, right? 
> --- > drivers/infiniband/core/cache.c | 11 +++++++++++ > include/rdma/ib_verbs.h | 1 + > 2 files changed, 12 insertions(+) > > Index: b/include/rdma/ib_verbs.h > =================================================================== > --- a/include/rdma/ib_verbs.h 2007-03-29 08:11:37.544251082 +0200 > +++ b/include/rdma/ib_verbs.h 2007-03-29 08:11:58.606496898 +0200 > @@ -849,6 +849,7 @@ struct ib_cache { > struct ib_pkey_cache **pkey_cache; > struct ib_gid_cache **gid_cache; > u8 *lmc_cache; > + atomic_t coherent; > }; > > struct ib_dma_mapping_ops { What if the coherent flag is changed while reader is running? Might not be an issue, but still somewhat messy. We are taking the cache lock anyway - can't we make it a regular integer, and set it under write lock? The case of access to cache that is consistent is probably what we want to optimize for. > Index: b/drivers/infiniband/core/cache.c > =================================================================== > --- a/drivers/infiniband/core/cache.c 2007-03-29 08:11:37.583244132 +0200 > +++ b/drivers/infiniband/core/cache.c 2007-03-29 11:32:38.915910966 +0200 > @@ -38,6 +38,7 @@ > #include > #include > #include > +#include > > #include > > @@ -141,6 +142,9 @@ int ib_get_cached_pkey(struct ib_device > unsigned long flags; > int ret = 0; > > + if (!atomic_read(&device->cache.coherent)) > + return -ESTALE; > + Should be unlikely() I guess? 
> if (port_num < start_port(device) || port_num > end_port(device)) > return -EINVAL; > > @@ -169,6 +173,9 @@ int ib_find_cached_pkey(struct ib_device > int i; > int ret = -ENOENT; > > + if (!atomic_read(&device->cache.coherent)) > + return -ESTALE; > + > if (port_num < start_port(device) || port_num > end_port(device)) > return -EINVAL; > > @@ -273,6 +280,8 @@ static void ib_cache_update(struct ib_de > > write_unlock_irq(&device->cache.lock); > > + atomic_set(&device->cache.coherent, 1); > + > kfree(old_pkey_cache); > kfree(old_gid_cache); > kfree(tprops); So let's move this 2 lines up and it'll be under lock. > @@ -306,6 +315,7 @@ static void ib_cache_event(struct ib_eve > event->event == IB_EVENT_CLIENT_REREGISTER) { > work = kmalloc(sizeof *work, GFP_ATOMIC); > if (work) { > + atomic_set(&work->device->cache.coherent, 0); > INIT_WORK(&work->work, ib_cache_task); > work->device = event->device; > work->port_num = event->element.port_num; And here, take the write lock and clear the flag. > @@ -319,6 +329,7 @@ static void ib_cache_setup_one(struct ib > int p; > > rwlock_init(&device->cache.lock); > + atomic_set(&device->cache.coherent, 0); > > device->cache.pkey_cache = > kmalloc(sizeof *device->cache.pkey_cache * -- MST From monil at voltaire.com Thu Mar 29 08:01:55 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 29 Mar 2007 17:01:55 +0200 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <20070329145640.GL4253@mellanox.co.il> References: <460BCF03.9020406@gmail.com> <20070329145640.GL4253@mellanox.co.il> Message-ID: <6a122cc00703290801l60685143i5bcd06be46b337c1@mail.gmail.com> On 3/29/07, Michael S. 
Tsirkin wrote: > > Quoting Moni Levy : > > Subject: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process > > > > Content-Type: text/plain; charset=ISO-8859-1 > > Content-Transfer-Encoding: 7bit > > X-Spam: exempt > > X-MAIL-FROM: > > X-SOURCE-IP: [64.233.182.184] > > Status: > > X-OriginalArrivalTime: 29 Mar 2007 14:42:01.0018 (UTC) FILETIME=[6902C1A0:01C77210] > > > > We discovered a race between calls of ib_get_cached_pkey & ib_find_cached_pkey triggered by a PKEY_CHANGE event and the update of the cache they read from. > > > > The new return code (-ESTALE) informs the callers to ib_get_cached_pkey and ib_find_cached_pkey that the ib_cache is in process of updating itself and that the call should be retried if an up to date information is needed. > > > > Signed-off-by: Moni Levy > > OK, but we still need the code to make ULPs retry failed cache queries, > right? That does not seem trivial for all the ULPs. Can't we just assume that -ESTALE would fail the ULPs instead of misleading them ? That way we are not making anything behave worse. > > > --- > > drivers/infiniband/core/cache.c | 11 +++++++++++ > > include/rdma/ib_verbs.h | 1 + > > 2 files changed, 12 insertions(+) > > > > Index: b/include/rdma/ib_verbs.h > > =================================================================== > > --- a/include/rdma/ib_verbs.h 2007-03-29 08:11:37.544251082 +0200 > > +++ b/include/rdma/ib_verbs.h 2007-03-29 08:11:58.606496898 +0200 > > @@ -849,6 +849,7 @@ struct ib_cache { > > struct ib_pkey_cache **pkey_cache; > > struct ib_gid_cache **gid_cache; > > u8 *lmc_cache; > > + atomic_t coherent; > > }; > > > > struct ib_dma_mapping_ops { > > What if the coherent flag is changed while reader is running? > Might not be an issue, but still somewhat messy. > > We are taking the cache lock anyway - can't we make it a regular integer, > and set it under write lock? Ok. > The case of access to cache that is consistent is probably what we want > to optimize for. 
Right > > > Index: b/drivers/infiniband/core/cache.c > > =================================================================== > > --- a/drivers/infiniband/core/cache.c 2007-03-29 08:11:37.583244132 +0200 > > +++ b/drivers/infiniband/core/cache.c 2007-03-29 11:32:38.915910966 +0200 > > @@ -38,6 +38,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -141,6 +142,9 @@ int ib_get_cached_pkey(struct ib_device > > unsigned long flags; > > int ret = 0; > > > > + if (!atomic_read(&device->cache.coherent)) > > + return -ESTALE; > > + > > Should be unlikely() I guess? Ok > > > if (port_num < start_port(device) || port_num > end_port(device)) > > return -EINVAL; > > > > @@ -169,6 +173,9 @@ int ib_find_cached_pkey(struct ib_device > > int i; > > int ret = -ENOENT; > > > > + if (!atomic_read(&device->cache.coherent)) > > + return -ESTALE; > > + > > if (port_num < start_port(device) || port_num > end_port(device)) > > return -EINVAL; > > > > @@ -273,6 +280,8 @@ static void ib_cache_update(struct ib_de > > > > write_unlock_irq(&device->cache.lock); > > > > + atomic_set(&device->cache.coherent, 1); > > + > > kfree(old_pkey_cache); > > kfree(old_gid_cache); > > kfree(tprops); > > So let's move this 2 lines up and it'll be under lock. Ok > > > @@ -306,6 +315,7 @@ static void ib_cache_event(struct ib_eve > > event->event == IB_EVENT_CLIENT_REREGISTER) { > > work = kmalloc(sizeof *work, GFP_ATOMIC); > > if (work) { > > + atomic_set(&work->device->cache.coherent, 0); > > INIT_WORK(&work->work, ib_cache_task); > > work->device = event->device; > > work->port_num = event->element.port_num; > > And here, take the write lock and clear the flag. 
Ok > > > @@ -319,6 +329,7 @@ static void ib_cache_setup_one(struct ib > > int p; > > > > rwlock_init(&device->cache.lock); > > + atomic_set(&device->cache.coherent, 0); > > > > device->cache.pkey_cache = > > kmalloc(sizeof *device->cache.pkey_cache * > > > -- > MST >
From mst at dev.mellanox.co.il Thu Mar 29 08:07:32 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 29 Mar 2007 17:07:32 +0200 Subject: [ofa-general] Re: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <460BD134.9010908@gmail.com> References: <460BD134.9010908@gmail.com> Message-ID: <20070329150732.GM4253@mellanox.co.il> > Quoting Moni Levy : > Subject: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering > > This issue was found during partitioning & SM fail over testing. The fix was > tested over the weekend with pkey reshuffling, removal and addition every few > seconds concurrent with OFED restart. But probably not together with the patch below, right? I just came up with the proposal yesterday, it seems unlikely the patch could be tested over the weekend ... > Please look at the "IB/cache: Add > ib_cache report for cache in process" patch also. > > Changes from v1: > * added flush flag to ipoib_ib_dev_stop(), ipoib_ib_dev_down() alike > * fixed a bug in device extraction from the work struct > * removed some warnings in case they are caused due to missing PKEY as > this seems like a valid flow now. Here's an idea: Instead of adding yet another flag to ipoib_ib_dev_stop and friends, and worrying about potential races when ipoib_ib_dev_stop is run from both ipoib workqueue and another thread, how about always making them *not* flush, and using a queue + flush combination when they need to be run not in ipoib work queue? Roland, what do you think? 
> @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc > ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid), > &mcast->mcmember.mgid); > if (ret < 0) { > - ipoib_warn(priv, "couldn't attach QP to multicast group " > - IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > + if (ret != -ENXIO) /* No pkey found */ > + ipoib_warn(priv, "couldn't attach QP to multicast group " > + IPOIB_GID_FMT "\n", > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags); > return ret; I forgot why are we checking for this ENXIO error - isn't this because cache updates where out of sync with port events? So maybe we can get rid of this now? BTW, shouldn't there be some code testing return code for -ESTALE and retrying later? What am I missing? -- MST From monil at voltaire.com Thu Mar 29 08:11:23 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 29 Mar 2007 17:11:23 +0200 Subject: [ofa-general] Re: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering In-Reply-To: <20070329150732.GM4253@mellanox.co.il> References: <460BD134.9010908@gmail.com> <20070329150732.GM4253@mellanox.co.il> Message-ID: <6a122cc00703290811l15dc121amf0e37b07f7c533c4@mail.gmail.com> On 3/29/07, Michael S. Tsirkin wrote: > > Quoting Moni Levy : > > Subject: [PATCHv4] IB/ipoib: Fix ipoib handling for pkey reordering > > > > This issue was found during partitioning & SM fail over testing. The fix was > > tested over the weekend with pkey reshuffling, removal and addition every few > > seconds concurrent with OFED restart. > > But probably not together with the patch below, right? > I just came up with the proposal yesterday, it seems unlikely > the patch could be tested over the weekend ... Sure, I needed feedback on the patch ASAP and missed that. -- Moni From mst at dev.mellanox.co.il Thu Mar 29 08:43:32 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 29 Mar 2007 17:43:32 +0200 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <6a122cc00703290801l60685143i5bcd06be46b337c1@mail.gmail.com> References: <460BCF03.9020406@gmail.com> <20070329145640.GL4253@mellanox.co.il> <6a122cc00703290801l60685143i5bcd06be46b337c1@mail.gmail.com> Message-ID: <20070329154332.GN4253@mellanox.co.il> > >> The new return code (-ESTALE) informs the callers to ib_get_cached_pkey > >>and ib_find_cached_pkey that the ib_cache is in process of updating itself and > >>that the call should be retried if an up to date information is needed. > >> > >> Signed-off-by: Moni Levy > > > >OK, but we still need the code to make ULPs retry failed cache queries, > >right? > > That does not seem trivial for all the ULPs. Can't we just assume that > -ESTALE would fail the ULPs instead of misleading them ? Not if it breaks for a user for reasons outside his control. > That way we are not making anything behave worse. Aren't we out to fix some issues? Anyway, aren't you marking all cache "stale" while most pkeys might be still valid? Can't this break valid usage in e.g. SRP? 
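For illustration only, the caller-side retry on -ESTALE that this thread keeps asking about might look roughly like the userspace sketch below. It is not code from either patch: struct toy_pkey_cache, toy_find_pkey(), toy_finish_update() and toy_find_pkey_retry() are hypothetical stand-ins, the lookup is an exact match on a small fixed table, and the "wait" step simply simulates the cache update completing with a reshuffled table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a per-port pkey cache. */
struct toy_pkey_cache {
	int coherent;             /* 0 while an update is still pending */
	size_t len;               /* number of valid entries in table[] */
	unsigned short table[8];
};

/*
 * Mirrors the semantics proposed in the thread: -ESTALE while the cache is
 * being updated, -ENOENT if the pkey is absent, 0 and *index on success.
 */
static int toy_find_pkey(const struct toy_pkey_cache *c,
			 unsigned short pkey, int *index)
{
	size_t i;

	if (!c->coherent)
		return -ESTALE;

	for (i = 0; i < c->len; i++) {
		if (c->table[i] == pkey) {
			*index = (int)i;
			return 0;
		}
	}
	return -ENOENT;
}

/* Simulates the cache update finishing with a reshuffled pkey table. */
static void toy_finish_update(struct toy_pkey_cache *c)
{
	c->table[0] = 0x8001;
	c->table[1] = 0xffff;
	c->len = 2;
	c->coherent = 1;
}

/*
 * Caller-side retry: look up again after each -ESTALE, at most max_tries
 * times. wait_for_update is a placeholder for whatever rescheduling a real
 * ULP would use (the IPoIB patch requeues restart_qp_task for this).
 */
static int toy_find_pkey_retry(struct toy_pkey_cache *c, unsigned short pkey,
			       int *index, int max_tries,
			       void (*wait_for_update)(struct toy_pkey_cache *))
{
	int ret = -ESTALE;

	while (max_tries-- > 0) {
		ret = toy_find_pkey(c, pkey, index);
		if (ret != -ESTALE)
			break;
		wait_for_update(c);
	}
	return ret;
}
```

The bounded retry count is a design choice of the sketch, not something proposed in the thread; a caller that runs out of tries still sees -ESTALE and can decide whether to fail or reschedule.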
-- MST From rdreier at cisco.com Thu Mar 29 10:12:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:12:17 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <1174949633.4372.3731.camel@hal.voltaire.com> (Hal Rosenstock's message of "26 Mar 2007 17:53:54 -0500") References: <1174949633.4372.3731.camel@hal.voltaire.com> Message-ID: > + retsmi = smi_check_forward_dr_smp(&recv->mad.smp); > + if (!retsmi) > goto local; > - if (!smi_handle_dr_smp_send(&recv->mad.smp, > - port_priv->device->node_type, > - port_priv->port_num)) > - goto out; > - if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) > + > + if (retsmi == 1) { /* don't forward */ > /* > * Return 1 if the received DR SMP should be forwarded to the send queue > * Return 0 if the SMP should be completed up the stack > + * Return 2 if the SMP should be forwarded (for switches only) > */ > int smi_check_forward_dr_smp(struct ib_smp *smp) I think this has now crossed the line where these magic return values should be named enums instead. Especially the "if (!retsmi)" is very hard to follow. > +/* > + * Return the forwarding port number from initial_path for outgoing SMP and > + * from return_path for returning SMP > + */ > +static inline int smi_get_fwd_port(struct ib_smp *smp) > +{ > + return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] : > + smp->return_path[smp->hop_ptr-1]); > +} This has exactly one caller. I would just put this function in the .c file where it's called. - R. From rdreier at cisco.com Thu Mar 29 10:20:50 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:20:50 -0700 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <460BCF03.9020406@gmail.com> (Moni Levy's message of "Thu, 29 Mar 2007 16:36:51 +0200") References: <460BCF03.9020406@gmail.com> Message-ID: > + atomic_t coherent; Why atomic_t? 
There's nothing magic that protects against races if you never do anything but atomic_read and atomic_set -- you're just using an int here. Also, why do you only add the stale checking to the P_Key cache methods? Shouldn't we be consistent and tell the caller of any cache lookup whether the data is stale or not? And I don't see anything that handles the ESTALE return in any of the users of this code... - R. From rdreier at cisco.com Thu Mar 29 10:22:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:22:14 -0700 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <20070329145640.GL4253@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 29 Mar 2007 16:56:40 +0200") References: <460BCF03.9020406@gmail.com> <20070329145640.GL4253@mellanox.co.il> Message-ID: > > + if (!atomic_read(&device->cache.coherent)) > > + return -ESTALE; > > + > > Should be unlikely() I guess? I don't think we have to annotate every little thing -- this doesn't seem to be a hot enough path to me for it to be worth cluttering it up with unlikely... - R. From rdreier at cisco.com Thu Mar 29 10:24:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:24:11 -0700 Subject: [ofa-general] Re: [PATCH] [RFC] IB/cache: Add ib_cache report for cache in process In-Reply-To: <6a122cc00703290801l60685143i5bcd06be46b337c1@mail.gmail.com> (Moni Levy's message of "Thu, 29 Mar 2007 17:01:55 +0200") References: <460BCF03.9020406@gmail.com> <20070329145640.GL4253@mellanox.co.il> <6a122cc00703290801l60685143i5bcd06be46b337c1@mail.gmail.com> Message-ID: > That does not seem trivial for all the ULPs. Can't we just assume that > -ESTALE would fail the ULPs instead of misleading them ? That way we > are not making anything behave > worse. 
It is entirely possible that this change makes the P_Key lookup return -ESTALE when it would have returned perfectly correct information (eg if a P_Key is being added to the end of the table without affecting existing P_Keys). So this change as it stands introduces a window where spurious failures might occur. From mst at dev.mellanox.co.il Thu Mar 29 10:38:27 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 19:38:27 +0200 Subject: [ofa-general] Re: compilation problem on ofed_1_2 In-Reply-To: <60E9D8CA1AC31048A237499BD73FF9AD01BC05@W2K3MAILSV.gsi.de> References: <60E9D8CA1AC31048A237499BD73FF9AD01BC05@W2K3MAILSV.gsi.de> Message-ID: <20070329173800.GA5436@mellanox.co.il> > Quoting Linev Sergei : > Subject: RE: compilation problem on ofed_1_2 > > Hi, > > I take latest OFED 1.2 build (OFED-1.2-20070328-0625.tgz) and try to build on my node: > Dual Opteron, SuSE 9.3, Kernel 2.6.19 with Real Time Preemt patch. > > Problem with vnic is still there: I don't think vnic supports 2.6.19. -- MST From becker at nas.nasa.gov Thu Mar 29 10:39:41 2007 From: becker at nas.nasa.gov (Jeff Becker) Date: Thu, 29 Mar 2007 10:39:41 -0700 Subject: [ofa-general] Past conference presentation? In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F036694FE@ES22SNLNT.srn.sandia.gov> References: <1175150657.26696.369.camel@localhost> <3D84A59A1AD3584DA02AEAD240E8863F036694FE@ES22SNLNT.srn.sandia.gov> Message-ID: <795c49870703291039j517e71eeq19a065e46f9da0c0@mail.gmail.com> Hello all. My contact information is below. I look forward to working with you. In addition, I should be at the Sonoma workshop. Thanks. Jeff Becker, Ph.D. Senior Research Scientist Computer Sciences Corporation NASA Ames Research Center M/S 258-6 Moffett Field CA 94035-1000 650-604-4645 becker at nas.nasa.gov On 3/29/07, Lee, Michael Paichi wrote: > > I think the PR guy, Jeffrey Scott (jeff at splitrockpr.com) may be working > on this. 
He asked me for the location of the old conference presentations a > few weeks ago so his web developer could write up a new page for them. > > Michael > > > > > > -----Original Message----- > From: Matt Leininger [mailto:mlleinin at hpcn.ca.sandia.gov > ] > Sent: Wed 3/28/2007 11:44 PM > To: Fab Tillier > Cc: general at lists.openfabrics.org; Lee, Michael Paichi; Johann George; > Jeff Squyres (jsquyres) > Subject: Re: [ofa-general] Past conference presentation? > > On Wed, 2007-03-28 at 16:56 -0700, Fab Tillier wrote: > > There used to be a section on the OpenFabrics wesbsite where PDF files > > of presentations from past conferences were posted. I can't seem to > > find these anymore - can anyone point me to links, or are these gone? > > I found http://www.openfabrics.org/conferences/conference.htm that > lists the old conferences/workshops but the links are stale. > > Perhaps Jeff Becker can fix this, but I don't know his email. > > - Matt > > > > > > > > > Thanks! > > > > -Fab > > > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -- > Matt L. Leininger, Ph.D. Principal Member of Technical Staff > V 925-294-4842 Scalable Computing R&D > F 925-294-2400 Sandia National Laboratories > mlleini at sandia.gov MS 9158, PO Box 969 > http://hpcn-www.ca.sandia.gov/~mlleinin > Livermore, CA 94551, USA > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Thu Mar 29 11:47:01 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 13:47:01 -0500 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: References: <1174949633.4372.3731.camel@hal.voltaire.com> Message-ID: <1175194018.4379.79820.camel@hal.voltaire.com> On Thu, 2007-03-29 at 12:12, Roland Dreier wrote: > > + retsmi = smi_check_forward_dr_smp(&recv->mad.smp); > > + if (!retsmi) > > goto local; > > - if (!smi_handle_dr_smp_send(&recv->mad.smp, > > - port_priv->device->node_type, > > - port_priv->port_num)) > > - goto out; > > - if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) > > + > > + if (retsmi == 1) { /* don't forward */ > > > /* > > * Return 1 if the received DR SMP should be forwarded to the send queue > > * Return 0 if the SMP should be completed up the stack > > + * Return 2 if the SMP should be forwarded (for switches only) > > */ > > int smi_check_forward_dr_smp(struct ib_smp *smp) > > I think this has now crossed the line where these magic return values > should be named enums instead. OK; I'll make these into enums. > Especially the "if (!retsmi)" is very > hard to follow. Is it hard to follow ? > > +/* > > + * Return the forwarding port number from initial_path for outgoing SMP and > > + * from return_path for returning SMP > > + */ > > +static inline int smi_get_fwd_port(struct ib_smp *smp) > > +{ > > + return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] : > > + smp->return_path[smp->hop_ptr-1]); > > +} > > This has exactly one caller. I would just put this function in the .c > file where it's called. I'll resubmit a v2 of this patch later with this change and the enum change. -- Hal > - R. 
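As a rough sketch of the "named enums" suggestion agreed on above, assuming nothing about what the v2 patch will actually contain: the enumerator names below are hypothetical, and only the three meanings are taken from the comment quoted from the patch (0 = complete up the stack, 1 = forward to the send queue, 2 = forward, switches only).

```c
/*
 * Hypothetical names, not taken from the patch under review: one enumerator
 * per documented return value of smi_check_forward_dr_smp().
 */
enum smi_forward_action {
	SMI_LOCAL = 0,   /* SMP should be completed up the stack */
	SMI_SEND = 1,    /* SMP should be forwarded to the send queue */
	SMI_FORWARD = 2  /* SMP should be forwarded (switches only) */
};

/* Toy dispatcher showing how a caller reads once the values are named. */
static const char *smi_action_label(enum smi_forward_action action)
{
	switch (action) {
	case SMI_LOCAL:
		return "local";
	case SMI_SEND:
		return "send";
	case SMI_FORWARD:
		return "forward";
	}
	return "unknown";
}
```

With names like these, a test such as "if (!retsmi)" would become a comparison against SMI_LOCAL, which no longer reads like a success check.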
From rdreier at cisco.com Thu Mar 29 10:56:43 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:56:43 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <1175194018.4379.79820.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Mar 2007 13:47:01 -0500") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> Message-ID: > > Especially the "if (!retsmi)" is very > > hard to follow. > > Is it hard to follow ? It doesn't follow the convention of returning 0 if success, non-zero if failure -- the only way I could know that !retsmi means something other than "success" is if I go read the comment that tells me "0 if the SMP should be completed up the stack," and there's no reason why I would go search for that comment. - R. From rdreier at cisco.com Thu Mar 29 10:57:49 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 10:57:49 -0700 Subject: [ofa-general] Re: Question about registering the [vdso] memory section in user level In-Reply-To: <20070329094700.GB4253@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 29 Mar 2007 11:47:01 +0200") References: <460B8705.9030904@dev.mellanox.co.il> <20070329094700.GB4253@mellanox.co.il> Message-ID: > Yes, you can't DMA to VDSO VMA I don't think. Why not? It's just RAM... I agree it's not a sensible thing to do though. > > It seems that in kernel 2.6.20-rc5 the last VMA which has a read > > permission is the [vdso] section but when i try > > to register it i get a failure. What's the failure? - R. From rdreier at cisco.com Thu Mar 29 11:00:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:00:30 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070328184411.GC4253@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 28 Mar 2007 20:44:11 +0200") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> Message-ID: > > I'm still thinking about synchronizing with the completion EQ's irq. > > Let's discuss this? Can you formulate what's bothering you? I don't like the fact that it's very hard to even write down exactly what we're guaranteeing. So it's not clear to me that other low-level drivers satisfy the constraint (since I'm not sure what the constraint is). Also it just helps with one particular case of a class of bugs -- there are many other ways that polling a CQ could fail to synchronize with destroying a QP. I'd rather try to avoid the whole class of bugs. - R. From suri at baymicrosystems.com Thu Mar 29 11:16:14 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Thu, 29 Mar 2007 14:16:14 -0400 Subject: [ofa-general] RE: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <1175194018.4379.79820.camel@hal.voltaire.com> References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> Message-ID: <012d01c7722e$59938290$1914a8c0@surioffice> > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, March 29, 2007 2:47 PM > To: Roland Dreier > Cc: general at lists.openfabrics.org; Sean Hefty; Suresh Shelvapille > Subject: Re: [PATCH] IB/core: Enhance SMI for switch support > > On Thu, 2007-03-29 at 12:12, Roland Dreier wrote: > > > + retsmi = smi_check_forward_dr_smp(&recv->mad.smp); > > > + if (!retsmi) > > > goto local; > > > - if (!smi_handle_dr_smp_send(&recv->mad.smp, > > > - port_priv->device->node_type, > > > - port_priv->port_num)) > > > - goto out; > > > - if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) > > > + > > > + if (retsmi == 1) { /* don't forward */ > > > > > /* > > > * Return 1 if the received DR SMP should be forwarded to the send queue > > > * Return 0 if the SMP should be 
completed up the stack > > > + * Return 2 if the SMP should be forwarded (for switches only) > > > */ > > > int smi_check_forward_dr_smp(struct ib_smp *smp) > > > > I think this has now crossed the line where these magic return values > > should be named enums instead. > > OK; I'll make these into enums. > > > Especially the "if (!retsmi)" is very > > hard to follow. > > Is it hard to follow ? > > > > +/* > > > + * Return the forwarding port number from initial_path for outgoing SMP and > > > + * from return_path for returning SMP > > > + */ > > > +static inline int smi_get_fwd_port(struct ib_smp *smp) > > > +{ > > > + return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] : > > > + smp->return_path[smp->hop_ptr-1]); > > > +} > > > > This has exactly one caller. I would just put this function in the .c > > file where it's called. > > I'll resubmit a v2 of this patch later with this change and the enum > change. > the reason this was made into a function and put inside the header file was because paths weren't accessed directly within mad.c. If you are going to do what Roland is suggesting, then why have a function? Why not just stick it in-line as I had before. Thanks, Suri From rdreier at cisco.com Thu Mar 29 11:19:52 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:19:52 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <012d01c7722e$59938290$1914a8c0@surioffice> (Suresh Shelvapille's message of "Thu, 29 Mar 2007 14:16:14 -0400") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012d01c7722e$59938290$1914a8c0@surioffice> Message-ID: > the reason this was made into a function and put inside the header file was because > paths weren't accessed directly within mad.c. > > If you are going to do what Roland is suggesting, then why have a function? Why not > just stick it in-line as I had before. 
The function name makes it somewhat self-documenting. But I agree that just putting it in-line (maybe with a comment) would be fine too. On the other hand if you want to keep all the SMP-specific stuff out of mad.c that makes sense too. - R. From halr at voltaire.com Thu Mar 29 12:19:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 14:19:55 -0500 Subject: [ofa-general] RE: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <012d01c7722e$59938290$1914a8c0@surioffice> References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012d01c7722e$59938290$1914a8c0@surioffice> Message-ID: <1175195993.4379.81818.camel@hal.voltaire.com> On Thu, 2007-03-29 at 13:16, Suresh Shelvapille wrote: > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, March 29, 2007 2:47 PM > > To: Roland Dreier > > Cc: general at lists.openfabrics.org; Sean Hefty; Suresh Shelvapille > > Subject: Re: [PATCH] IB/core: Enhance SMI for switch support > > > > On Thu, 2007-03-29 at 12:12, Roland Dreier wrote: > > > > + retsmi = smi_check_forward_dr_smp(&recv->mad.smp); > > > > + if (!retsmi) > > > > goto local; > > > > - if (!smi_handle_dr_smp_send(&recv->mad.smp, > > > > - port_priv->device->node_type, > > > > - port_priv->port_num)) > > > > - goto out; > > > > - if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) > > > > + > > > > + if (retsmi == 1) { /* don't forward */ > > > > > > > /* > > > > * Return 1 if the received DR SMP should be forwarded to the send queue > > > > * Return 0 if the SMP should be completed up the stack > > > > + * Return 2 if the SMP should be forwarded (for switches only) > > > > */ > > > > int smi_check_forward_dr_smp(struct ib_smp *smp) > > > > > > I think this has now crossed the line where these magic return values > > > should be named enums instead. > > > > OK; I'll make these into enums. 
> > > > > Especially the "if (!retsmi)" is very > > > hard to follow. > > > > Is it hard to follow ? > > > > > > +/* > > > > + * Return the forwarding port number from initial_path for outgoing SMP and > > > > + * from return_path for returning SMP > > > > + */ > > > > +static inline int smi_get_fwd_port(struct ib_smp *smp) > > > > +{ > > > > + return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] : > > > > + smp->return_path[smp->hop_ptr-1]); > > > > +} > > > > > > This has exactly one caller. I would just put this function in the .c > > > file where it's called. > > > > I'll resubmit a v2 of this patch later with this change and the enum > > change. > > > > the reason this was made into a function and put inside the header file was because > paths weren't accessed directly within mad.c. > > If you are going to do what Roland is suggesting, then why have a function? Why not > just stick it in-line as I had before. Because this is a SMI function and touches the DR path. Nothing in mad.c currently directly touches the DR paths without using a SMI function. -- Hal > Thanks, > Suri > From rdreier at cisco.com Thu Mar 29 11:26:20 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:26:20 -0700 Subject: [ofa-general] Re: [PATCH] IB/core/user_mad.c: Add support for issmdisabled In-Reply-To: <1175042747.4372.104218.camel@hal.voltaire.com> (Hal Rosenstock's message of "27 Mar 2007 19:45:48 -0500") References: <1175042747.4372.104218.camel@hal.voltaire.com> Message-ID: > -static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 2); > +static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 3); I don't see any reason for this change -- in fact the "* 2" looks buggy to me. Probably a historical relic -- the only access to the dev_map bitmap that I see uses at most IB_UMAD_MAX_PORTS bits. 
> + struct ib_port_modify props = { > + .set_port_cap_mask = IB_PORT_SM_DISABLED > + }; this could be static const -- I see that this is cut-and-pasted, so maybe we should clean up the other code first. In fact... the whole ib_umad_smdis_open() and ib_umad_smdis_close() functions are nearly exactly the same as the ib_umad_sm_open() and ib_umad_sm_close() functions. I think we need to avoid the code duplication and use the same functions for both IsSM and IsSMDisabled files. - R. From suri at baymicrosystems.com Thu Mar 29 11:27:04 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Thu, 29 Mar 2007 14:27:04 -0400 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: References: <1174949633.4372.3731.camel@hal.voltaire.com><1175194018.4379.79820.camel@hal.voltaire.com> Message-ID: <012e01c7722f$dd855280$1914a8c0@surioffice> Roland: None of the functions in smi.c follow your definition. 0 is used to say discard packet and 1 for completion up the stack. So, I am not sure if reworking this one function with 3 return values buys anything. Thanks, Suri > -----Original Message----- > From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf > Of Roland Dreier > Sent: Thursday, March 29, 2007 1:57 PM > To: Hal Rosenstock > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support > > > > Especially the "if (!retsmi)" is very > > > hard to follow. > > > > Is it hard to follow ? > > It doesn't follow the convention of returning 0 if success, non-zero > if failure -- the only way I could know that !retsmi means something > other than "success" is if I go read the comment that tells me "0 if > the SMP should be completed up the stack," and there's no reason why I > would go search for that comment. > > - R. 
From rdreier at cisco.com Thu Mar 29 11:27:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:27:30 -0700 Subject: [ofa-general] RE: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <1175195993.4379.81818.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Mar 2007 14:19:55 -0500") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012d01c7722e$59938290$1914a8c0@surioffice> <1175195993.4379.81818.camel@hal.voltaire.com> Message-ID: > Because this is a SMI function and touches the DR path. Nothing in mad.c > currently directly touches the DR paths without using a SMI function. I didn't realize that guideline. I think that makes it reasonable to put it in smi.h. - R. From rdreier at cisco.com Thu Mar 29 11:30:25 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:30:25 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <012e01c7722f$dd855280$1914a8c0@surioffice> (Suresh Shelvapille's message of "Thu, 29 Mar 2007 14:27:04 -0400") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> Message-ID: > None of the functions in smi.c follow your definition. > 0 is used to say discard packet and 1 for completion up the stack. > So, I am not sure if reworking this one function with 3 return values buys > anything. Good point, I didn't look closely at smi.c. I think reworking all the smi.c return values with explicit IB_SMI_DISCARD etc return values would make the code much easier to understand. 
Probably doing that as a separate patch before adding the switch stuff would be a good idea. - R. From mst at dev.mellanox.co.il Thu Mar 29 11:35:06 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 20:35:06 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> Message-ID: <20070329183506.GB5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > > I'm still thinking about synchronizing with the completion EQ's irq. > > > > Let's discuss this? Can you formulate what's bothering you? > > I don't like the fact that it's very hard to even write down exactly > what we're guaranteeing. So it's not clear to me that other low-level > drivers satisfy the constraint (since I'm not sure what the > constraint is). Here is the rule. It should be guaranteed that no event handlers - completion events or async events - for this QP - are in progress after QP has been destroyed or moved to reset state. Does this make sense? > Also it just helps with one particular case of a class of bugs -- > there are many other ways that polling a CQ could fail to synchronize > with destroying a QP. I'd rather try to avoid the whole class of bugs. Agreed. I think the rest of ULPs are taken care of if we replace qp pointer in ib_wc with qpn + qp_context pair. Have you seen this patch? This way a ULP can stick a pointer to its own structure and in polling thread, do: foo = wc.qp_context; if (unlikely(foo->dead)) return; and to destroy the QP: foo->dead = 1; ib_destroy_qp(qp) flush(polling thread) free(foo) -- MST From rdreier at cisco.com Thu Mar 29 11:40:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:40:51 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070329183506.GB5436@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 29 Mar 2007 20:35:06 +0200") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> Message-ID: > It should be guaranteed that no event handlers - completion events or async events - > for this QP - are in progress after QP has been destroyed or moved to reset > state. > > Does this make sense? No, because completion handlers are not attached to QPs. So it is entirely possible for CQs to generate new events because of other QPs and have completion handlers running at any time. You end up trying to say something about visibility of CQEs for the QP being destroyed in completion handler context, and I think it turns into a confusing mess. > Agreed. > I think the rest of ULPs are taken care of if we replace > qp pointer in ib_wc with qpn + qp_context pair. > Have you seen this patch? > > This way a ULP can stick a pointer to its own structure > and in polling thread, do: > > foo = wc.qp_context; > if (unlikely(foo->dead)) > return; > > and to destroy the QP: > foo->dead = 1; > ib_destroy_qp(qp) > flush(polling thread) > free(foo) It makes some sense. But we hit the problem that there is no way to flush completion handlers right now. Which is why I can't reject the synchronize_irq for completion irq change -- exposing some sort of "sync completion handlers" API seems error-prone too. - R. From halr at voltaire.com Thu Mar 29 12:47:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 14:47:58 -0500 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <012e01c7722f$dd855280$1914a8c0@surioffice> References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> Message-ID: <1175197676.4379.83593.camel@hal.voltaire.com> On Thu, 2007-03-29 at 13:27, Suresh Shelvapille wrote: > Roland: > > None of the functions in smi.c follow your definition. 
> 0 is used to say discard packet and 1 for completion up the stack. I don't think that is quite right in what the meaning is but not sure it matters in terms of what is being discussed: /* * Return 1 if the received DR SMP should be forwarded to the send queue * Return 0 if the SMP should be completed up the stack */ Also, in mad.c, 0 is treated as a local SMP and the driver/hardware is given the right of first refusal. This is completion up the stack. 1 means that the SMP should be forwarded to the send queue. If some SMI updates and checks fail on this, it is then discarded before passed to the driver/hardware. > So, I am not sure if reworking this one function with 3 return values buys > anything. I'm not following what you mean. It already has 3 return values. -- Hal > Thanks, > Suri > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf > > Of Roland Dreier > > Sent: Thursday, March 29, 2007 1:57 PM > > To: Hal Rosenstock > > Cc: general at lists.openfabrics.org > > Subject: Re: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support > > > > > > Especially the "if (!retsmi)" is very > > > > hard to follow. > > > > > > Is it hard to follow ? > > > > It doesn't follow the convention of returning 0 if success, non-zero > > if failure -- the only way I could know that !retsmi means something > > other than "success" is if I go read the comment that tells me "0 if > > the SMP should be completed up the stack," and there's no reason why I > > would go search for that comment. > > > > - R. 
From halr at voltaire.com Thu Mar 29 12:50:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 14:50:58 -0500 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> Message-ID: <1175197857.4379.83765.camel@hal.voltaire.com> On Thu, 2007-03-29 at 13:30, Roland Dreier wrote: > > None of the functions in smi.c follow your definition. > > 0 is used to say discard packet and 1 for completion up the stack. > > > So, I am not sure if reworking this one function with 3 return values buys > > anything. > > Good point, I didn't look closely at smi.c. I think reworking all the > smi.c return values with explicit IB_SMI_DISCARD etc return values > would make the code much easier to understand. Probably doing that as > a separate patch before adding the switch stuff would be a good idea. Rather than IB_SMI_DISCARD, it seems to me that IB_SMI_LOCAL and IB_SMI_SEND would be more in keeping with the current comments. Is a separate patch for this along these lines really needed before the switch SMI changes ? -- Hal > - R. From mst at dev.mellanox.co.il Thu Mar 29 11:57:09 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 29 Mar 2007 20:57:09 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> Message-ID: <20070329185709.GC5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > It should be guaranteed that no event handlers - completion events or async events - > > for this QP - are in progress after QP has been destroyed or moved to reset > > state. > > > > Does this make sense? > > No, because completion handlers are not attached to QPs. So it is > entirely possible for CQs to generate new events because of other QPs > and have completion handlers running at any time. You end up trying > to say something about visibility of CQEs for the QP being destroyed > in completion handler context, and I think it turns into a confusing > mess. > > > Agreed. > > I think the rest of ULPs are taken care of if we replace > > qp pointer in ib_wc with qpn + qp_context pair. > > Have you seen this patch? > > > > This way a ULP can stick a pointer to its own structure > > and in polling thread, do: > > > > foo = wc.qp_context; > > if (unlikely(foo->dead)) > > return; > > > > and to destroy the QP: > > foo->dead = 1; > > ib_destroy_qp(qp) > > flush(polling thread) > > free(foo) > > It makes some sense. But we hit the problem that there is no way to > flush completion handlers right now. Which is why I can't reject the > synchronize_irq for completion irq change -- exposing some sort of > "sync completion handlers" API seems error-prone too. Exactly, that's why this sync belongs in QP reset. I get your point now. 
-- MST From rdreier at cisco.com Thu Mar 29 11:57:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 11:57:06 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switch support In-Reply-To: <1175197857.4379.83765.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Mar 2007 14:50:58 -0500") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> <1175197857.4379.83765.camel@hal.voltaire.com> Message-ID: > Rather than IB_SMI_DISCARD, it seems to me that IB_SMI_LOCAL and > IB_SMI_SEND would be more in keeping with the current comments. OK ... I guess the point is that exactly what those functions are returning is rather obscure at the moment. > Is a separate patch for this along these lines really needed before the > switch SMI changes ? Not necessarily, but I don't think changing a function to return 0, 1 or 2 is a good idea. So at least please fix that when adding switch SMI support. - R. From halr at voltaire.com Thu Mar 29 13:03:04 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 15:03:04 -0500 Subject: [ofa-general] Re: [PATCH] IB/core/user_mad.c: Add support for issmdisabled In-Reply-To: References: <1175042747.4372.104218.camel@hal.voltaire.com> Message-ID: <1175198581.4379.84529.camel@hal.voltaire.com> On Thu, 2007-03-29 at 13:26, Roland Dreier wrote: > > -static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 2); > > +static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 3); > > I don't see any reason for this change -- in fact the "* 2" looks > buggy to me. Probably a historical relic -- the only access to the > dev_map bitmap that I see uses at most IB_UMAD_MAX_PORTS bits. Are you sure ? Don't issm files need a duplicated set of dev_map bits ? 
> > > + struct ib_port_modify props = { > > + .set_port_cap_mask = IB_PORT_SM_DISABLED > > + }; > > this could be static const -- I see that this is cut-and-pasted, Yes, largely. > so maybe we should clean up the other code first. By clean up, (aside from static const), is there more than what you indicate below in making issm and issmdisabled share the same open/close routines ? -- Hal > In fact... the whole ib_umad_smdis_open() and ib_umad_smdis_close() > functions are nearly exactly the same as the ib_umad_sm_open() and > ib_umad_sm_close() functions. I think we need to avoid the code > duplication and use the same functions for both IsSM and IsSMDisabled > files. > - R. From suri at baymicrosystems.com Thu Mar 29 12:05:54 2007 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Thu, 29 Mar 2007 15:05:54 -0400 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switchsupport In-Reply-To: <1175197857.4379.83765.camel@hal.voltaire.com> References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> <1175197857.4379.83765.camel@hal.voltaire.com> Message-ID: <012f01c77235$4a3d1a20$1914a8c0@surioffice> Hal: You are just looking at function smi_check_forward_dr_smp(). Take a look at what smi_handle_dr_smp_send() and smi_handle_dr_smp_recv() return. In these two functions 0 = discard, 1 = process. This is what we were referring to. If we are changing the return codes to enums for the smi_check_forward_dr_smp() function, maybe the enum names can be made generic enough so that the other two functions could use the enums as well? Anyway, you guys are better judges of these issues... 
Thanks, Suri > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, March 29, 2007 3:51 PM > To: Roland Dreier > Cc: Suresh Shelvapille; general at lists.openfabrics.org > Subject: Re: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switchsupport > > On Thu, 2007-03-29 at 13:30, Roland Dreier wrote: > > > None of the functions in smi.c follow your definition. > > > 0 is used to say discard packet and 1 for completion up the stack. > > > > > So, I am not sure if reworking this one function with 3 return values buys > > > anything. > > > > Good point, I didn't look closely at smi.c. I think reworking all the > > smi.c return values with explicit IB_SMI_DISCARD etc return values > > would make the code much easier to understand. Probably doing that as > > a separate patch before adding the switch stuff would be a good idea. > > Rather than IB_SMI_DISCARD, it seems to me that IB_SMI_LOCAL and > IB_SMI_SEND would be more in keeping with the current comments. > > Is a separate patch for this along these lines really needed before the > switch SMI changes ? > > -- Hal > > > - R. From mst at dev.mellanox.co.il Thu Mar 29 12:06:44 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Mar 2007 21:06:44 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> Message-ID: <20070329190643.GD5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > It should be guaranteed that no event handlers - completion events or async events - > > for this QP - are in progress after QP has been destroyed or moved to reset > > state. > > > > Does this make sense? > > No, because completion handlers are not attached to QPs. 
So it is > entirely possible for CQs to generate new events because of other QPs > and have completion handlers running at any time. You end up trying > to say something about visibility of CQEs for the QP being destroyed > in completion handler context, and I think it turns into a confusing > mess. We have a rule that we must clean CQEs after QP is moved to RESET, do we not? How is it formulated? -- MST From rdreier at cisco.com Thu Mar 29 12:10:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 12:10:57 -0700 Subject: [ofa-general] Re: [PATCH] IB/core/user_mad.c: Add support for issmdisabled In-Reply-To: <1175198581.4379.84529.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Mar 2007 15:03:04 -0500") References: <1175042747.4372.104218.camel@hal.voltaire.com> <1175198581.4379.84529.camel@hal.voltaire.com> Message-ID: > > I don't see any reason for this change -- in fact the "* 2" looks > > buggy to me. Probably a historical relic -- the only access to the > > dev_map bitmap that I see uses at most IB_UMAD_MAX_PORTS bits. > > Are you sure ? Don't issm files need a duplicated set of dev_map bits ? Well, read the code. The only use of dev_map that I can find is port->dev_num = find_first_zero_bit(dev_map, IB_UMAD_MAX_PORTS); which doesn't look past bit # IB_UMAD_MAX_PORTS. Am I missing something? > By clean up, (aside from static const), is there more than what you > indicate below in making issm and issmdisabled share the same open/close > routines ? No, I just meant convert that to static const. And unify the open/close routines. Although maybe the cleanest way to unify the code is to leave the properties on the stack and set it up before the call to ib_modify_port() depending on which file is being accessed. - R. 
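The suggestion above, keeping the port properties on the stack and filling them in according to which file is being accessed, could be sketched as follows. This is a standalone illustration and not the user_mad.c code: struct port_modify is a cut-down stand-in for struct ib_port_modify, the PORT_CAP_* values are illustrative, the enum and helper names are hypothetical, and it assumes the close path clears the same capability bit that the open path set.

```c
#include <assert.h>

/* Stand-ins for the kernel capability bits; values are illustrative. */
#define PORT_CAP_SM		(1u << 1)
#define PORT_CAP_SM_DISABLED	(1u << 10)

/* Cut-down stand-in for struct ib_port_modify. */
struct port_modify {
	unsigned int set_port_cap_mask;
	unsigned int clr_port_cap_mask;
};

/* Which special file is being opened or closed (hypothetical enum). */
enum umad_sm_file {
	UMAD_FILE_ISSM,
	UMAD_FILE_ISSMDISABLED
};

/* The only per-file difference: which capability bit is touched. */
static unsigned int cap_bit_for(enum umad_sm_file which)
{
	assert(which == UMAD_FILE_ISSM || which == UMAD_FILE_ISSMDISABLED);
	return which == UMAD_FILE_ISSM ? PORT_CAP_SM : PORT_CAP_SM_DISABLED;
}

/* Shared open path: build the properties on the stack, set one bit. */
static struct port_modify sm_open_props(enum umad_sm_file which)
{
	struct port_modify props = { 0, 0 };

	props.set_port_cap_mask = cap_bit_for(which);
	return props;
}

/* Shared close path (assumed): clear the bit that open set. */
static struct port_modify sm_close_props(enum umad_sm_file which)
{
	struct port_modify props = { 0, 0 };

	props.clr_port_cap_mask = cap_bit_for(which);
	return props;
}
```

In this shape the two open routines and the two close routines differ only in the argument they pass, so a single pair of functions would serve both files. The trade-off is that the properties can no longer be static const, since they are computed per call.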
From halr at voltaire.com Thu Mar 29 13:12:14 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 15:12:14 -0500 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switchsupport In-Reply-To: <012f01c77235$4a3d1a20$1914a8c0@surioffice> References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> <1175197857.4379.83765.camel@hal.voltaire.com> <012f01c77235$4a3d1a20$1914a8c0@surioffice> Message-ID: <1175199133.4379.85124.camel@hal.voltaire.com> Suri, On Thu, 2007-03-29 at 14:05, Suresh Shelvapille wrote: > Hal: > > You are just looking at function smi_check_forward_dr_smp(). > > Take a look at what smi_handle_dr_smp_send() and smi_handle_dr_smp_recv() return. > In these two functions 0= discard, 1=process. This is what we were referring to. I see what you are referring to now. That's true for the other routines but unfortunately not this one. > If we are fixing the return codes to enums for smi_check_forward_dr_smp() function, > may be enum names can be made generic enough so that the other two functions could use the > enums as well? Not sure what the one set of names would be: discard != local and process != send Two sets of names (enums) could do it though. If this is what is to be done then it should be 2 patches with the first preserving the current CA/router only support with the enums and the second adding in switch SMI. -- Hal > Anyway, you guys are better judges of these issue... > Thanks, > Suri > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, March 29, 2007 3:51 PM > > To: Roland Dreier > > Cc: Suresh Shelvapille; general at lists.openfabrics.org > > Subject: Re: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switchsupport > > > > On Thu, 2007-03-29 at 13:30, Roland Dreier wrote: > > > > None of the functions in smi.c follow your definition. 
> > > > 0 is used to say discard packet and 1 for completion up the stack. > > > > > > > So, I am not sure if reworking this one function with 3 return values buys > > > > anything. > > > > > > Good point, I didn't look closely at smi.c. I think reworking all the > > > smi.c return values with explicit IB_SMI_DISCARD etc return values > > > would make the code much easier to understand. Probably doing that as > > > a separate patch before adding the switch stuff would be a good idea. > > > > Rather than IB_SMI_DISCARD, it seems to me that IB_SMI_LOCAL and > > IB_SMI_SEND would be more in keeping with the current comments. > > > > Is a separate patch for this along these lines really needed before the > > switch SMI changes ? > > > > -- Hal > > > > > - R. > From rdreier at cisco.com Thu Mar 29 12:20:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 12:20:31 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070329190643.GD5436@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 29 Mar 2007 21:06:44 +0200") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> Message-ID: > We have a rule that we must clean CQEs after QP is moved to RESET, > do we not? How is it formulated? Thanks for bringing that up... I consider the CQ cleaning to be an internal mthca implementation detail, to avoid having to handle CQEs with a stale QPN after we free a QP. The IB spec basically says that the status of completions for a QP is undefined: "The CI does not guarantee that CQEs generated for a QP prior to its destruction can be retrieved from the CQ after that QP has been destroyed." So now that you bring it up, I think it would be OK for a low-level driver to report completions for a QP even after the QP is destroyed. And yes that does make putting the QP pointer in the ib_wc structure a mistake... - R. 
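If completions may still be reported after a QP is destroyed, the qp_context check proposed earlier in this thread becomes the ULP's main protection. The following single-threaded C sketch shows only that check. The types are mock stand-ins (struct mock_wc is not the real ib_wc layout), all names are hypothetical, and the destroy, flush and free steps from the proposal are deliberately not modelled, since the thread notes there is currently no way to flush completion handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Mock work completion carrying a QP number and a context pointer. */
struct mock_wc {
	unsigned int qp_num;
	void *qp_context;
};

/* Hypothetical per-QP structure owned by the ULP. */
struct ulp_conn {
	int dead;		/* set before the QP is destroyed */
	unsigned int handled;	/* completions processed so far */
};

/* Returns 1 if the completion was processed, 0 if it was ignored. */
static int ulp_handle_wc(const struct mock_wc *wc)
{
	struct ulp_conn *conn = wc->qp_context;

	assert(conn != NULL);
	if (conn->dead)
		return 0;	/* stale completion for a QP being torn down */

	conn->handled++;
	return 1;
}

/* First teardown step only; destroy, flush and free are not modelled. */
static void ulp_mark_dead(struct ulp_conn *conn)
{
	conn->dead = 1;
}
```

Real code would also need the ordering guarantees discussed in the thread: the context structure must not be freed until no polling thread can still dereference it, which is exactly the flush step this sketch leaves out.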
From swise at opengridcomputing.com Thu Mar 29 12:21:48 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 29 Mar 2007 14:21:48 -0500 Subject: [ofa-general] [PATCH 2.6.21] iw_cxgb3: Fix TERM codes. Message-ID: <1175196108.23273.35.camel@stevo-desktop> Fix TERM codes. Fix TERMINATE layer, type, and ecode values based on conformance testing. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 69 ++++++++++++++++++--------------- 1 files changed, 38 insertions(+), 31 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 0a472c9..714dddb 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -471,43 +471,62 @@ int iwch_bind_mw(struct ib_qp *qp, return err; } -static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged) +static inline void build_term_codes(struct respQ_msg_t *rsp_msg, + u8 *layer_type, u8 *ecode) { - switch (t3err) { + int status = TPT_ERR_INTERNAL_ERR; + int tagged = 0; + int opcode = -1; + int rqtype = 0; + int send_inv = 0; + + if (rsp_msg) { + status = CQE_STATUS(rsp_msg->cqe); + opcode = CQE_OPCODE(rsp_msg->cqe); + rqtype = RQ_TYPE(rsp_msg->cqe); + send_inv = (opcode == T3_SEND_WITH_INV) || + (opcode == T3_SEND_WITH_SE_INV); + tagged = (opcode == T3_RDMA_WRITE) || + (rqtype && (opcode == T3_READ_RESP)); + } + + switch (status) { case TPT_ERR_STAG: - if (tagged == 1) { - *layer_type = LAYER_DDP|DDP_TAGGED_ERR; - *ecode = DDPT_INV_STAG; - } else if (tagged == 2) { + if (send_inv) { + *layer_type = LAYER_RDMAP|RDMAP_REMOTE_OP; + *ecode = RDMAP_CANT_INV_STAG; + } else { *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; *ecode = RDMAP_INV_STAG; } break; case TPT_ERR_PDID: + *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; + if ((opcode == T3_SEND_WITH_INV) || + (opcode == T3_SEND_WITH_SE_INV)) + *ecode = RDMAP_CANT_INV_STAG; + else + *ecode = RDMAP_STAG_NOT_ASSOC; + break; case TPT_ERR_QPID: + *layer_type = 
LAYER_RDMAP|RDMAP_REMOTE_PROT; + *ecode = RDMAP_STAG_NOT_ASSOC; + break; case TPT_ERR_ACCESS: - if (tagged == 1) { - *layer_type = LAYER_DDP|DDP_TAGGED_ERR; - *ecode = DDPT_STAG_NOT_ASSOC; - } else if (tagged == 2) { - *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; - *ecode = RDMAP_STAG_NOT_ASSOC; - } + *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; + *ecode = RDMAP_ACC_VIOL; break; case TPT_ERR_WRAP: *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; *ecode = RDMAP_TO_WRAP; break; case TPT_ERR_BOUND: - if (tagged == 1) { + if (tagged) { *layer_type = LAYER_DDP|DDP_TAGGED_ERR; *ecode = DDPT_BASE_BOUNDS; - } else if (tagged == 2) { + } else { *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT; *ecode = RDMAP_BASE_BOUNDS; - } else { - *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR; - *ecode = DDPU_MSG_TOOBIG; } break; case TPT_ERR_INVALIDATE_SHARED_MR: @@ -591,8 +610,6 @@ int iwch_post_terminate(struct iwch_qp * { union t3_wr *wqe; struct terminate_message *term; - int status; - int tagged = 0; struct sk_buff *skb; PDBG("%s %d\n", __FUNCTION__, __LINE__); @@ -610,17 +627,7 @@ int iwch_post_terminate(struct iwch_qp * /* immediate data starts here. 
*/ term = (struct terminate_message *)wqe->send.sgl; - if (rsp_msg) { - status = CQE_STATUS(rsp_msg->cqe); - if (CQE_OPCODE(rsp_msg->cqe) == T3_RDMA_WRITE) - tagged = 1; - if ((CQE_OPCODE(rsp_msg->cqe) == T3_READ_REQ) || - (CQE_OPCODE(rsp_msg->cqe) == T3_READ_RESP)) - tagged = 2; - } else { - status = TPT_ERR_INTERNAL_ERR; - } - build_term_codes(status, &term->layer_etype, &term->ecode, tagged); + build_term_codes(rsp_msg, &term->layer_etype, &term->ecode); build_fw_riwrh((void *)wqe, T3_WR_SEND, T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1, qhp->ep->hwtid, 5); From rdreier at cisco.com Thu Mar 29 12:22:15 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 12:22:15 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: Enhance SMI for switchsupport In-Reply-To: <1175199133.4379.85124.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Mar 2007 15:12:14 -0500") References: <1174949633.4372.3731.camel@hal.voltaire.com> <1175194018.4379.79820.camel@hal.voltaire.com> <012e01c7722f$dd855280$1914a8c0@surioffice> <1175197857.4379.83765.camel@hal.voltaire.com> <012f01c77235$4a3d1a20$1914a8c0@surioffice> <1175199133.4379.85124.camel@hal.voltaire.com> Message-ID: > I see what you are referring to now. That's true for the other routines > but unfortunately not this one. OK, that makes the current status even more confusing. > Not sure what the one set of names would be: > discard != local and process != send > > Two sets of names (enums) could do it though. Yes, if the two return values have distinct semantics then they should be using separate enums to indicate that. > If this is what is to be done then it should be 2 patches with the first > preserving the current CA/router only support with the enums and the > second adding in switch SMI. Please, let's do this now, since we're in the area. If we don't clean up the code now it will slip down the priority list again and probably never get done. - R. 
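For illustration, the two-enum split discussed above might look like the following C sketch. It is not taken from any posted patch. IB_SMI_DISCARD, IB_SMI_LOCAL and IB_SMI_SEND are the names floated in this thread; the enum type names, IB_SMI_HANDLE, IB_SMI_FORWARD and the toy_check_hops() helper are assumptions invented for this example.

```c
#include <assert.h>

/* Hypothetical: result of the send/recv handlers (today: 0 = discard, 1 = process). */
enum smi_action {
	IB_SMI_DISCARD,
	IB_SMI_HANDLE	/* assumed name for "process" */
};

/* Hypothetical: result of the forwarding check, which has different semantics. */
enum smi_forward_action {
	IB_SMI_LOCAL,	/* handle the SMP locally */
	IB_SMI_SEND,	/* send the SMP on */
	IB_SMI_FORWARD	/* assumed third value for the switch case */
};

/* Keep the numeric values the existing 0/1 callers rely on. */
static_assert(IB_SMI_DISCARD == 0 && IB_SMI_HANDLE == 1,
	      "enum values must match the old 0 = discard, 1 = process convention");

/* Toy stand-in for a handler: discard when the hop count exceeds a limit. */
static enum smi_action toy_check_hops(unsigned int hop_cnt, unsigned int max_hops)
{
	return hop_cnt > max_hops ? IB_SMI_DISCARD : IB_SMI_HANDLE;
}
```

Because the two results are distinct types, a caller that compares the forwarding result against IB_SMI_DISCARD stands out in review, which is the confusion the separate enums are meant to remove. Listing IB_SMI_DISCARD first keeps its value at 0, so a first patch could switch callers to the names without changing behavior.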
From halr at voltaire.com Thu Mar 29 14:17:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 16:17:21 -0500 Subject: [ofa-general] [PATCH][MINOR] OpenSM: Handle conf file open failures better Message-ID: <1175203006.4379.89200.camel@hal.voltaire.com> OpenSM: Handle conf file open failures better Signed-off-by: Hal Rosenstock diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index ade73ac..fc52b5e 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -1088,7 +1088,7 @@ osm_subn_set_default_opt( * * SYNOPSIS */ -void +ib_api_status_t osm_subn_parse_conf_file( IN osm_subn_opt_t* const p_opt ); /* @@ -1098,7 +1098,7 @@ osm_subn_parse_conf_file( * [in] Pointer to the subnet options structure. * * RETURN VALUES -* None +* IB_SUCCESS, IB_ERROR * * NOTES * Assumes the conf file is part of the cache dir which defaults to @@ -1118,7 +1118,7 @@ osm_subn_parse_conf_file( * * SYNOPSIS */ -void +ib_api_status_t osm_subn_rescan_conf_file( IN osm_subn_opt_t* const p_opts ); /* @@ -1128,7 +1128,7 @@ osm_subn_rescan_conf_file( * [in] Pointer to the subnet options structure. * * RETURN VALUES -* None +* IB_SUCCESS, IB_ERROR * * NOTES * This uses the same file as osm_subn_parse_conf_file() @@ -1144,7 +1144,7 @@ osm_subn_rescan_conf_file( * * SYNOPSIS */ -void +ib_api_status_t osm_subn_write_conf_file( IN osm_subn_opt_t* const p_opt ); /* @@ -1154,7 +1154,7 @@ osm_subn_write_conf_file( * [in] Pointer to the subnet options structure. 
* * RETURN VALUES -* None +* IB_SUCCESS, IB_ERROR * * NOTES * Assumes the conf file is part of the cache dir which defaults to diff --git a/osm/opensm/main.c b/osm/opensm/main.c index a3f892b..5fb58eb 100644 --- a/osm/opensm/main.c +++ b/osm/opensm/main.c @@ -651,7 +651,8 @@ main( printf("%s\n", OSM_VERSION); osm_subn_set_default_opt(&opt); - osm_subn_parse_conf_file(&opt); + if (osm_subn_parse_conf_file(&opt) != IB_SUCCESS) + printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); do @@ -969,7 +970,12 @@ main( } if ( cache_options == TRUE ) - osm_subn_write_conf_file( &opt ); + { + if (osm_subn_write_conf_file( &opt ) != IB_SUCCESS) + { + printf( "\nosm_subn_write_conf_file failed!\n" ); + } + } status = osm_opensm_bind( &osm, opt.guid ); if( status != IB_SUCCESS ) diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 8061231..196026e 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -1931,8 +1931,13 @@ osm_state_mgr_process( p_mgr->p_subn->subnet_initialization_error = FALSE; /* rescan configuration updates */ - osm_subn_rescan_conf_file(&p_mgr->p_subn->opt); - + status = osm_subn_rescan_conf_file(&p_mgr->p_subn->opt); + if( status != IB_SUCCESS ) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_state_mgr_process: ERR 331A: " + "osm_subn_rescan_conf_file failed\n" ); + } status = __osm_state_mgr_sweep_hop_0( p_mgr ); if( status == IB_SUCCESS ) { diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index 46315a5..746fbd1 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -732,7 +732,7 @@ subn_dump_qos_options( /********************************************************************** **********************************************************************/ -void +ib_api_status_t osm_subn_rescan_conf_file( IN osm_subn_opt_t* const p_opts ) { @@ -751,7 +751,7 @@ osm_subn_rescan_conf_file( opts_file = fopen(file_name, "r"); if (!opts_file) - return; + return 
IB_ERROR; while (fgets(line, 1023, opts_file) != NULL) { @@ -779,6 +779,8 @@ osm_subn_rescan_conf_file( } } fclose(opts_file); + + return IB_SUCCESS; } /********************************************************************** @@ -825,7 +827,7 @@ osm_subn_verify_conf_file( /********************************************************************** **********************************************************************/ -void +ib_api_status_t osm_subn_parse_conf_file( IN osm_subn_opt_t* const p_opts ) { @@ -844,7 +846,7 @@ osm_subn_parse_conf_file( opts_file = fopen(file_name, "r"); if (!opts_file) - return; + return IB_ERROR; while (fgets(line, 1023, opts_file) != NULL) { @@ -1090,11 +1092,13 @@ osm_subn_parse_conf_file( fclose(opts_file); osm_subn_verify_conf_file(p_opts); + + return IB_SUCCESS; } /********************************************************************** **********************************************************************/ -void +ib_api_status_t osm_subn_write_conf_file( IN osm_subn_opt_t* const p_opts ) { @@ -1111,7 +1115,7 @@ osm_subn_write_conf_file( opts_file = fopen(file_name, "w"); if (!opts_file) - return; + return IB_ERROR; fprintf( opts_file, @@ -1379,4 +1383,6 @@ osm_subn_write_conf_file( /* optional string attributes ... */ fclose(opts_file); + + return IB_SUCCESS; } From rowland at cse.ohio-state.edu Thu Mar 29 15:02:49 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Thu, 29 Mar 2007 18:02:49 -0400 Subject: [ofa-general] MVAPICH2 SRPM Update Message-ID: <460C3789.8090306@cse.ohio-state.edu> I updated the MVAPICH2 0.9.8 SRPM for the RC1 release. The current version is: mvapich2-0.9.8-9.src.rpm As discussed in the 3/26 conference call, I opened and closed a bug with details concerning this update since it was not related to any previously reported OFA bugs. Right now the MVAPICH code is in the release state. I've sent details to Pasha as he maintains the MVAPICH SRPM. 
I have not entered a bug entry for the previous update of the MVAPICH SRPM however. The current MVAPICH SRPM probably needs updated too in order to catch the last small change since its previous update. If that is necessary, perhaps Pasha can enter a historical bug entry for the last update (or just the current one). I am not clear on if a bug entry was necessary for the previous MVAPICH update. I can make an entry and close it, however since a new update is probably required anyway, perhaps it can be done at that time by Pasha. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From mst at dev.mellanox.co.il Thu Mar 29 15:39:21 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Mar 2007 00:39:21 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> Message-ID: <20070329223920.GE5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > We have a rule that we must clean CQEs after QP is moved to RESET, > > do we not? How is it formulated? > > Thanks for bringing that up... > > I consider the CQ cleaning to be an internal mthca implementation > detail, to avoid having to handle CQEs with a stale QPN after we free > a QP. The IB spec basically says that the status of completions for a > QP is undefined: > > "The CI does not guarantee that CQEs generated for a QP prior to > its destruction can be retrieved from the CQ after that QP has > been destroyed." > > So now that you bring it up, I think it would be OK for a low-level > driver to report completions for a QP even after the QP is destroyed. > And yes that does make putting the QP pointer in the ib_wc structure a > mistake... Roland, I think you misunderstand this language. 
It simply means that CQEs can disappear if you reset the QP. If stale CQEs could come out, you would never be able to destroy a QP without destroying the CQ. -- MST From rdreier at cisco.com Thu Mar 29 15:54:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 15:54:31 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070329223920.GE5436@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 30 Mar 2007 00:39:21 +0200") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> Message-ID: > > "The CI does not guarantee that CQEs generated for a QP prior to > > its destruction can be retrieved from the CQ after that QP has > > been destroyed." > Roland, I think you misunderstand this language. > It simply means that CQEs can disappear if you reset the QP. Yes, _can_ disappear, not _must_ disappear. Which means that CQEs are allowed to be returned after the QP that generated them is gone. In fact the spec's language suggests that the normal situation is for CQEs to stick around after a QP is destroyed, but that the strange behavior (which mthca implements) of CQEs disappearing is also permitted. And in fact if I recall correctly, the Solaris stack goes through some crazy inefficiency exactly so that it can return CQEs for a QP after that QP is destroyed. > If stale CQEs could come out, you would never be able to > destroy a QP without destroying the CQ. Why? - R. 
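If a driver is allowed to return CQEs for a QP that is already gone, the consumer has to cope with them. The following C sketch shows one possible shape of that, under loud assumptions: struct toy_wc, struct toy_consumer, TOY_MAX_QPS and toy_handle_wc() are hypothetical names, not part of the verbs API, and the consumer is assumed to keep its own table of the QP numbers it still owns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_QPS 16	/* hypothetical table size */

struct toy_wc {
	uint32_t qpn;	/* QP number reported in the completion */
	uint64_t wr_id;	/* work request ID chosen by the consumer */
};

struct toy_consumer {
	bool live[TOY_MAX_QPS];	/* live[qpn] is true while the QP exists */
	unsigned int handled;	/* completions processed normally */
	unsigned int dropped;	/* completions ignored as stale */
};

/* Returns true if the completion was processed, false if it was stale. */
static bool toy_handle_wc(struct toy_consumer *c, const struct toy_wc *wc)
{
	assert(c != NULL && wc != NULL);

	if (wc->qpn >= TOY_MAX_QPS || !c->live[wc->qpn]) {
		/* The QP is gone: do not touch any per-QP state. */
		++c->dropped;
		return false;
	}
	++c->handled;
	return true;
}
```

This only works while a QP number is not handed out again with old CQEs still in the CQ; once the number is reused, a stale completion is indistinguishable from a new one by QPN alone.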
From rdreier at cisco.com Thu Mar 29 15:58:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 15:58:24 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Roland Dreier's message of "Thu, 29 Mar 2007 15:54:31 -0700") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> Message-ID: > Yes, _can_ disappear, not _must_ disappear. Which means that CQEs are > allowed to be returned after the QP that generated them is gone. In > fact the spec's language suggests that the normal situation is for > CQEs to stick around after a QP is destroyed, but that the strange > behavior (which mthca implements) of CQEs disappearing is also permitted. And indeed, look at the ipath and ehca destroy QP implementations. As far as I can tell, neither one cleans out stale CQEs. - R. From mst at dev.mellanox.co.il Thu Mar 29 16:03:59 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Mar 2007 01:03:59 +0200 Subject: [ofa-general] CQEs for QP being destroyed/reset (was Re: [GIT PULL] please pull infiniband.git) In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> Message-ID: <20070329230359.GF5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > We have a rule that we must clean CQEs after QP is moved to RESET, > > do we not? How is it formulated? > > Thanks for bringing that up... > > I consider the CQ cleaning to be an internal mthca implementation > detail, to avoid having to handle CQEs with a stale QPN after we free > a QP. 
The IB spec basically says that the status of completions for a > QP is undefined: > > "The CI does not guarantee that CQEs generated for a QP prior to > its destruction can be retrieved from the CQ after that QP has > been destroyed." This as I mentioned earlier simply means some CQEs might disappear when you destroy the QP. And by the way, cleanup_cq in mthca does exactly that - removes CQEs that were generated prior to QP destruction. > So now that you bring it up, I think it would be OK for a low-level > driver to report completions for a QP even after the QP is destroyed. Thanks for bringing this up, and reading this part of the spec I found some interesting stuff. Actually, the IB spec clarifies why there might not be any issues and why the API we have might actually be a fine one: 10.2.4.4 It is good programming practice to modify the QP into the Error state and retrieve the relevant CQEs prior to destroying the QP. Destroying a QP does not guarantee that CQEs of that QP are deallocated from the CQ upon destruction. Even if the CQEs are already on the CQ, it might not be possible to retrieve them. It is good programming practice not to make any assumption on the number of CQEs in the CQ when destroying a QP. In order to avoid CQ overflow, it is recommended that all CQEs of the destroyed QP are retrieved from the CQ associated with it before resizing the CQ, attaching a new QP to the CQ or reopening the QP, if the CQ capacity is limited. So it seems the ticket is that ULP must not destroy QP without moving it to error and draining it first. > And yes that does make putting the QP pointer in the ib_wc structure a > mistake... On the contrary, since ULPs must wait till all WRs complete, the API we have is a fine one. But, IPoIB CM must be fixed to wait for all WRs to complete before destroying the QP. This is all dandy for send WRs. 
But I really have no idea how to interpret this text in the spec, and what does "the relevant CQEs" mean in the context of a receive queue connected to a SRQ. Can someone comment on this last point? -- MST From mst at dev.mellanox.co.il Thu Mar 29 16:08:55 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Mar 2007 01:08:55 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> Message-ID: <20070329230855.GG5436@mellanox.co.il> > > If stale CQEs could come out, you would never be able to > > destroy a QP without destroying the CQ. > > Why? Assume you stick a pointer in WR_ID. When is it safe to free the object? But if you look at the language in 10.2.4.4, you actually are not *supposed* to destroy a QP that has outstanding WRs. So for send side, it now seems the bug is in IPoIB - it should be fixed to drain the send queue rather than trying to destroy QP directly. However, it seems IB spec has a hole - the procedure outlined there can not work for SRQ. -- MST From rdreier at cisco.com Thu Mar 29 16:14:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 16:14:14 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070329230855.GG5436@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 30 Mar 2007 01:08:55 +0200") References: <20070323092234.GG17532@mellanox.co.il> <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> <20070329230855.GG5436@mellanox.co.il> Message-ID: > Assume you stick a pointer in WR_ID. When is it safe to free > the object? I thought it was clear -- when the work request completes. But I see that you saw that now... 
> However, it seems IB spec has a hole - the procedure outlined there > can not work for SRQ. I think the (ugly) solution that the IB spec authors had in mind is to transition the QP to the error state and wait for the "last WQE reached" affiliated event on that QP. - R. From halr at voltaire.com Thu Mar 29 17:13:56 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 19:13:56 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070329060858.GP4253@mellanox.co.il> References: <20070327100256.GL6661@mellanox.co.il> <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> <1175118260.4379.11551.camel@hal.voltaire.com> <20070329060858.GP4253@mellanox.co.il> Message-ID: <1175213622.4379.100253.camel@hal.voltaire.com> On Thu, 2007-03-29 at 01:08, Michael S. Tsirkin wrote: > > > > > I expect ibnetdiscover can do this, but was unable to grok > > > > > the output syntax. > > > > > > > > I'll explain once I have the answer to the above question. > > > > Search for "H-", where GUID in hex is the node GUID, in the > > output of ibnetdiscover. [1] to the right of it indicates it is port 1. > > > > So for example, > > Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0 > > [22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 > > > > It is listed under the switch it is attached to and in the right hand > > side is the LID of the switch which in this case is 6. > > And how do I know the switch port here? It's the port on the left hand side (e.g. 22 in this case). > Hal, where does this syntax come from? Some legacy script? It originated with a proprietary tool a long time ago and has been enhanced from there to add additional information as time went on. 
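As a rough illustration of the port-line layout described above, here is a small C sketch that pulls the fields out of a line such as the [22] "H-..."[1] example. It assumes only the layout shown in that example; struct port_line and parse_port_line() are hypothetical names, and real ibnetdiscover output may contain variations this sketch does not handle.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct port_line {
	unsigned int local_port;	/* port on the node the line is listed under */
	char remote_type;		/* letter before the dash, e.g. 'H' or 'S' */
	char remote_guid[17];		/* 16 hex digits of the node GUID plus NUL */
	unsigned int remote_port;	/* port number in brackets after the GUID */
	unsigned int lid;		/* LID printed at the end of the line */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
static int parse_port_line(const char *line, struct port_line *out)
{
	const char *lid;

	assert(line != NULL && out != NULL);

	/* [local_port] "T-guid"[remote_port] */
	if (sscanf(line, " [%u] \"%c-%16[0-9a-fA-F]\"[%u]",
		   &out->local_port, &out->remote_type,
		   out->remote_guid, &out->remote_port) != 4)
		return -1;

	/* The LID follows the closing quote of the description. */
	lid = strstr(line, "\" lid ");
	if (lid == NULL || sscanf(lid, "\" lid %u", &out->lid) != 1)
		return -1;

	return 0;
}
```

Lines that do not start with a bracketed port number, such as the Switch header line in the example, are rejected with -1, so a caller can use the return value to tell node headers from port lines.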
> How about fixing this tool to provide a sane, I think it is sane output. > tabulated output, with > a top self-documenting header, and flags to select specific rows/columns? > > I envision something a la ps: > > type guid port remote_lid remote_port description I'd like to understand more of what you intend and why the current format is insufficient. > Would such a patch be accepted? Such a patch would need to include all the affected tools, not just ibnetdiscover, or alternatively have an option for this new output format with the existing format as a default. I think the latter would be needed for backwards compatibility. -- Hal From mst at dev.mellanox.co.il Thu Mar 29 16:26:26 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Fri, 30 Mar 2007 01:29:14 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> <20070329230855.GG5436@mellanox.co.il> Message-ID: <20070329232914.GJ5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [GIT PULL] please pull infiniband.git > > > Assume you stick a pointer in WR_ID. When is it safe to free > > the object? > > I thought it was clear -- when the work request completes. But I see > that you saw that now... > > > However, it seems IB spec has a hole - the procedure outlined there > > can not work for SRQ. > > I think the (ugly) solution that the IB spec authors had in mind is to > transition the QP to the error state and wait for the "last WQE reached" > affiliated event on that QP. No, this does not work. The last WQE reached event is on SRQ, not on QP, and it will never occur if we repost WRs on SRQ as we should to make other QPs on the same SRQ continue to work. -- MST From halr at voltaire.com Thu Mar 29 17:26:12 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Mar 2007 19:26:12 -0500 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070329232626.GI5436@mellanox.co.il> References: <20070328070514.GA8649@mellanox.co.il> <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> <1175118260.4379.11551.camel@hal.voltaire.com> <20070329060858.GP4253@mellanox.co.il> <1175213622.4379.100253.camel@hal.voltaire.com> <20070329232626.GI5436@mellanox.co.il> Message-ID: <1175214370.4379.101009.camel@hal.voltaire.com> On Thu, 2007-03-29 at 18:26, Michael S. Tsirkin wrote: > > > How about fixing this tool to provide a sane, > > > > I think it is sane output. 
> > It sure is > 1. undocumented > 2. hard to parse How much time did you spend trying to understand it ? Quite a number of others have figured this out. -- Hal From rdreier at cisco.com Thu Mar 29 16:32:41 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 16:32:41 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20070329232914.GJ5436@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 30 Mar 2007 01:29:14 +0200") References: <20070328184411.GC4253@mellanox.co.il> <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> <20070329230855.GG5436@mellanox.co.il> <20070329232914.GJ5436@mellanox.co.il> Message-ID: > > I think the (ugly) solution that the IB spec authors had in mind is to > > transition the QP to the error state and wait for the "last WQE reached" > > affiliated event on that QP. > No, this does not work. > The last WQE reached event is on SRQ, not on QP, and it will never occur if we > repost WRs on SRQ as we should to make other QPs on the same SRQ continue to > work. Look at the spec again. The last WQE reached event is definitely affiliated with a QP (not an SRQ) and exists exactly to solve the problem we're talking about. - R. From mst at dev.mellanox.co.il Thu Mar 29 16:36:22 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Mar 2007 01:36:22 +0200 Subject: [ofa-general] Re: Question about registering the [vdso] memory section in user level In-Reply-To: References: <460B8705.9030904@dev.mellanox.co.il> <20070329094700.GB4253@mellanox.co.il> Message-ID: <20070329233622.GM5436@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: Question about registering the [vdso] memory section in user level > > > Yes, you can't DMA to VDSO VMA I don't think. > > Why not? It's just RAM... Well ... isn't it read-only? 
-- MST From rdreier at cisco.com Thu Mar 29 16:38:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 29 Mar 2007 16:38:57 -0700 Subject: [ofa-general] Re: Question about registering the [vdso] memory section in user level In-Reply-To: <20070329233622.GM5436@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 30 Mar 2007 01:36:22 +0200") References: <460B8705.9030904@dev.mellanox.co.il> <20070329094700.GB4253@mellanox.co.il> <20070329233622.GM5436@mellanox.co.il> Message-ID: > > > Yes, you can't DMA to VDSO VMA I don't think. > > Why not? It's just RAM... > Well ... isn't it read-only? True... you shouldn't be able to DMA to it. But I assume Dotan is trying to register the memory with read-only permission and DMA from it. Dotan, can you be more explicit about what your test is and how it fails? - R. From weiny2 at llnl.gov Thu Mar 29 16:47:49 2007 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 29 Mar 2007 16:47:49 -0700 Subject: [ofa-general] So how do you build kernel modules for non-standard kernels? Message-ID: <20070329164749.7ed23e3c.weiny2@llnl.gov> We have here at LLNL a kernel which is based on RHEL4 U4 but has a kernel version which looks like this: 16:10:56 > uname -r 2.6.9-61chaos One of our engineers here downloaded the latest OFED 1.2 build and attempted to build the kernel modules, which fails. (I have included the ofed.conf file and build log) As we were afraid, build.sh does not detect the correct kernel version. We attempted to work around the problem by: 1) installing the ofa_kernel-1.2-beta1.src.rpm 16:31:05 > rpm -i ofa_kernel-1.2-beta1.src.rpm warning: user vlad does not exist - using root warning: group vlad does not exist - using root warning: user vlad does not exist - using root warning: group vlad does not exist - using root 16:31:09 > ls */*ofa* SOURCES/ofa_kernel-1.2.tgz SPECS/ofa_kernel.spec* 2) Extracting the source from that install 3) Adding the following line to the configure script. get_backport_dir() ... 
2.6.9-61*) echo 2.6.9_U4 ;; ... 4) tarring that source back up. 16:34:34 > ls */*ofa* SOURCES/ofa_kernel-1.2.tgz SPECS/ofa_kernel.spec* SOURCES/ofa_kernel-1.2: BUILD_ID Makefile@ drivers/ kernel_addons/ makefile@ ofed_scripts/ Documentation/ configure@ include/ kernel_patches/ net/ 5) running rpmbuild -ba ofa_kernel.spec However this results only in: Wrote: /home/weiny2/rpm/SRPMS/ofa_kernel-1.2-beta1.src.rpm It does not seem to be building the modules themselves. Is there an option to tell build.sh which backport to use for our kernel? Ira -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed.conf Type: application/octet-stream Size: 750 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OFED.build.19674.log Type: application/octet-stream Size: 568265 bytes Desc: not available URL: From mst at dev.mellanox.co.il Thu Mar 29 16:51:44 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Mar 2007 01:51:44 +0200 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <1175214370.4379.101009.camel@hal.voltaire.com> References: <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> <1175118260.4379.11551.camel@hal.voltaire.com> <20070329060858.GP4253@mellanox.co.il> <1175213622.4379.100253.camel@hal.voltaire.com> <20070329232626.GI5436@mellanox.co.il> <1175214370.4379.101009.camel@hal.voltaire.com> Message-ID: <20070329235144.GP5436@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures > > On Thu, 2007-03-29 at 18:26, Michael S. Tsirkin wrote: > > > > How about fixing this tool to provide a sane, > > > > > > I think it is sane output.
Using a proprietary language when there is CSV, XML, RELAX NG, and asking everyone to write parsers for it is not sane. > > It sure is > > 1. undocumented > > 2. hard to parse > > How much time did you spend trying to understand it? Quite a number of > others have figured this out. I'm speaking about parsing it automatically in a script. -- MST From sashak at voltaire.com Thu Mar 29 19:15:13 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 30 Mar 2007 05:15:13 +0300 Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures In-Reply-To: <20070329235144.GP5436@mellanox.co.il> References: <1175109415.4379.2292.camel@hal.voltaire.com> <20070328191223.GF4253@mellanox.co.il> <1175113065.4379.6150.camel@hal.voltaire.com> <20070328201251.GK4253@mellanox.co.il> <1175118260.4379.11551.camel@hal.voltaire.com> <20070329060858.GP4253@mellanox.co.il> <1175213622.4379.100253.camel@hal.voltaire.com> <20070329232626.GI5436@mellanox.co.il> <1175214370.4379.101009.camel@hal.voltaire.com> <20070329235144.GP5436@mellanox.co.il> Message-ID: <1175220913.27023.14.camel@localhost> On Fri, 2007-03-30 at 01:51 +0200, Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures > > > > On Thu, 2007-03-29 at 18:26, Michael S. Tsirkin wrote: > > > > > How about fixing this tool to provide a sane, > > > > > > > > I think it is sane output. > > Using a proprietary language when there is CSV, XML, RELAX NG, > and asking everyone to write parsers for it is not sane. Under diags/scripts you can find a couple of parser examples. I think this format is pretty easy to understand - actually it is nodes followed by a list of their ports (with remotes). > > > It sure is > > > 1. undocumented This is true and a real disadvantage. > > > 2. hard to parse Many scripts are using this. > > How much time did you spend trying to understand it?
> > Quite a number of others have figured this out.
>
> I'm speaking about parsing it automatically in a script.

No problem, if you are going to propose better format let's discuss.

Sasha

From sweitzen at cisco.com Thu Mar 29 20:09:47 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 29 Mar 2007 20:09:47 -0700
Subject: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures
In-Reply-To: <20070329130044.GG4253@mellanox.co.il>
References: <20070327100256.GL6661@mellanox.co.il> <20070329130044.GG4253@mellanox.co.il>
Message-ID:

> Scott, can you please check that you first bring up
> port 2, then bring down port 1, and not the reverse?
>
> Otherwise you are leaving the system without any
> connectivity for extended periods of time and of
> course this affects TCP.
>
> --
> MST
>

I always have one IB port up at all times.

From mst at dev.mellanox.co.il Thu Mar 29 22:00:12 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 30 Mar 2007 08:00:12 +0300
Subject: [ofa-general] Re: Re: [GIT PULL] please pull infiniband.git
In-Reply-To:
References: <20070329183506.GB5436@mellanox.co.il> <20070329190643.GD5436@mellanox.co.il> <20070329223920.GE5436@mellanox.co.il> <20070329230855.GG5436@mellanox.co.il> <20070329232914.GJ5436@mellanox.co.il>
Message-ID: <20070330050012.GU5436@mellanox.co.il>

> Quoting Roland Dreier :
> Subject: Re: Re: [GIT PULL] please pull infiniband.git
>
> > > I think the (ugly) solution that the IB spec authors had in mind is to
> > > transition the QP to the error state and wait for the "last WQE reached"
> > > affiliated event on that QP.
>
> > No, this does not work.
>
> > The last WQE reached event is on SRQ, not on QP, and it will never occur if we
> > repost WRs on SRQ as we should to make other QPs on the same SRQ continue to
> > work.
>
> Look at the spec again. The last WQE reached event is definitely
> affiliated with a QP (not an SRQ) and exists exactly to solve the
> problem we're talking about.

Right, I confused this with low watermark event.
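
[Editorial sketch, not part of the original message.] The teardown order debated in this thread (move the QP to the error state, wait for the Last WQE Reached event, drain the CQ, then destroy) can be modelled as a small state machine. This is only a sketch under stated assumptions: every identifier below (`teardown_phase`, `teardown_step`, `teardown_advance`) is hypothetical and comes from no verbs header. In userspace libibverbs the four steps would roughly correspond to `ibv_modify_qp()` with `IBV_QPS_ERR`, an `IBV_EVENT_QP_LAST_WQE_REACHED` event from `ibv_get_async_event()`, `ibv_poll_cq()`, and `ibv_destroy_qp()`. The spec text permits destroying the QP without waiting, at the risk of WQE leakage; the model enforces the recommended order only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Phases of tearing down a QP that is attached to an SRQ.
 * All names are illustrative; none come from a real verbs API. */
enum teardown_phase {
    PHASE_ACTIVE,      /* QP in normal operation */
    PHASE_ERROR,       /* Modify QP to the Error State was issued */
    PHASE_LAST_WQE,    /* Last WQE Reached async event was received */
    PHASE_CQ_DRAINED,  /* CQ polled empty, or a marker WR completed */
    PHASE_DESTROYED    /* Destroy QP (or Reset QP) was issued */
};

/* Steps the consumer may attempt. */
enum teardown_step {
    STEP_MODIFY_TO_ERROR,
    STEP_LAST_WQE_REACHED,
    STEP_DRAIN_CQ,
    STEP_DESTROY
};

/* Advance *phase if `step` is the next one in the recommended order.
 * Returns false and leaves *phase untouched for an out-of-order step. */
static bool teardown_advance(enum teardown_phase *phase,
                             enum teardown_step step)
{
    assert(phase != NULL);

    switch (*phase) {
    case PHASE_ACTIVE:
        if (step != STEP_MODIFY_TO_ERROR)
            return false;
        *phase = PHASE_ERROR;
        return true;
    case PHASE_ERROR:
        if (step != STEP_LAST_WQE_REACHED)
            return false;
        *phase = PHASE_LAST_WQE;
        return true;
    case PHASE_LAST_WQE:
        if (step != STEP_DRAIN_CQ)
            return false;
        *phase = PHASE_CQ_DRAINED;
        return true;
    case PHASE_CQ_DRAINED:
        if (step != STEP_DESTROY)
            return false;
        *phase = PHASE_DESTROYED;
        return true;
    case PHASE_DESTROYED:
        return false;   /* nothing is legal after destruction */
    }
    return false;
}
```

A caller would feed each completed action into `teardown_advance()` and treat a `false` return as a sign that the QP is being destroyed before its outstanding SRQ WQEs are accounted for.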
In fact spec says explicitly:

  Note, for QPs that are associated with an SRQ, the Consumer should take
  the QP through the Error State before invoking a Destroy QP or a Modify
  QP to the Reset State. The Consumer may invoke the Destroy QP without
  first performing a Modify QP to the Error State and waiting for the
  Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer
  does not wait for the Affiliated Asynchronous Last WQE Reached Event,
  then WQE and Data Segment leakage may occur. Therefore, it is good
  programming practice to tear down a QP that is associated with an SRQ
  by using the following process:

  - Put the QP in the Error State;
  - wait for the Affiliated Asynchronous Last WQE Reached Event;
  - either:
    - drain the CQ by invoking the Poll CQ verb and either wait for CQ
      to be empty or the number of Poll CQ operations has exceeded
      CQ capacity size; or
    - post another WR that completes on the same CQ and wait for this
      WR to return as a WC;
  - and then invoke a Destroy QP or Reset QP.

  (Release 1.2, Software Transport Interface, October 2004, page 452)

So the bug is in IPoIB CM and there only.

-- 
MST

From HNGUYEN at de.ibm.com Thu Mar 29 23:09:13 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 30 Mar 2007 08:09:13 +0200
Subject: [ofa-general] So how do you build kernel modules for non-standard kernels?
In-Reply-To: <20070329164749.7ed23e3c.weiny2@llnl.gov>
Message-ID:

general-bounces at lists.openfabrics.org wrote on 30.03.2007 01:47:49:

> We have here at LLNL a kernel which is based on RHEL4 U4 but has a kernel
> version which looks like this:
>
> 16:10:56 > uname -r
> 2.6.9-61chaos
>
> One of our engineers here downloaded the latest OFED 1.2 build and
> attempted to build the kernel modules, which fails. (I have included
> the ofed.conf file and build log)
>
> As we were afraid, build.sh does not detect the correct kernel version. We
> attempted to work around the problem by:
>
> 1) installing the ofa_kernel-1.2-beta1.src.rpm
>
> 16:31:05 > rpm -i ofa_kernel-1.2-beta1.src.rpm
> warning: user vlad does not exist - using root
> warning: group vlad does not exist - using root
> warning: user vlad does not exist - using root
> warning: group vlad does not exist - using root
>
> 16:31:09 > ls */*ofa*
> SOURCES/ofa_kernel-1.2.tgz SPECS/ofa_kernel.spec*
>
> 2) Extracting the source from that install
> 3) Adding the following line to the configure script.
>
> get_backport_dir()
> ...
>       2.6.9-61*)
>               echo 2.6.9_U4
>       ;;
> ...
>
> 4) tarring that source back up.

An alternative is to call configure, make, make install.
Issue configure --help to obtain a full list of options, i.e. to tell which
modules you want to build.

>
> 16:34:34 > ls */*ofa*
> SOURCES/ofa_kernel-1.2.tgz SPECS/ofa_kernel.spec*
>
> SOURCES/ofa_kernel-1.2:
> BUILD_ID Makefile@ drivers/ kernel_addons/ makefile@ ofed_scripts/
> Documentation/ configure@ include/ kernel_patches/ net/
>
> 5) running rpmbuild -ba ofa_kernel.spec

Looking at ofa_kernel.spec you need to set configure_options as told
above. E.g. like this:

rpmbuild -bb --define 'configure_options --with-core-mod' SPECS/ofa_kernel.spec

Frankly speaking, I haven't tried that out.

Regards
Nam

IBM Deutschland Entwicklung GmbH, Schoenaicher Str.
220, 71032 Boeblingen, Germany
Chairman of the Supervisory Board: Johann Weihen, Management: Herbert Kircher
Registered office: Boeblingen, Commercial register: Amtsgericht Stuttgart, HRB 243294

From vlad at lists.openfabrics.org Fri Mar 30 02:35:47 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri, 30 Mar 2007 02:35:47 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070330-0200 daily build status
Message-ID: <20070330093547.C2BAFE60819@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:

From bs at q-leap.de Fri Mar 30 04:42:18 2007
From: bs at q-leap.de (Bernd Schubert)
Date: Fri, 30 Mar 2007 13:42:18 +0200
Subject: [ofa-general] ipath oops
Message-ID: <200703301342.19079.bs@q-leap.de>

No answer so far and I need a little help to debug this, so I changed the
subject; maybe it will attract more interest that way.

Hi,

with 2.6.20.4 and lustre-1.4.9 we get an oops, see below. In principle it
also could be a lustre problem, but with mellanox cards it just works fine.

[ 195.339317] Lustre: Added LNI 192.168.41.106 at o2ib [8/64]
[ 195.352336] Lustre: Added LNI 192.168.42.106 at tcp [8/256]
[ 195.357796] Lustre: Accept secure, port 988
[ 195.412988] Lustre: Lustre Lite Client File System; info at clusterfs.com
[ 195.449596] Unable to handle kernel paging request at 000000007740b000 RIP:
[ 195.454249] [<ffffffff803513d2>] __iowrite32_copy+0x2/0x8
[ 195.462306] PGD 11ac87067 PUD 0
[ 195.465648] Oops: 0000 [1] SMP

Entering kdb (current=0xffff81007755c100, pid 3191) on processor 3 Oops:
due to oops @ 0xffffffff803513d2
     r15 = 0x0000000000000005      r14 = 0x0000000000000168
     r13 = 0x000000007740b000      r12 = 0xffffc200001d601c
     rbp = 0xffff81007c083a60      rbx = 0x0000000000000059
     r11 = 0x0000000000000000      r10 = 0xffff810076bc4000
      r9 = 0xffff810076bc4000       r8 = 0xffff81007ccf2ec8
     rax = 0x0000000000000000      rcx = 0x0000000000000059
     rdx = 0x0000000000000059      rsi = 0x000000007740b000
     rdi = 0xffffc200001d601c orig_rax = 0xffffffffffffffff
     rip = 0xffffffff803513d2       cs = 0x0000000000000010
  eflags = 0x0000000000010206      rsp = 0xffff81007c0839f0
      ss = 0x0000000000000000    &regs = 0xffff81007c083958

[3]kdb> bt
Stack traceback for pid 3191
0xffff81007755c100 3191 19 1 3 R 0xffff81007755c3c0 *ib_cm/3
rsp                rip                Function (args)
0xffff81007c0839d8 0xffffffff803513d2 __iowrite32_copy+0x2
0xffff81007c083a08 0xffffffff88066161 [ib_ipath]ipath_verbs_send+0x10b
0xffff81007c083a68 0xffffffff88061205 [ib_ipath]ipath_do_ruc_send+0x707
0xffff81007c083af8 0xffffffff88061619 [ib_ipath]ipath_post_ruc_send+0x1fd
0xffff81007c083b58 0xffffffff88065c39 [ib_ipath]ipath_post_send+0x70
0xffff81007c083b88 0xffffffff88284685 [ko2iblnd]kiblnd_check_sends+0x5c0
0xffff81007c083b98 0xffffffff8046e3af _spin_unlock+0x9
0xffff81007c083bf8 0xffffffff882873af [ko2iblnd]kiblnd_connreq_done+0x3d2
0xffff81007c083c28 0xffffffff8826b96d [ib_cm]ib_send_cm_rtu+0xec
0xffff81007c083c78 0xffffffff882886e9 [ko2iblnd]kiblnd_check_connreply+0x318
0xffff81007c083cd8 0xffffffff88289537 [ko2iblnd]kiblnd_cm_callback+0xb02
0xffff81007c083d38 0xffffffff88274c01 [rdma_cm]cma_ib_handler+0x18a
0xffff81007c083da8 0xffffffff8826c7da [ib_cm]cm_process_work+0x5c
0xffff81007c083dd8 0xffffffff8826de19 [ib_cm]cm_work_handler+0xad7
0xffff81007c083e28 0xffffffff8826d342 [ib_cm]cm_work_handler
0xffff81007c083e38 0xffffffff80238bc9 run_workqueue+0xb1
0xffff81007c083e58 0xffffffff80238c71 worker_thread
0xffff81007c083e68 0xffffffff8023bed0 keventd_create_kthread
0xffff81007c083e78 0xffffffff80238d97 worker_thread+0x126

In ipath_verbs.c: ipath_verbs_send() the problem is the address of
ss->sge.vaddr.

The problem seems to be in the goto loop of ipath_ruc.c: ipath_do_ruc_send().
The first time through, qp->s_hdrwords is zero, so it doesn't enter

	if (qp->s_hdrwords != 0) {
		...
		ipath_verbs_send()
		...
	}

Then neither of the following two conditions is true either:

	if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE &&
	    (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0) {
		printk ("Sending.\n");
		bth2 = qp->s_ack_psn++ & IPATH_PSN_MASK;
	} else if (!((qp->ibqp.qp_type == IB_QPT_RC) ?
		   ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2) :
		   ipath_make_uc_req(qp, ohdr, pmtu, &bth0, &bth2))) {
		...
	}

So it increases qp->s_hdrwords and after the "goto again", ipath_verbs_send()
will be called and it crashes.

In ipath_make_rc_req(): qp->s_cur is zero, so wqe = qp->s_wq.
Also, qp->s_cur_sge = &qp->s_sge and qp->s_sge.sge = wqe->sg_list[0];

If I see it right, wqe->sg_list[0] or wqe->sg_list[0].vaddr is wrong, but so
far I haven't tracked down where this is set.

Any help to solve the problem is appreciated.

Thanks in advance,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH

From swise at aoot.com Fri Mar 30 06:04:34 2007
From: swise at aoot.com (Steve Wise)
Date: Fri, 30 Mar 2007 08:04:34 -0500
Subject: [ofa-general] [PATCH ofed_1_2] Chelsio: driver fixes + new FW support
Message-ID: <1175259874.4995.15.camel@stevo-desktop>

Vlad,

Please pull these commits from

git://staging.openfabrics.org/~swise/ofed_1_2.git ofed_1_2

All the cross compiles and kernel builds pass.

Thanks,

Steve.

Git log:
--------

commit 40db91a1ae5947c7b38d5a845608f62690825c12
Author: Steve Wise
Date:   Thu Mar 29 14:25:03 2007 -0500

    Support new firmware version 3.3.

    Signed-off-by: Steve Wise

commit 20b6844c9e2ff57f2e8428080a1a90bcaef17174
Author: Steve Wise
Date:   Thu Mar 29 14:25:03 2007 -0500

    cxgb3: delay 15us when reading eeprom for sles9sp3.

    Signed-off-by: Steve Wise

commit 9fe553d9f03577e85c182e85665c916827cbdbad
Author: Steve Wise
Date:   Thu Mar 29 14:25:02 2007 -0500

    Fix TERM codes.

    Fix TERMINATE layer, type, and ecode values based on conformance testing.

    Signed-off-by: Steve Wise

Complete diffs:
------

commit 40db91a1ae5947c7b38d5a845608f62690825c12
Author: Steve Wise
Date:   Thu Mar 29 14:25:03 2007 -0500

    Support new firmware version 3.3.

    Signed-off-by: Steve Wise

diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h
index 782a6cf..b0e68fa 100644
--- a/drivers/net/cxgb3/version.h
+++ b/drivers/net/cxgb3/version.h
@@ -37,5 +37,5 @@ #define DRV_NAME "cxgb3"
 /* Driver version */
 #define DRV_VERSION "1.0"
 #define FW_VERSION_MAJOR 3
-#define FW_VERSION_MINOR 2
+#define FW_VERSION_MINOR 3
 #endif /* __CHELSIO_VERSION_H */

commit 20b6844c9e2ff57f2e8428080a1a90bcaef17174
Author: Steve Wise
Date:   Thu Mar 29 14:25:03 2007 -0500

    cxgb3: delay 15us when reading eeprom for sles9sp3.

    Signed-off-by: Steve Wise

diff --git a/kernel_patches/backport/2.6.5_sles9_sp3/cxgb3_t3_hw_to_2.6.5_sles9_sp3.patch b/kernel_patches/backport/2.6.5_sles9_sp3/cxgb3_t3_hw_to_2.6.5_sles9_sp3.patch
new file mode 100644
index 0000000..3d0eedc
--- /dev/null
+++ b/kernel_patches/backport/2.6.5_sles9_sp3/cxgb3_t3_hw_to_2.6.5_sles9_sp3.patch
@@ -0,0 +1,25 @@
+cxgb3: Add more delay when reading the eeprom.
+
+From: Steve Wise
+
+Needed for sles9sp3.
+
+Signed-off-by: Steve Wise
+---
+
+ drivers/net/cxgb3/t3_hw.c | 2 +-
+ 1 files changed, 1 insertions(+), 1 deletions(-)
+
+diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
+index eaa7a2e..18ec16d 100644
+--- a/drivers/net/cxgb3/t3_hw.c
++++ b/drivers/net/cxgb3/t3_hw.c
+@@ -548,7 +548,7 @@ int t3_seeprom_read(struct adapter *adap
+ 
+ 	pci_write_config_word(adapter->pdev, base + PCI_VPD_ADDR, addr);
+ 	do {
+-		udelay(10);
++		udelay(15);
+ 		pci_read_config_word(adapter->pdev, base + PCI_VPD_ADDR, &val);
+ 	} while (!(val & PCI_VPD_ADDR_F) && --attempts);
+ 

commit 9fe553d9f03577e85c182e85665c916827cbdbad
Author: Steve Wise
Date:   Thu Mar 29 14:25:02 2007 -0500

    Fix TERM codes.

    Fix TERMINATE layer, type, and ecode values based on conformance testing.

    Signed-off-by: Steve Wise

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 25149a4..7530dc0 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -473,44 +473,62 @@ int iwch_bind_mw(struct ib_qp *qp,
 	return err;
 }
 
-static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
-				    int tagged)
+static inline void build_term_codes(struct respQ_msg_t *rsp_msg,
+				    u8 *layer_type, u8 *ecode)
 {
-	switch (t3err) {
+	int status = TPT_ERR_INTERNAL_ERR;
+	int tagged = 0;
+	int opcode = -1;
+	int rqtype = 0;
+	int send_inv = 0;
+
+	if (rsp_msg) {
+		status = CQE_STATUS(rsp_msg->cqe);
+		opcode = CQE_OPCODE(rsp_msg->cqe);
+		rqtype = RQ_TYPE(rsp_msg->cqe);
+		send_inv = (opcode == T3_SEND_WITH_INV) ||
+			   (opcode == T3_SEND_WITH_SE_INV);
+		tagged = (opcode == T3_RDMA_WRITE) ||
+			 (rqtype && (opcode == T3_READ_RESP));
+	}
+
+	switch (status) {
 	case TPT_ERR_STAG:
-		if (tagged == 1) {
-			*layer_type = LAYER_DDP|DDP_TAGGED_ERR;
-			*ecode = DDPT_INV_STAG;
-		} else if (tagged == 2) {
+		if (send_inv) {
+			*layer_type = LAYER_RDMAP|RDMAP_REMOTE_OP;
+			*ecode = RDMAP_CANT_INV_STAG;
+		} else {
 			*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
 			*ecode = RDMAP_INV_STAG;
 		}
 		break;
 	case TPT_ERR_PDID:
+		*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+		if ((opcode == T3_SEND_WITH_INV) ||
+		    (opcode == T3_SEND_WITH_SE_INV))
+			*ecode = RDMAP_CANT_INV_STAG;
+		else
+			*ecode = RDMAP_STAG_NOT_ASSOC;
+		break;
 	case TPT_ERR_QPID:
+		*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+		*ecode = RDMAP_STAG_NOT_ASSOC;
+		break;
 	case TPT_ERR_ACCESS:
-		if (tagged == 1) {
-			*layer_type = LAYER_DDP|DDP_TAGGED_ERR;
-			*ecode = DDPT_STAG_NOT_ASSOC;
-		} else if (tagged == 2) {
-			*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
-			*ecode = RDMAP_STAG_NOT_ASSOC;
-		}
+		*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+		*ecode = RDMAP_ACC_VIOL;
 		break;
 	case TPT_ERR_WRAP:
 		*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
 		*ecode = RDMAP_TO_WRAP;
 		break;
 	case TPT_ERR_BOUND:
-		if (tagged == 1) {
+		if (tagged) {
 			*layer_type = LAYER_DDP|DDP_TAGGED_ERR;
 			*ecode = DDPT_BASE_BOUNDS;
-		} else if (tagged == 2) {
+		} else {
 			*layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
 			*ecode = RDMAP_BASE_BOUNDS;
-		} else {
-			*layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
-			*ecode = DDPU_MSG_TOOBIG;
 		}
 		break;
 	case TPT_ERR_INVALIDATE_SHARED_MR:
@@ -594,8 +612,6 @@ int iwch_post_terminate(struct iwch_qp *
 {
 	union t3_wr *wqe;
 	struct terminate_message *term;
-	int status;
-	int tagged = 0;
 	struct sk_buff *skb;
 
 	PDBG("%s %d\n", __FUNCTION__, __LINE__);
@@ -613,17 +629,7 @@ int iwch_post_terminate(struct iwch_qp *
 
 	/* immediate data starts here. */
 	term = (struct terminate_message *)wqe->send.sgl;
-	if (rsp_msg) {
-		status = CQE_STATUS(rsp_msg->cqe);
-		if (CQE_OPCODE(rsp_msg->cqe) == T3_RDMA_WRITE)
-			tagged = 1;
-		if ((CQE_OPCODE(rsp_msg->cqe) == T3_READ_REQ) ||
-		    (CQE_OPCODE(rsp_msg->cqe) == T3_READ_RESP))
-			tagged = 2;
-	} else {
-		status = TPT_ERR_INTERNAL_ERR;
-	}
-	build_term_codes(status, &term->layer_etype, &term->ecode, tagged);
+	build_term_codes(rsp_msg, &term->layer_etype, &term->ecode);
 	build_fw_riwrh((void *)wqe, T3_WR_SEND,
 		       T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1,
 		       qhp->ep->hwtid, 5);

From swise at opengridcomputing.com Fri Mar 30 06:10:30 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 30 Mar 2007 08:10:30 -0500
Subject: [ofa-general] ppc64 and ofa user code
Message-ID: <1175260230.4995.21.camel@stevo-desktop>

Hey all,

I'm testing on ppc64 platforms and I ran into this issue. I'm building
and installing using the ofa tarballs (not the ofed tarball). So I
typically just untar them, run ./configure with the appropriate
components enabled, then do 'make && make install'.

On these ppc64 systems, the userspace got built as 32b instead of 64b.
I had to add -m64 to the CFLAGS and CPPFLAGS in the configure script to
force 64b compiles. So I guess the gcc on this distro/platform defaults
to 32b? It's a SLES distro I think.

Is there a more proper way to tell the configure script to build 64b?

Do build.sh and the ofed package handle this in some particular way? I
know they build both 32b and 64b stuff, so perhaps I'm just misusing
things by manually configuring the userspace tree?

Thanks,

Steve.

From scarter at ornl.gov Fri Mar 30 06:21:18 2007
From: scarter at ornl.gov (Steven Carter)
Date: Fri, 30 Mar 2007 09:21:18 -0400
Subject: [ofa-general] OpenSM in WAN ring.
Message-ID: <460D0ECE.6080706@ornl.gov>

The OpenSM man page says that Up/Down routing should be used for networks with loops.
We have three WAN sites linked with a combination of Obsidian and Bay IB
WAN devices in a ring with an instance of OpenSM running at each site.
Although I don't think we've seen any ill effects of not using Up/Dn
routing, I assume that I should dutifully obey the man page.

I intend to run OpenSM with the options '-x -R updn -a' at each site,
then synchronize guid2lid and opensm.opts. Does this sound like the
correct course of action or am I missing something?

Thanks,
Steven.

From philippe.gregoire at cea.fr Fri Mar 30 08:20:06 2007
From: philippe.gregoire at cea.fr (GREGOIRE Philippe)
Date: Fri, 30 Mar 2007 17:20:06 +0200
Subject: [ofa-general] ofed and vendors firmware
Message-ID: <460D2AA6.8000409@cea.fr>

On the OFED wiki, one can find only information about firmware
recommended by Mellanox.
What about the other HCA vendors ?
What about Mellanox HCA provided by Cisco, Voltaire and Mellanox ?
When you are migrating from a proprietary stack to the OFED stack, mstflint
does not give the PSID required to identify the HCA on the Mellanox website.
What are the vendor recommendations to upgrade HCA after a migration to
OFED ?

From sweitzen at cisco.com Fri Mar 30 08:40:10 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Fri, 30 Mar 2007 08:40:10 -0700
Subject: [ofa-general] RE: [ewg] ofed and vendors firmware
In-Reply-To: <460D2AA6.8000409@cea.fr>
References: <460D2AA6.8000409@cea.fr>
Message-ID:

Cisco provides HCA firmware and documentation with our driver releases
in the form of ISO images, at
http://www.cisco.com/cgi-bin/tablebuild.pl/sfs-linux, to anyone who
registers at cisco.com. To get support from Cisco you must have a Cisco support contract.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org
> [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of
> GREGOIRE Philippe
> Sent: Friday, March 30, 2007 8:20 AM
> To: general at lists.openfabrics.org; openfabrics-ewg at openib.org
> Subject: [ewg] ofed and vendors firmware
>
> On the OFED wiki, one can find only information about firmware
> recommended by Mellanox.
> What about the other HCA vendors ?
> What about Mellanox HCA provided by Cisco, Voltaire and Mellanox ?
> When you are migrating from a proprietary stack to the OFED stack,
> mstflint does not give the PSID required to identify the HCA on the
> Mellanox website.
> What are the vendor recommendations to upgrade HCA after a
> migration to OFED ?
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>

From rjwalsh at pathscale.com Fri Mar 30 10:18:07 2007
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 30 Mar 2007 10:18:07 -0700
Subject: [ofa-general] ipath oops
In-Reply-To: <200703301342.19079.bs@q-leap.de>
References: <200703301342.19079.bs@q-leap.de>
Message-ID: <460D464F.2020405@pathscale.com>

> Stack traceback for pid 3191
> 0xffff81007755c100 3191 19 1 3 R 0xffff81007755c3c0 *ib_cm/3
> rsp                rip                Function (args)
> 0xffff81007c0839d8 0xffffffff803513d2 __iowrite32_copy+0x2
> 0xffff81007c083a08 0xffffffff88066161 [ib_ipath]ipath_verbs_send+0x10b
> 0xffff81007c083a68 0xffffffff88061205 [ib_ipath]ipath_do_ruc_send+0x707
> 0xffff81007c083af8 0xffffffff88061619 [ib_ipath]ipath_post_ruc_send+0x1fd
> 0xffff81007c083b58 0xffffffff88065c39 [ib_ipath]ipath_post_send+0x70
> 0xffff81007c083b88 0xffffffff88284685 [ko2iblnd]kiblnd_check_sends+0x5c0

This looks a lot like an oops we saw recently in SDP. Are you using dma_map_single or related functions?
If so, is the memory you're mapping going through the ib_dma_* interface?
On Mellanox hardware, these are all just pass-throughs to the real
dma_map_* functions, but on ipath hardware we intercept the calls to set
up mapping tables. Without this, we won't work.

Look in rdma/ib_verbs.h to see the list of functions that are
intercepted. Search for ib_dma and ib_sg.

Let me know what you see.

Regards,
Robert.

From rjwalsh at pathscale.com Fri Mar 30 10:19:18 2007
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 30 Mar 2007 10:19:18 -0700
Subject: [ofa-general] ofed and vendors firmware
In-Reply-To: <460D2AA6.8000409@cea.fr>
References: <460D2AA6.8000409@cea.fr>
Message-ID: <460D4696.4070702@pathscale.com>

GREGOIRE Philippe wrote:
> On the OFED wiki, one can find only information about firmware
> recommended by Mellanox.
> What about the other HCA vendors ?

QLogic (formerly PathScale) HCAs are firmwareless.

Regards,
Robert.

From greg.lindahl at qlogic.com Fri Mar 30 10:20:46 2007
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Fri, 30 Mar 2007 10:20:46 -0700
Subject: [ofa-general] ofed and vendors firmware
In-Reply-To: <460D2AA6.8000409@cea.fr>
References: <460D2AA6.8000409@cea.fr>
Message-ID: <20070330172046.GB4701@dhcp-2-200.internal.keyresearch.com>

On Fri, Mar 30, 2007 at 05:20:06PM +0200, GREGOIRE Philippe wrote:

> On the OFED wiki, one can find only information about firmware
> recommended by Mellanox.
> What about the other HCA vendors ?

InfiniPath doesn't have firmware. It's always fun to see InfiniBand
bids requiring firmware utilities: "We recommend using /bin/true for
firmware updates; an exit code of 0 means the firmware was
successfully updated."

-- 
greg

From monty at lampreynetworks.com Fri Mar 30 11:47:50 2007
From: monty at lampreynetworks.com (John LaMontagne)
Date: Fri, 30 Mar 2007 14:47:50 -0400
Subject: [ofa-general] OFA Interop Event #3 - Event Begins in 2 Weeks
Message-ID: <006501c772fb$efa452f0$ceecf8d0$@com>

Dear OFA Member,

The OpenFabrics Alliance is proud to announce:

OFA Interoperability Event #3
April 16 - 20, 2007
University of New Hampshire (UNH) Interoperability Lab (IOL)
Durham, NH.

This event will provide an opportunity for participants to measure their
products for Interoperability using the OpenFabrics Alliance Stack.
For Plugfest event and registration information visit: http://www.openfabrics.org/ITE.htm or, http://www.iol.unh.edu/services/testing/ofa/events/Invitation_2007-04_OFA.html For our planning purposes, we request that you submit your registration information as soon as possible. For registrations received after April 2nd, a late fee may be imposed. We will be conducting the IBTA Plugfest #11 at the University of New Hampshire Interoperability Lab from April 12 - 13. This is an excellent opportunity for InfiniBand vendors to test their devices for inclusion on the IBTA Integrators' List. For event information, visit: http://www.infinibandta.org/members/April_2007_plugfest If you have any questions, please contact the Event Coordinator: John LaMontagne, Lamprey Networks, Inc. (monty at lampreynetworks.com) Mark your calendar now for this event! -------------- next part -------------- An HTML attachment was scrubbed... URL: From vlad at lists.openfabrics.org Sat Mar 31 02:35:58 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 31 Mar 2007 02:35:58 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070331-0200 daily build status Message-ID: <20070331093559.40D53E6081F@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ppc64 
with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From todd.rimmer at qlogic.com Sat Mar 31 07:31:29 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Sat, 31 Mar 2007 09:31:29 -0500 Subject: [ofa-general] mthca wc->opcode for CQEs with error status In-Reply-To: <1175259874.4995.15.camel@stevo-desktop> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119203C27@EPEXCH2.qlogic.org> Vladimir, At present in OFED-1.2 mthca, if a CQE reports an error status, the wc->opcode field is undefined (as are many other fields in the wc). This is in contrast to ipath and ehca, which both fully populate the wc structure on success and failure status. To aid error messages and the porting of some applications, it would be better if wc->opcode could at least indicate whether the failed CQE was for the RQ or SQ. To meet this need in its simplest form (identify RQ vs SQ), I recommend adding the following line to handle_error_cqe() in mthca_cq.c and src_cq.c. wc->opcode = is_send ? IBV_WC_SEND : IBV_WC_RECV; Attached are the context diffs for src_cq.c and mthca_cq.c for this change. Thank You, Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From tom at opengridcomputing.com Sat Mar 31 12:57:37 2007 From: tom at opengridcomputing.com (Tom Tucker) Date: Sat, 31 Mar 2007 14:57:37 -0500 Subject: [ofa-general] Incorrect max_sge reported in mthca device query Message-ID: <1175371057.19974.8.camel@trinity.ogc.int> Roland: I think the max_sge reported by mthca_query_device is off by one. 
If you try to create a QP with the reported max, it fails with -EINVAL. I think the reason is that the mthca_alloc_wqe_buf function reserves a slot for a "bind request" and this pushes the WQE size over the 496B limit when the user requests the max (30) when allocating the QP. Please let me know if I'm confused about what max_sge really means. Thanks, Tom From mst at dev.mellanox.co.il Sat Mar 31 23:43:20 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 1 Apr 2007 09:43:20 +0300 Subject: [ofa-general] Re: Incorrect max_sge reported in mthca device query In-Reply-To: <1175371057.19974.8.camel@trinity.ogc.int> References: <1175371057.19974.8.camel@trinity.ogc.int> Message-ID: <20070401064320.GX5436@mellanox.co.il> > Quoting Tom Tucker : > Subject: Incorrect max_sge reported in mthca device query > > > Roland: > > I think the max_sge reported by mthca_query_device is off by one. If you > try to create a QP with the reported max, it fails with -EINVAL. I think > the reason is that the mthca_alloc_wqe_buf function reserves a slot for > a "bind request" and this pushes the WQE size over the 496B limit when > the user requests the max (30) when allocating the QP. > > Please let me know if I'm confused about what max_sge really means. > > Thanks, > Tom Tom, max_sge reported by mthca_query_device is the upper bound for all QP types. I have not tested this, but I think you can create a UD type QP with this number of SGEs. I'd like to add that there can be no hard guarantee that creating a QP with a specific set of max_sge/max_wr always succeeds even if it is within the range of values reported by mthca_query_device: for example, for userspace QPs, the system administrator might have limited the amount of memory that can be locked up by these QPs, and QP allocation requests with large max_sge/max_wr values will always fail. There are other examples of this. Thus, an application that wants to use as large a number of SGEs/WRs as possible in a robust fashion currently has no other choice except a trial and error approach, handling failures gracefully. Finally, as a side note, it is *also* inefficient to request allocation of more sge entries than the ULP will typically use - for reasons such as cache utilization, and many others. How this overhead trades off against the ULP's need to sometimes post multiple WRs will depend on both the ULP and the hardware used. 
This need to tune the ULP to a specific HCA is annoying, and might be something that we want to try and solve at the API level. However, max_sge/max_wr values in query device are unlikely to be the appropriate API for this. One way out could be to extend the API for create_qp and friends, passing in both min and max values for some parameters, and allowing the verbs provider to choose the optimal combination of these. I think I floated a similar proposal once already, but there didn't appear to be sufficient user support for such a large API extension. -- MST
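The trial and error approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from mthca or libibverbs: create_qp_fn, create_qp_with_backoff and mock_create_qp are hypothetical names, and the mock stands in for a real creation call such as ibv_create_qp(), which would need an actual device. The loop starts at the maximum reported by the device query and backs off one SGE at a time until creation succeeds or the request drops below what the ULP can live with.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: tries to create a QP with max_sge scatter/gather
 * entries. Returns 0 on success or a negative errno value on failure.
 * In a real program this would wrap the verbs QP creation call. */
typedef int (*create_qp_fn)(int max_sge, void *ctx);

/* Trial and error: start at the reported device maximum and back off
 * until creation succeeds or we would go below the ULP's minimum.
 * Returns the number of SGEs granted, or a negative errno value. */
static int create_qp_with_backoff(create_qp_fn create, void *ctx,
                                  int reported_max, int min_needed)
{
    int last_err = -EINVAL;
    int sge;

    assert(create != NULL);
    if (min_needed < 1 || reported_max < min_needed)
        return -EINVAL;

    for (sge = reported_max; sge >= min_needed; sge--) {
        int rc = create(sge, ctx);
        if (rc == 0)
            return sge;     /* first size the provider accepts */
        last_err = rc;      /* remember why the last attempt failed */
    }
    return last_err;
}

/* Mock provider (an assumption for illustration only): accepts any
 * request up to the limit stored in ctx and rejects larger ones. */
static int mock_create_qp(int max_sge, void *ctx)
{
    int limit = *(int *)ctx;
    return max_sge <= limit ? 0 : -EINVAL;
}
```

With a mock limit of 29 and a reported maximum of 30, calling create_qp_with_backoff(mock_create_qp, &limit, 30, 4) makes two attempts and returns 29. Counting down from the maximum suits a ULP that wants as many SGEs as it can get; a ULP that cares more about cache utilization would more likely start near its typical usage instead.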