From vlad at lists.openfabrics.org Sat Nov 1 03:06:32 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Sat, 1 Nov 2008 03:06:32 -0700 (PDT)
Subject: [ofa-general] ofa_1_4_kernel 20081101-0200 daily build status
Message-ID: <20081101100633.1A5A9E60D7F@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:
Build failed on x86_64 with linux-2.6.25
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.25_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.25'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.26
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.26_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.26'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.24
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: 'cpu_data' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: for each function it appears in.)
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.24_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.24'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-42.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-55.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.27
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081101-0200_linux-2.6.27_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.27'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-78.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-67.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------

From rdreier at cisco.com Sat Nov 1 11:15:10 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 01 Nov 2008 11:15:10 -0700
Subject: [ofa-general] [PATCH] RDMA/cxgb3: Fix too-big reserved field zeroing in iwch_post_zb_read()
Message-ID:

The array wqe->read.reserved has only two entries, but
iwch_post_zb_read() sets [0], [1], and [2], which is one too many.
This is harmless since it runs into the next field, rem_stag, which is
initialized correctly immediately after, but we might as well get
things right, especially since it makes the code smaller.

This was spotted by the Coverity checker (CID 2475).

Signed-off-by: Roland Dreier
---
I'll queue this up unless someone tells me I'm misreading things and
goofed up here...
 drivers/infiniband/hw/cxgb3/iwch_qp.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 3e4585c..19661b2 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -745,7 +745,6 @@ int iwch_post_zb_read(struct iwch_qp *qhp)
 	wqe->read.rdmaop = T3_READ_REQ;
 	wqe->read.reserved[0] = 0;
 	wqe->read.reserved[1] = 0;
-	wqe->read.reserved[2] = 0;
 	wqe->read.rem_stag = cpu_to_be32(1);
 	wqe->read.rem_to = cpu_to_be64(1);
 	wqe->read.local_stag = cpu_to_be32(1);
-- 
1.6.0.2

From rdreier at cisco.com Sat Nov 1 11:44:13 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 01 Nov 2008 11:44:13 -0700
Subject: [ofa-general] Suspicious code in schedule_nes_timer()
Message-ID:

schedule_nes_timer() starts as follows. Observe a couple of things:

int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
		enum nes_timer_type type, int send_retrans,
		int close_when_complete)
{
	unsigned long flags;
	struct nes_cm_core *cm_core = cm_node->cm_core;

>>> cm_node is directly dereferenced here...

	struct nes_timer_entry *new_send;
	int ret = 0;
	u32 was_timer_set;

	if (!cm_node)
		return -EINVAL;

>>> and then later tested for NULL...

so if cm_node is NULL, then the code will oops before it hits the
return -EINVAL. It seems that callers must guarantee that cm_node
isn't NULL, so it would make sense to delete the "if (!cm_node)" test,
right?

 - R.
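
The check-after-dereference pattern described above can be resolved in two ways, sketched below as a minimal standalone illustration. The types and function names (demo_cm_core, demo_cm_node, demo_schedule_timer_checked, demo_schedule_timer_unchecked) are hypothetical stand-ins, not the nes driver's definitions, and the function bodies are reduced to the part that matters: where the pointer is first read relative to the NULL test.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real nes structs. */
struct demo_cm_core {
	int id;
};

struct demo_cm_node {
	struct demo_cm_core *cm_core;
};

/*
 * Option A: keep the NULL test, but run it before the first dereference.
 * The read of cm_node->cm_core moves out of the declaration and below
 * the check, so a NULL argument really does return -EINVAL.
 */
int demo_schedule_timer_checked(struct demo_cm_node *cm_node)
{
	struct demo_cm_core *cm_core;

	if (!cm_node)
		return -EINVAL;

	cm_core = cm_node->cm_core;
	return cm_core ? cm_core->id : 0;
}

/*
 * Option B: callers guarantee cm_node is non-NULL, so the test is dropped
 * and the dereference in the initializer is fine as written.
 * Passing NULL here is a caller bug.
 */
int demo_schedule_timer_unchecked(struct demo_cm_node *cm_node)
{
	struct demo_cm_core *cm_core = cm_node->cm_core;

	return cm_core ? cm_core->id : 0;
}
```

Option B corresponds to what the mail proposes (deleting the test); option A would only be the right choice if some caller could legitimately pass NULL, which the mail suggests is not the case.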

From swise at opengridcomputing.com Sat Nov 1 12:47:11 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 01 Nov 2008 14:47:11 -0500
Subject: [ofa-general] Re: [PATCH] RDMA/cxgb3: Fix too-big reserved field zeroing in iwch_post_zb_read()
In-Reply-To:
References:
Message-ID: <490CB23F.2070709@opengridcomputing.com>

Acked-by: Steve Wise

From sashak at voltaire.com Sat Nov 1 12:58:07 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 1 Nov 2008 21:58:07 +0200
Subject: [ofa-general] [PATCH] Encode agent id in request transaction id.
Message-ID: <20081101195807.GA12081@sashak.voltaire.com>

For requests agent id will be encoded as bits 32-47 into MAD transaction
id (ibsim will use now higher bits (48-63) for client id encoding). So
response's agent id will be decoded from MAD and not resolved by
management class value. This is in order to simulate kernel's user_mad
layer behavior.

Signed-off-by: Sasha Khapyorsky
---
 ibsim/ibsim.c       |    2 +-
 ibsim/sim_mad.c     |    2 +-
 umad2sim/umad2sim.c |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/ibsim/ibsim.c b/ibsim/ibsim.c
index c050be1..149b6b9 100644
--- a/ibsim/ibsim.c
+++ b/ibsim/ibsim.c
@@ -668,7 +668,7 @@ int disconnect_client(int id)
 
 static Client *client_by_trid(Port *port, uint64_t trid)
 {
-	unsigned i = (unsigned)(trid >> 32);
+	unsigned i = (unsigned)(trid >> 48);
 	if (i < IBSIM_MAX_CLIENTS && clients[i].pid &&
 	    clients[i].port->portguid == port->portguid)
 		return &clients[i];
diff --git a/ibsim/sim_mad.c b/ibsim/sim_mad.c
index fbe81aa..c49f4cc 100644
--- a/ibsim/sim_mad.c
+++ b/ibsim/sim_mad.c
@@ -108,7 +108,7 @@ static uint64_t update_trid(uint8_t *mad, unsigned response, Client *cl)
 {
 	uint64_t trid = mad_get_field64(mad, 0, IB_MAD_TRID_F);
 	if (!response) {
-		trid = (trid&0xffffffffULL)|(((uint64_t)cl->id)<<32);
+		trid = (trid&0xffffffffffffULL)|(((uint64_t)cl->id)<<48);
 		mad_set_field64(mad, 0, IB_MAD_TRID_F, trid);
 	}
 	return trid;
diff --git a/umad2sim/umad2sim.c b/umad2sim/umad2sim.c
index 2b37a8d..f896540 100644
--- a/umad2sim/umad2sim.c
+++ b/umad2sim/umad2sim.c
@@ -406,7 +406,12 @@ static ssize_t umad2sim_read(struct umad2sim_dev *dev, void *buf, size_t count)
 		mgmt_class = 0;
 	}
 
-	umad->agent_id = dev->agent_idx[mgmt_class];
+	if (mad_get_field(req.mad, 0, IB_MAD_RESPONSE_F)) {
+		uint64_t trid = mad_get_field64(req.mad, 0, IB_MAD_TRID_F);
+		umad->agent_id = (trid >> 32) & 0xffff;
+	} else
+		umad->agent_id = dev->agent_idx[mgmt_class];
+
 	umad->status = ntohl(req.status);
 	umad->timeout_ms = 0;
 	umad->retries = 0;
@@ -476,6 +481,12 @@ static ssize_t umad2sim_write(struct umad2sim_dev *dev,
 
 	req.length = htonll(cnt);
 
+	if (!mad_get_field(req.mad, 0, IB_MAD_RESPONSE_F)) {
+		uint64_t trid = mad_get_field64(req.mad, 0, IB_MAD_TRID_F);
+		trid = (trid&0xffff0000ffffffffULL)|(((uint64_t)umad->agent_id)<<32);
+		mad_set_field64(req.mad, 0, IB_MAD_TRID_F, trid);
+	}
+
 	cnt = write(dev->sim_client.fd_pktout, (void *)&req, sizeof(req));
 	if (cnt < 0) {
 		ERROR("umad2sim_write: cannot write\n");
-- 
1.6.0.3.517.g759a

From vlad at lists.openfabrics.org Sun Nov 2 03:15:39 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Sun, 2 Nov 2008 03:15:39 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081102-0200 daily build status
Message-ID: <20081102111539.796CCE60E18@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:
Build failed on x86_64 with linux-2.6.26
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.26_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.26'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.25
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.25_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.25'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.24
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: 'cpu_data' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:218: error: for each function it appears in.)
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.24_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.24'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-55.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-42.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.27
Log:
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c: In function 'ioremap_wc':
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: error: implicit declaration of function '__ioremap'
/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.c:260: warning: return makes pointer from integer without a cast
make[4]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath/ipath_wc_pat.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_4_kernel-20081102-0200_linux-2.6.27_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.27'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-78.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-67.ELsmp
Log:
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
Hunk #1 succeeded at 529 (offset 135 lines).
Hunk #2 succeeded at 537 (offset 135 lines).
Hunk #3 succeeded at 848 (offset 135 lines).
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
Patch ipath_0110_2.6.9.patch does not apply (enforce with -f)
Failed executing /usr/bin/quilt
----------------------------------------------------------------------------------

From jgarzik at pobox.com Sun Nov 2 05:20:26 2008
From: jgarzik at pobox.com (Jeff Garzik)
Date: Sun, 02 Nov 2008 08:20:26 -0500
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To:
References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com>
Message-ID: <490DA91A.1030703@pobox.com>

Roland Dreier wrote:
> > Roland, OK for me to merge this via net-next (the standard avenue
> > for drivers/net patches during -rc)?
>
> Actually please let me review this and merge it through my tree, since
> it has a bigger impact on the IB side of mlx4.

It seems most appropriate to get an Acked-by from you, and merge
through my tree, IMO. While it clearly has IB impact, most of the
changes are in the lower-level mlx4_en driver.

But if you feel strongly...

	Jeff

From ogerlitz at voltaire.com Sun Nov 2 06:55:58 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 02 Nov 2008 16:55:58 +0200
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To: <490DA91A.1030703@pobox.com>
References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com>
Message-ID: <490DBF7E.70506@voltaire.com>

Jeff Garzik wrote:
> Roland Dreier wrote:
>> Actually please let me review this and merge it through my tree,
>> since it has a bigger impact on the IB side of mlx4.
> It seems most appropriate to get an Acked-by from you, and merge
> through my tree, IMO. While it clearly has IB impact, most of the
> changes are in the lower-level mlx4_en driver.

Hi Jeff,

Given the importance and influence of this patch set on the IB stack,
I believe the correct way to go would be to let Roland manage the
review and integration, as he has all of the views in mind (net, rdma,
and in the future the storage stack would use this driver as well...).

Or.

From monis at Voltaire.COM Sun Nov 2 07:44:37 2008
From: monis at Voltaire.COM (Moni Shoua)
Date: Sun, 02 Nov 2008 17:44:37 +0200
Subject: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides
In-Reply-To: <490B448C.5080306@Voltaire.COM>
References: <490B448C.5080306@Voltaire.COM>
Message-ID: <490DCAE5.6010608@Voltaire.COM>

Yossi Etigin wrote:
> Fix bonding failover in the case both peers have failover and
> gratuitous arp is lost.

The patch was tested and seems to fix the problem.

To reproduce with a simulation of a lost gratuitous ARP:

Host A pings Host B constantly.
Both hosts have a bonding interface (ib0 and ib1 as slaves).

Host B: ifconfig ib0 down
Host B: ifconfig ib1 down
Host A: ifconfig ib0 down
Host A: ifconfig ib1 down
Host B: ifconfig ib0 up
Host B: ifconfig ib1 up
Host A: ifconfig ib0 up
Host A: ifconfig ib1 up

Now, even when all interfaces are up and functioning, the ping gets no
replies.

From kliteyn at dev.mellanox.co.il Sun Nov 2 07:54:47 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 02 Nov 2008 17:54:47 +0200
Subject: [ofa-general] [PATCH] opensm/osm_ucast_cache.c: fixing wrong memset size
Message-ID: <490DCD47.3000303@dev.mellanox.co.il>

Fixing wrong memset size in osm_ucast_cache.c

Signed-off-by: Yevgeny Kliteynik
---
 opensm/opensm/osm_ucast_cache.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c
index cfbc49a..9db8d59 100644
--- a/opensm/opensm/osm_ucast_cache.c
+++ b/opensm/opensm/osm_ucast_cache.c
@@ -118,7 +118,8 @@ static cache_switch_t *__cache_sw_new(uint16_t lid_ho)
 		return NULL;
 	}
 
-	memset(p_cache_sw->ports, 0, sizeof(*p_cache_sw->ports));
+	memset(p_cache_sw->ports, 0,
+	       sizeof(cache_port_t) * (CACHE_SW_PORTS + 1));
 	p_cache_sw->num_ports = CACHE_SW_PORTS + 1;
 
 	/* port[0] fields represent this switch details - lid and type */
-- 
1.5.1.4

From rdreier at cisco.com Sun Nov 2 08:03:46 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 02 Nov 2008 08:03:46 -0800
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To: <490DA91A.1030703@pobox.com> (Jeff Garzik's message of "Sun, 02 Nov 2008 08:20:26 -0500")
References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com>
Message-ID:

 > It seems most appropriate to get an Acked-by from you, and merge
 > through my tree, IMO. While it clearly has IB impact, most of the
 > changes are in the lower-level mlx4_en driver.

Actually most of the changes are in mlx4_core, which is the common HW
driver that both mlx4_en and mlx4_ib use, and which I've been
maintaining up till now:

 >  drivers/infiniband/hw/mlx4/cq.c   |    2 +-
 >  drivers/infiniband/hw/mlx4/main.c |    2 +-
 >  drivers/net/mlx4/cq.c             |   14 ++++++++--
 >  drivers/net/mlx4/en_cq.c          |    9 ++++--
 >  drivers/net/mlx4/en_main.c        |    4 +-
 >  drivers/net/mlx4/eq.c             |   47 ++++++++++++++++++++++++------------
 >  drivers/net/mlx4/main.c           |   14 ++++++----
 >  drivers/net/mlx4/mlx4.h           |    4 +-
 >  include/linux/mlx4/device.h       |    4 ++-

Not that it's a huge change anywhere, but the only mlx4_en changes are
in en_cq.c and en_main.c, i.e. 13 out of 100 changed lines.

In general I think I have a bigger chance of merging more mlx4_core
stuff through my tree, so it will probably be smoother in terms of
conflicts etc. if I carry this patch.

 - R.

From jgarzik at pobox.com Sun Nov 2 08:17:00 2008
From: jgarzik at pobox.com (Jeff Garzik)
Date: Sun, 02 Nov 2008 11:17:00 -0500
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To:
References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com>
Message-ID: <490DD27C.4070109@pobox.com>

Roland Dreier wrote:
> In general I think I have a bigger chance of merging more mlx4_core
> stuff through my tree, so it will probably be smoother in terms of
> conflicts etc. if I carry this patch.

Fine by me...

	Jeff

From sashak at voltaire.com Sun Nov 2 10:16:51 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 2 Nov 2008 20:16:51 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_ucast_cache.c: fixing wrong memset size
In-Reply-To: <490DCD47.3000303@dev.mellanox.co.il>
References: <490DCD47.3000303@dev.mellanox.co.il>
Message-ID: <20081102181651.GP7502@sashak.voltaire.com>

Hi Yevgeny,

On 17:54 Sun 02 Nov , Yevgeny Kliteynik wrote:
> Fixing wrong memset size in osm_ucast_cache.c
>
> Signed-off-by: Yevgeny Kliteynik
> ---
>  opensm/opensm/osm_ucast_cache.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c
> index cfbc49a..9db8d59 100644
> --- a/opensm/opensm/osm_ucast_cache.c
> +++ b/opensm/opensm/osm_ucast_cache.c
> @@ -118,7 +118,8 @@ static cache_switch_t *__cache_sw_new(uint16_t lid_ho)
>  		return NULL;
>  	}
>
> -	memset(p_cache_sw->ports, 0, sizeof(*p_cache_sw->ports));
> +	memset(p_cache_sw->ports, 0,
> +	       sizeof(cache_port_t) * (CACHE_SW_PORTS + 1));
>  	p_cache_sw->num_ports = CACHE_SW_PORTS + 1;
>
>  	/* port[0] fields represent this switch details - lid and type */

Then you will obviously also need to fix similar things (memset() and
memcpy() sizes) in the __cache_add_port() function, where the ports
array is reallocated.

So why not make it simpler, with a single allocation sized by the
switch's *known* number of ports? Like below. If that is fine with you
I will push it out.

Sasha

>From c7e9e41cdea3164a07f9cbf47f68a8836f096524 Mon Sep 17 00:00:00 2001
From: Sasha Khapyorsky
Date: Sun, 2 Nov 2008 20:02:37 +0200
Subject: [PATCH] opensm/osm_ucase_cache: simplify cached links allocation code

Simplify cached links allocation code, fix related memset(), memcpy()
bugs.

Signed-off-by: Sasha Khapyorsky
---
 opensm/opensm/osm_ucast_cache.c |  101 ++++++++++++---------------------------
 1 files changed, 31 insertions(+), 70 deletions(-)

diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c
index cfbc49a..b142a14 100644
--- a/opensm/opensm/osm_ucast_cache.c
+++ b/opensm/opensm/osm_ucast_cache.c
@@ -70,11 +70,11 @@ typedef struct cache_switch {
 	cl_map_item_t map_item;
 	boolean_t dropped;
 	uint16_t max_lid_ho;
-	uint8_t num_ports;
-	cache_port_t *ports;
 	uint16_t num_hops;
 	uint8_t **hops;
 	uint8_t *lft;
+	uint8_t num_ports;
+	cache_port_t ports[0];
 } cache_switch_t;
 
 /**********************************************************************
@@ -104,22 +104,17 @@ static void __cache_sw_set_leaf(cache_switch_t * p_sw)
 /**********************************************************************
  **********************************************************************/
 
-static cache_switch_t *__cache_sw_new(uint16_t lid_ho)
+static cache_switch_t *__cache_sw_new(uint16_t lid_ho, unsigned num_ports)
 {
-	cache_switch_t *p_cache_sw = malloc(sizeof(cache_switch_t));
+	cache_switch_t *p_cache_sw = malloc(sizeof(cache_switch_t) +
+					    num_ports * sizeof(cache_port_t));
 	if (!p_cache_sw)
 		return NULL;
 
-	memset(p_cache_sw, 0, sizeof(*p_cache_sw));
+	memset(p_cache_sw, 0,
+	       sizeof(*p_cache_sw) + num_ports * sizeof(cache_port_t));
 
-	p_cache_sw->ports = malloc(sizeof(cache_port_t) * (CACHE_SW_PORTS + 1));
-	if (!p_cache_sw->ports) {
-		free(p_cache_sw);
-		return NULL;
-	}
-
-	memset(p_cache_sw->ports, 0, sizeof(*p_cache_sw->ports));
-	p_cache_sw->num_ports = CACHE_SW_PORTS + 1;
+	p_cache_sw->num_ports = num_ports;
 
 	/* port[0] fields represent this switch details - lid and type */
 	p_cache_sw->ports[0].remote_lid_ho = lid_ho;
@@ -161,79 +156,48 @@ static cache_switch_t *__cache_get_sw(osm_ucast_mgr_t * p_mgr, uint16_t lid_ho)
 
 /**********************************************************************
  **********************************************************************/
-
-static cache_switch_t *__cache_get_or_add_sw(osm_ucast_mgr_t * p_mgr,
-					     uint16_t lid_ho)
-{
-	cache_switch_t *p_cache_sw = __cache_get_sw(p_mgr, lid_ho);
-	if (!p_cache_sw) {
-		p_cache_sw = __cache_sw_new(lid_ho);
-		if (p_cache_sw)
-			cl_qmap_insert(&p_mgr->cache_sw_tbl, lid_ho,
-				       &p_cache_sw->map_item);
-	}
-	return p_cache_sw;
-}
-
-/**********************************************************************
- **********************************************************************/
-
-static void __cache_add_port(osm_ucast_mgr_t * p_mgr, uint16_t lid_ho,
-			     uint8_t port_num, uint16_t remote_lid_ho,
-			     boolean_t is_ca)
+static void __cache_add_sw_link(osm_ucast_mgr_t * p_mgr, osm_physp_t *p,
+				uint16_t remote_lid_ho, boolean_t is_ca)
 {
 	cache_switch_t *p_cache_sw;
+	uint16_t lid_ho = cl_ntoh16(osm_node_get_base_lid(p->p_node, 0));
 
 	OSM_LOG_ENTER(p_mgr->p_log);
 
-	if (!lid_ho || !remote_lid_ho || !port_num)
+	if (!lid_ho || !remote_lid_ho || !p->port_num)
 		goto Exit;
 
 	OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
 		"Caching switch port: lid %u [port %u] -> lid %u (%s)\n",
-		lid_ho, port_num, remote_lid_ho, (is_ca) ? "CA/RTR" : "SW");
+		lid_ho, p->port_num, remote_lid_ho, (is_ca) ? "CA/RTR" : "SW");
 
-	p_cache_sw = __cache_get_or_add_sw(p_mgr, lid_ho);
+	p_cache_sw = __cache_get_sw(p_mgr, lid_ho);
 	if (!p_cache_sw) {
-		OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
-			"ERR AD01: Out of memory - cache is invalid\n");
-		osm_ucast_cache_invalidate(p_mgr);
-		goto Exit;
-	}
-
-	if (port_num >= p_cache_sw->num_ports) {
-		/* calculate new size of ports array, rounded
-		   up to a multiple of CACHE_SW_PORTS */
-		uint8_t new_size = CACHE_SW_PORTS *
-		    ((port_num + CACHE_SW_PORTS) / CACHE_SW_PORTS);
-		cache_port_t *ports =
-		    malloc(sizeof(cache_port_t) * (new_size + 1));
-		if (!ports) {
+		p_cache_sw = __cache_sw_new(lid_ho, p->p_node->sw->num_ports);
+		if (!p_cache_sw) {
 			OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
-				"ERR AD02: Out of memory - cache is invalid\n");
+				"ERR AD01: Out of memory - cache is invalid\n");
 			osm_ucast_cache_invalidate(p_mgr);
 			goto Exit;
 		}
+		cl_qmap_insert(&p_mgr->cache_sw_tbl, lid_ho,
+			       &p_cache_sw->map_item);
+	}
 
-		memset(ports, 0, sizeof(*ports));
-
-		if (p_cache_sw->ports) {
-			memcpy(ports, p_cache_sw->ports,
-			       sizeof(*p_cache_sw->ports));
-			free(p_cache_sw->ports);
-		}
-
-		p_cache_sw->ports = ports;
-		p_cache_sw->num_ports = new_size + 1;
+	if (p->port_num >= p_cache_sw->num_ports) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
+			"ERR AD02: Wrong switch? 
- cache is invalid\n"); + osm_ucast_cache_invalidate(p_mgr); + goto Exit; } if (is_ca) __cache_sw_set_leaf(p_cache_sw); - if (p_cache_sw->ports[port_num].remote_lid_ho == 0) { + if (p_cache_sw->ports[p->port_num].remote_lid_ho == 0) { /* cache this link only if it hasn't been already cached */ - p_cache_sw->ports[port_num].remote_lid_ho = remote_lid_ho; - p_cache_sw->ports[port_num].is_leaf = is_ca; + p_cache_sw->ports[p->port_num].remote_lid_ho = remote_lid_ho; + p_cache_sw->ports[p->port_num].is_leaf = is_ca; } Exit: OSM_LOG_EXIT(p_mgr->p_log); @@ -962,16 +926,13 @@ void osm_ucast_cache_add_link(osm_ucast_mgr_t * p_mgr, lid_ho_2 = cl_ntoh16(osm_node_get_base_lid(p_node_2, 0)); /* lost switch-2-switch link - cache both sides */ - __cache_add_port(p_mgr, lid_ho_1, p_physp1->port_num, - lid_ho_2, FALSE); - __cache_add_port(p_mgr, lid_ho_2, p_physp2->port_num, - lid_ho_1, FALSE); + __cache_add_sw_link(p_mgr, p_physp1, lid_ho_2, FALSE); + __cache_add_sw_link(p_mgr, p_physp2, lid_ho_1, FALSE); } else { lid_ho_2 = cl_ntoh16(osm_physp_get_base_lid(p_physp2)); /* lost link to CA/RTR - cache only switch side */ - __cache_add_port(p_mgr, lid_ho_1, p_physp1->port_num, - lid_ho_2, TRUE); + __cache_add_sw_link(p_mgr, p_physp1, lid_ho_2, TRUE); } Exit: -- 1.6.0.3.517.g759a From kliteyn at dev.mellanox.co.il Sun Nov 2 12:59:12 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 02 Nov 2008 22:59:12 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_ucast_cache.c: fixing wrong memset size In-Reply-To: <20081102181651.GP7502@sashak.voltaire.com> References: <490DCD47.3000303@dev.mellanox.co.il> <20081102181651.GP7502@sashak.voltaire.com> Message-ID: <490E14A0.8080105@dev.mellanox.co.il> Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 17:54 Sun 02 Nov , Yevgeny Kliteynik wrote: >> Fixing wrong memset size in osm_ucast_cache.c >> >> Signed-off-by: Yevgeny Kliteynik >> --- >> opensm/opensm/osm_ucast_cache.c | 3 ++- >> 1 files changed, 2 insertions(+), 1 deletions(-) 
>> >> diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c >> index cfbc49a..9db8d59 100644 >> --- a/opensm/opensm/osm_ucast_cache.c >> +++ b/opensm/opensm/osm_ucast_cache.c >> @@ -118,7 +118,8 @@ static cache_switch_t *__cache_sw_new(uint16_t lid_ho) >> return NULL; >> } >> >> - memset(p_cache_sw->ports, 0, sizeof(*p_cache_sw->ports)); >> + memset(p_cache_sw->ports, 0, >> + sizeof(cache_port_t) * (CACHE_SW_PORTS + 1)); >> p_cache_sw->num_ports = CACHE_SW_PORTS + 1; >> >> /* port[0] fields represent this switch details - lid and type */ > > Then you obviously will need also to fix similar things (memset() and > memcpy() sizes) in __cache_add_port() function where ports array is > reallocated. > > So why to not make it simpler, just in single alloc following *known* > switch's port numbers? Like below. > > If it is fine for you I will push it out. Sure, this one is better. Please apply. -- Yevgeny > Sasha > > >>From c7e9e41cdea3164a07f9cbf47f68a8836f096524 Mon Sep 17 00:00:00 2001 > From: Sasha Khapyorsky > Date: Sun, 2 Nov 2008 20:02:37 +0200 > Subject: [PATCH] opensm/osm_ucase_cache: simplify cached links allocation code > > Simplify cached links allocation code, fix related memset(), memcpy() > bugs. 
> > Signed-off-by: Sasha Khapyorsky > --- > opensm/opensm/osm_ucast_cache.c | 101 ++++++++++++--------------------------- > 1 files changed, 31 insertions(+), 70 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c > index cfbc49a..b142a14 100644 > --- a/opensm/opensm/osm_ucast_cache.c > +++ b/opensm/opensm/osm_ucast_cache.c > @@ -70,11 +70,11 @@ typedef struct cache_switch { > cl_map_item_t map_item; > boolean_t dropped; > uint16_t max_lid_ho; > - uint8_t num_ports; > - cache_port_t *ports; > uint16_t num_hops; > uint8_t **hops; > uint8_t *lft; > + uint8_t num_ports; > + cache_port_t ports[0]; > } cache_switch_t; > > /********************************************************************** > @@ -104,22 +104,17 @@ static void __cache_sw_set_leaf(cache_switch_t * p_sw) > /********************************************************************** > **********************************************************************/ > > -static cache_switch_t *__cache_sw_new(uint16_t lid_ho) > +static cache_switch_t *__cache_sw_new(uint16_t lid_ho, unsigned num_ports) > { > - cache_switch_t *p_cache_sw = malloc(sizeof(cache_switch_t)); > + cache_switch_t *p_cache_sw = malloc(sizeof(cache_switch_t) + > + num_ports * sizeof(cache_port_t)); > if (!p_cache_sw) > return NULL; > > - memset(p_cache_sw, 0, sizeof(*p_cache_sw)); > + memset(p_cache_sw, 0, > + sizeof(*p_cache_sw) + num_ports * sizeof(cache_port_t)); > > - p_cache_sw->ports = malloc(sizeof(cache_port_t) * (CACHE_SW_PORTS + 1)); > - if (!p_cache_sw->ports) { > - free(p_cache_sw); > - return NULL; > - } > - > - memset(p_cache_sw->ports, 0, sizeof(*p_cache_sw->ports)); > - p_cache_sw->num_ports = CACHE_SW_PORTS + 1; > + p_cache_sw->num_ports = num_ports; > > /* port[0] fields represent this switch details - lid and type */ > p_cache_sw->ports[0].remote_lid_ho = lid_ho; > @@ -161,79 +156,48 @@ static cache_switch_t *__cache_get_sw(osm_ucast_mgr_t * p_mgr, uint16_t lid_ho) > > 
/********************************************************************** > **********************************************************************/ > - > -static cache_switch_t *__cache_get_or_add_sw(osm_ucast_mgr_t * p_mgr, > - uint16_t lid_ho) > -{ > - cache_switch_t *p_cache_sw = __cache_get_sw(p_mgr, lid_ho); > - if (!p_cache_sw) { > - p_cache_sw = __cache_sw_new(lid_ho); > - if (p_cache_sw) > - cl_qmap_insert(&p_mgr->cache_sw_tbl, lid_ho, > - &p_cache_sw->map_item); > - } > - return p_cache_sw; > -} > - > -/********************************************************************** > - **********************************************************************/ > - > -static void __cache_add_port(osm_ucast_mgr_t * p_mgr, uint16_t lid_ho, > - uint8_t port_num, uint16_t remote_lid_ho, > - boolean_t is_ca) > +static void __cache_add_sw_link(osm_ucast_mgr_t * p_mgr, osm_physp_t *p, > + uint16_t remote_lid_ho, boolean_t is_ca) > { > cache_switch_t *p_cache_sw; > + uint16_t lid_ho = cl_ntoh16(osm_node_get_base_lid(p->p_node, 0)); > > OSM_LOG_ENTER(p_mgr->p_log); > > - if (!lid_ho || !remote_lid_ho || !port_num) > + if (!lid_ho || !remote_lid_ho || !p->port_num) > goto Exit; > > OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > "Caching switch port: lid %u [port %u] -> lid %u (%s)\n", > - lid_ho, port_num, remote_lid_ho, (is_ca) ? "CA/RTR" : "SW"); > + lid_ho, p->port_num, remote_lid_ho, (is_ca) ? 
"CA/RTR" : "SW"); > > - p_cache_sw = __cache_get_or_add_sw(p_mgr, lid_ho); > + p_cache_sw = __cache_get_sw(p_mgr, lid_ho); > if (!p_cache_sw) { > - OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, > - "ERR AD01: Out of memory - cache is invalid\n"); > - osm_ucast_cache_invalidate(p_mgr); > - goto Exit; > - } > - > - if (port_num >= p_cache_sw->num_ports) { > - /* calculate new size of ports array, rounded > - up to a multiple of CACHE_SW_PORTS */ > - uint8_t new_size = CACHE_SW_PORTS * > - ((port_num + CACHE_SW_PORTS) / CACHE_SW_PORTS); > - cache_port_t *ports = > - malloc(sizeof(cache_port_t) * (new_size + 1)); > - if (!ports) { > + p_cache_sw = __cache_sw_new(lid_ho, p->p_node->sw->num_ports); > + if (!p_cache_sw) { > OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, > - "ERR AD02: Out of memory - cache is invalid\n"); > + "ERR AD01: Out of memory - cache is invalid\n"); > osm_ucast_cache_invalidate(p_mgr); > goto Exit; > } > + cl_qmap_insert(&p_mgr->cache_sw_tbl, lid_ho, > + &p_cache_sw->map_item); > + } > > - memset(ports, 0, sizeof(*ports)); > - > - if (p_cache_sw->ports) { > - memcpy(ports, p_cache_sw->ports, > - sizeof(*p_cache_sw->ports)); > - free(p_cache_sw->ports); > - } > - > - p_cache_sw->ports = ports; > - p_cache_sw->num_ports = new_size + 1; > + if (p->port_num >= p_cache_sw->num_ports) { > + OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, > + "ERR AD02: Wrong switch? 
- cache is invalid\n"); > + osm_ucast_cache_invalidate(p_mgr); > + goto Exit; > } > > if (is_ca) > __cache_sw_set_leaf(p_cache_sw); > > - if (p_cache_sw->ports[port_num].remote_lid_ho == 0) { > + if (p_cache_sw->ports[p->port_num].remote_lid_ho == 0) { > /* cache this link only if it hasn't been already cached */ > - p_cache_sw->ports[port_num].remote_lid_ho = remote_lid_ho; > - p_cache_sw->ports[port_num].is_leaf = is_ca; > + p_cache_sw->ports[p->port_num].remote_lid_ho = remote_lid_ho; > + p_cache_sw->ports[p->port_num].is_leaf = is_ca; > } > Exit: > OSM_LOG_EXIT(p_mgr->p_log); > @@ -962,16 +926,13 @@ void osm_ucast_cache_add_link(osm_ucast_mgr_t * p_mgr, > lid_ho_2 = cl_ntoh16(osm_node_get_base_lid(p_node_2, 0)); > > /* lost switch-2-switch link - cache both sides */ > - __cache_add_port(p_mgr, lid_ho_1, p_physp1->port_num, > - lid_ho_2, FALSE); > - __cache_add_port(p_mgr, lid_ho_2, p_physp2->port_num, > - lid_ho_1, FALSE); > + __cache_add_sw_link(p_mgr, p_physp1, lid_ho_2, FALSE); > + __cache_add_sw_link(p_mgr, p_physp2, lid_ho_1, FALSE); > } else { > lid_ho_2 = cl_ntoh16(osm_physp_get_base_lid(p_physp2)); > > /* lost link to CA/RTR - cache only switch side */ > - __cache_add_port(p_mgr, lid_ho_1, p_physp1->port_num, > - lid_ho_2, TRUE); > + __cache_add_sw_link(p_mgr, p_physp1, lid_ho_2, TRUE); > } > > Exit: From rdreier at cisco.com Sun Nov 2 21:34:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 02 Nov 2008 21:34:24 -0800 Subject: [ofa-general] Re: [PATCH 07/10] rdma/nes: reindent mis-indented spinlocks In-Reply-To: ("Ilpo =?utf-8?Q?J=C3=A4rvinen=22's?= message of "Thu, 30 Oct 2008 13:39:43 +0200 (EET)") References: Message-ID: thanks, applied From rdreier at cisco.com Sun Nov 2 21:41:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 02 Nov 2008 21:41:17 -0800 Subject: [ofa-general] Re: [PATCH v3] RDMA/nes: Mitigate compatibility issue regarding PCI write credits In-Reply-To: <20081031183943.GA7376@ctung-MOBL> (Chien Tung's message of 
"Fri, 31 Oct 2008 13:39:43 -0500") References: <20081031183943.GA7376@ctung-MOBL> Message-ID: thanks, applied all three. From rdreier at cisco.com Sun Nov 2 21:47:49 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 02 Nov 2008 21:47:49 -0800 Subject: [ofa-general] Re: [PATCH] ipoib: fix hang in ipoib_flush_paths In-Reply-To: <490B0040.3040802@Voltaire.COM> (Yossi Etigin's message of "Fri, 31 Oct 2008 14:55:28 +0200") References: <490B0040.3040802@Voltaire.COM> Message-ID: thanks, applied. From jackm at dev.mellanox.co.il Mon Nov 3 01:39:27 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 3 Nov 2008 11:39:27 +0200 Subject: [ofa-general] poll CQ failed -2 with connectX In-Reply-To: <200810271838.48510.ricklist@microway.com> References: <200810271838.48510.ricklist@microway.com> Message-ID: <200811031139.28122.jackm@dev.mellanox.co.il> Rick, Your problem was that you had a SUSE-packaged ofed-driver set (named ofed-kmp-default) installed on all your machines (maybe automatically part of the OpenSuse install?): For example, on one of your hosts, I ran #> rpm -qi ofed-kmp-default Name : ofed-kmp-default Relocations: (not relocatable) Version : 1.2.5_2.6.22.18_0.2 Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany Release : 18.1 Build Date: Mon Jun 9 12:42:40 2008 Install Date: Wed Jul 30 18:26:56 2008 Build Host: kalman.suse.de Group : System/Base Source RPM: ofed-1.2.5-18.1.src.rpm Size : 3359904 License: GPL v2 or later Signature : DSA/SHA1, Mon Jun 9 12:47:02 2008, Key ID a84edae89c800aca Packager : http://bugs.opensuse.org URL : http://www.openfabrics.org Summary : Infiniband Kernel Modules The SUSE-rpm driver set is based on OFED 1.2.5. This RPM installs the OFED drivers under directory /lib/modules/ Hi all, > > I am configuring an opteron cluster with connectX Infiniband. 
I have a > problem that if I run one of the NAS tests, it works the first, and maybe 2nd > time, but after that the jobs instantly fail with messages like this- > > [Rank 44][cm.c: line 860]poll CQ failed -2 > [Rank 51][cm.c: line 860]poll CQ failed -2 > [Rank 119][cm.c: line 860]poll CQ failed -2 > [Rank 85][cm.c: line 860]poll CQ failed -2 > [Rank 0][cm.c: line 860]poll CQ failed -2 > [Rank 9][cm.c: line 860]poll CQ failed -2 > [Rank 26][cm.c: line 860]poll CQ failed -2[Rank 43][cm.c: line 860] > poll CQ failed -2 > [Rank 94][cm.c: line 860]poll CQ failed -2 > [Rank 111][cm.c: line 860]poll CQ failed -2 > > I can easily reproduce this with only 2 systems using a 16 process LU job, > class B. > > Here are the configs I've tried- > Suse 11 with distro provided IB driver and libraries,etc, using mvapich as > provided by ohio state > Suse 11 with distro driver, using OFED 1.3.1 libraries and mvapich > Suse 10.3 with OFED 1.3.1, OFED 1.2.5.4, and OFED 1.4rc3 > > They all have the same basic problem. I think one of them reported "Error > polling CQ" instead of "poll CQ failed". > > If I replace the connectX cards with regular DDR cards the problem goes away. > > I'm getting quite stumped at this point and would appreciate any suggestions > or patches. > > Thanks, > Rick From dimitar.dimitrov at markit.com Mon Nov 3 02:54:40 2008 From: dimitar.dimitrov at markit.com (Dimitar Dimitrov) Date: Mon, 03 Nov 2008 11:54:40 +0100 Subject: [ofa-general] Mellanox OFED package vs. RedHat IB packages Message-ID: <1225709680.15893.9.camel@wks-ubuntu.ops.marketxs.com> Hello, As a beginner with InfiniBand technology I would like to ask the following: Would there be any advantage in using the Mellanox provided OFED packages over the already supplied ones from the RedHat repository? We are using "Mellanox Technologies MT25208 InfiniHost III Ex" adapters. I noticed a slight difference in the name of some of the applications provided, but would expect the functionality to be the same. 
So which one "is better"? -- Regards, Dimitar The content of this e-mail is confidential and may be privileged. It may be read, copied and used only by the intended recipient and may not be disclosed, copied or distributed. If you received this email in error, please contact the sender immediately by return e-mail or by telephoning +44 20 7260 2000, delete it and do not disclose its contents to any person. You should take full responsibility for checking this email for viruses. Markit reserves the right to monitor all e-mail communications through its network. Markit and its affiliated companies make no warranty as to the accuracy or completeness of any information contained in this message and hereby exclude any liability of any kind for the information contained herein. Any opinions expressed in this message are those of the author and do not necessarily reflect the opinions of Markit. For full details about Markit, its offerings and legal terms and conditions, please see Markit’s website at http://www.markit.com . 
From vlad at lists.openfabrics.org Mon Nov 3 03:22:40 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 3 Nov 2008 03:22:40 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081103-0200 daily build status Message-ID: <20081103112240.D0743E60E77@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with 
linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From constantine.gavrilov at gmail.com Mon Nov 3 03:09:17 2008 From: constantine.gavrilov at gmail.com (Constantine Gavrilov) Date: Mon, 03 Nov 2008 13:09:17 +0200 Subject: [ofa-general] patch: support long (above 14 bytes) HW addresses in arp_ioctl Message-ID: <490EDBDD.1030104@gmail.com> While working with OFED infiniband stack that uses 20 byte long HW addresses for IP over IB, I have paid attention to the following arp_ioctl problem. The ioctl uses a data structure that limits a length of HW address to 14 bytes. The IP stack and the arp cache code do not have that limitation. This leads to the following problems: * arp_ioctl cannot be used to set, get, or delete arp entries for those adapters that have HW addresses longer than 14 bytes * arp_ioctl will corrupt the kernel and user memory when this ioctl is used on the adapters that have HW addresses longer than 14 bytes. This is because when copying the HW address, the arp_ioctl code copies dev->addr_len bytes without checking that addr_len is not above 14 bytes. This is done both for copy_to_user() and memcpy() calls on kernel data structures allocated on stack. The memcpy() call in particular, will corrupt kernel stack. Attached please find the patch that fixes both problems. In addition, the patch changes the maximal number of bytes for HW address that will be seen in /proc/net/arp from ~10 to ~30. Without the last change, output of /proc/net/arp truncates the large MAC entries, which makes the arp utility useless. The patch does not change the existing ABI but extends it.
The kernel structure used in arp_ioctl calls is changed to support larger addresses, while the user-space structure is extended by appending extra-space to the end of the structure if ATF_NEWARPCTL -- a new flag -- is set in arp_flags of existing user-space structure. This allows avoiding big changes to the existing code while preserving the ABI compatibility. -- ---------------------------------------- Constantine Gavrilov Kernel Developer Platform Group XIV, an IBM global brand 1 Azrieli Center, Tel-Aviv Phone: +972-3-6074672 Fax: +972-3-6959749 ---------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: arp_ioctl.patch Type: text/x-patch Size: 5244 bytes Desc: not available URL: From vlad at dev.mellanox.co.il Mon Nov 3 06:56:21 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 03 Nov 2008 16:56:21 +0200 Subject: [ofa-general] [ewg] OFED meeting agenda for today (Nov 3) Message-ID: <490F1115.9040705@dev.mellanox.co.il> Agenda for OFED meeting today on OFED 1.4 status: 1. OFED 1.4 status: - Updated MPI packages: mvapich-1.1.0-3103.src.rpm, mvapich2-trunk-3103.src.rpm - Close RC4 date (originally planned to Nov 4) 2. 
Bugs review:

Id    Sev       Pri  OS       Assignee                      Status    Summary
1221  major     P2   SLES 10  Jeffrey.C.Becker at nasa.gov  NEW       SLES10 sp2: remote logins via ssh fail due to rpcbind and automounter failures
1298  major     P3   RHEL 5   Jeffrey.C.Becker at nasa.gov  NEW       nfsrdma rh5.1 causes kernel panic
1299  major     P3   RHEL 5   Jeffrey.C.Becker at nasa.gov  NEW       nfs module is missing symbols in rh5.1
1283  blocker   P1   RHEL 5   jeremy.brown at qlogic.com    NEW       Intel MPI fails on Qlogc HCA
1326  blocker   P1   RHEL 4   jeremy.brown at qlogic.com    NEW       ipath driver fails to build on IA64 in the 10/28/08 daily build
1335  major     P3   Other    monis at voltaire.com         NEW       Bonding: packet lost during failover
1301  major     P3   RHEL 4   olgas at voltaire.com         NEW       Can not load rds module on RH4 up7
1323  blocker   P1   All      stefan.roscher at de.ibm.com  REOPENED  IB/ehca: possibillity of kernel panic under certain circumstances
1242  critical  P2   RHEL 4   yannick.cote at qlogic.com    NEW       kernel panic while running mpi2007 against ofed1.4 -- ib_ipath: ipath_sdma_verbs_send
1336  critical  P1   RHEL 5   bugzilla at openib.org        NEW       Can't to unloading the mlx4_ib module on ppc64

Regards, Vladimir

From kliteyn at dev.mellanox.co.il Mon Nov 3 07:05:56 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 03 Nov 2008 17:05:56 +0200 Subject: [ofa-general] [PATCH] opensm/osm_ucast_cache: fixing coredump Message-ID: <490F1354.7060305@dev.mellanox.co.il> Following the recent changes in ports allocation - fixing core dump.
Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_cache.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c index b142a14..13dee11 100644 --- a/opensm/opensm/osm_ucast_cache.c +++ b/opensm/opensm/osm_ucast_cache.c @@ -135,8 +135,6 @@ static void __cache_sw_destroy(cache_switch_t * p_sw) free(p_sw->lft); if (p_sw->hops) free(p_sw->hops); - if (p_sw->ports) - free(p_sw->ports); free(p_sw); } -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Nov 3 07:07:27 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 03 Nov 2008 17:07:27 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa.c: adding missing include Message-ID: <490F13AF.3040303@dev.mellanox.co.il> Hi Sasha, Adding missing include to fix compilation warning. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_sa.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c index 6c02d5d..185557f 100644 --- a/opensm/opensm/osm_sa.c +++ b/opensm/opensm/osm_sa.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include #include -- 1.5.1.4 From sashak at voltaire.com Mon Nov 3 07:24:28 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 3 Nov 2008 17:24:28 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_ucast_cache: fixing coredump In-Reply-To: <490F1354.7060305@dev.mellanox.co.il> References: <490F1354.7060305@dev.mellanox.co.il> Message-ID: <20081103152428.GG31856@sashak.voltaire.com> On 17:05 Mon 03 Nov , Yevgeny Kliteynik wrote: > Following the recent changes in ports allocation - fixing core dump. 
> > Signed-off-by: Yevgeny Kliteynik > --- > opensm/opensm/osm_ucast_cache.c | 2 -- > 1 files changed, 0 insertions(+), 2 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c > index b142a14..13dee11 100644 > --- a/opensm/opensm/osm_ucast_cache.c > +++ b/opensm/opensm/osm_ucast_cache.c > @@ -135,8 +135,6 @@ static void __cache_sw_destroy(cache_switch_t * p_sw) > free(p_sw->lft); > if (p_sw->hops) > free(p_sw->hops); > - if (p_sw->ports) > - free(p_sw->ports); Sure. Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 3 07:24:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 3 Nov 2008 17:24:51 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_sa.c: adding missing include In-Reply-To: <490F13AF.3040303@dev.mellanox.co.il> References: <490F13AF.3040303@dev.mellanox.co.il> Message-ID: <20081103152451.GH31856@sashak.voltaire.com> On 17:07 Mon 03 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Adding missing include to fix compilation warning. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. 
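For readers following the osm_ucast_cache thread, here is a minimal standalone sketch of the two pitfalls it discusses: `sizeof(*ptr)` covers a single element rather than the whole array, and a trailing array allocated together with its struct must not be freed separately. The names `port_t`, `sw_t`, `sw_new`, `sw_destroy`, and `one_element_size` are hypothetical stand-ins, not the OpenSM types, and the sketch assumes the C99 `ports[]` spelling rather than the `ports[0]` used in the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for cache_port_t / cache_switch_t. */
typedef struct port {
	uint16_t remote_lid_ho;
	int is_leaf;
} port_t;

typedef struct sw {
	uint8_t num_ports;
	port_t ports[];		/* C99 flexible array member, must be last */
} sw_t;

/* sizeof on a dereferenced pointer yields one element, not the array. */
static size_t one_element_size(const port_t *ports)
{
	return sizeof(*ports);
}

/* One allocation holds the header plus num_ports trailing elements. */
static sw_t *sw_new(uint16_t lid_ho, unsigned num_ports)
{
	sw_t *p;

	if (num_ports == 0 || num_ports > UINT8_MAX)
		return NULL;
	/* calloc zeroes the whole block, so there is no memset size to get wrong */
	p = calloc(1, sizeof(*p) + num_ports * sizeof(p->ports[0]));
	if (!p)
		return NULL;
	p->num_ports = (uint8_t)num_ports;
	p->ports[0].remote_lid_ho = lid_ho;	/* port 0 describes the switch itself */
	return p;
}

/* ports[] lives inside the same block: a single free() releases everything. */
static void sw_destroy(sw_t *p)
{
	free(p);
}
```

With this layout, calling free() on `p->ports` would pass an interior pointer to the allocator, which is consistent with the core dump reported after the allocation change.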
Sasha From dimitar.dimitrov at markit.com Mon Nov 3 08:12:54 2008 From: dimitar.dimitrov at markit.com (Dimitar Dimitrov) Date: Mon, 03 Nov 2008 17:12:54 +0100 Subject: [ofa-general] Error during compile of ofed-1.3.1 Message-ID: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com> Installing OFED 1.3.1 on RHEL AS 4 (update 7), kernel 2.6.9-78.0.1.ELsmp it ends up with the following error: make[1]: Entering directory `/usr/src/kernels/2.6.9-78.0.1.EL-smp-x86_64' mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/.tmp_versions make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1 make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core gcc -Wp,-MD,/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/.addr.o.d -nostdinc -iwithprefix include -D__KERNEL__ -include include/linux/autoconf.h -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/include/linux/autoconf.h -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/include -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/debug -I/usr/local/include/scst -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/ulp/srpt -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/net/cxgb3 -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -g -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -DMODULE -DKBUILD_BASENAME=addr -DKBUILD_MODNAME=ib_addr -c -o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/.tmp_addr.o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.c In file included from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.c:32: include/linux/inetdevice.h:50: field `mr_gq_timer' has incomplete type include/linux/inetdevice.h:51: field 
`mr_ifc_timer' has incomplete type include/linux/inetdevice.h:56: confused by earlier errors, bailing out make[4]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core] Error 2 make[2]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1] Error 2 make[1]: Leaving directory `/usr/src/kernels/2.6.9-78.0.1.EL-smp-x86_64' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.56254 (%build) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.56254 (%build) Is there something that can be done about it? -- Regards, Dimitar -------------- next part -------------- An HTML attachment was scrubbed...
URL: From constantine.gavrilov at gmail.com Mon Nov 3 08:34:36 2008 From: constantine.gavrilov at gmail.com (Constantine Gavrilov) Date: Mon, 03 Nov 2008 18:34:36 +0200 Subject: [ofa-general] Re: patch: support long (above 14 bytes) HW addresses in arp_ioctl In-Reply-To: <490EDBDD.1030104@gmail.com> References: <490EDBDD.1030104@gmail.com> Message-ID: <490F281C.60800@gmail.com> Updated version of the patch uses MAX_ADDR_LEN from netdevice.h as the maximal length of MAC address. Constantine Gavrilov wrote: > While working with OFED infiniband stack that uses 20 byte long HW > addresses for IP over IB, I have paid attention to the following > arp_ioctl problem. > > The ioctl uses a data structure that limits a length of HW address to > 14 bytes. The IP stack and the arp cache code do not have that > limitation. This leads to the following problems: > > * arp_ioctl cannot be used to set, get, or delete arp entries for > those adapters that have HW addresses longer than 14 bytes > * arp_ioctl will corrupt the kernel and user memory when this ioctl is > used on the adapters that have HW addresses longer than 14 bytes. > This is because when copying the HW address, the arp_ioctl code copies > dev->addr_len bytes without checking that addr_len is not above 14 > bytes. This is done both for copy_to_user() and memcpy() calls on > kernel data structures allocated on stack. The memcpy() call in > particular, will corrupt kernel stack. > > Attached please find the patch that fixes both problems. In addition, > the patch changes the maximal number of bytes for HW address that will > be seen in /proc/net/arp from ~10 to ~30. Without the last change, > output of /proc/net/arp truncates the large MAC entries, which > makes the arp utility useless. > > The patch does not change the existing ABI but extends it.
The kernel > structure used in arp_ioctl calls is changed to support larger > addresses, while the user-space structure is extended by appending > extra-space to the end of the structure if ATF_NEWARPCTL -- a new > flag -- is set in arp_flags of existing user-space structure. This > allows avoiding big changes to the existing code while preserving the > ABI compatibility. > -- ---------------------------------------- Constantine Gavrilov Kernel Developer Platform Group XIV, an IBM global brand 1 Azrieli Center, Tel-Aviv Phone: +972-3-6074672 Fax: +972-3-6959749 ---------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: arp_ioctl.patch Type: text/x-patch Size: 5246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5355 bytes Desc: S/MIME Cryptographic Signature URL: From constantine.gavrilov at gmail.com Mon Nov 3 08:53:06 2008 From: constantine.gavrilov at gmail.com (Constantine Gavrilov) Date: Mon, 03 Nov 2008 18:53:06 +0200 Subject: [ofa-general] Re: patch: support long (above 14 bytes) HW addresses in arp_ioctl In-Reply-To: <490F281C.60800@gmail.com> References: <490EDBDD.1030104@gmail.com> <490F281C.60800@gmail.com> Message-ID: <490F2C72.3000008@gmail.com> Try to resend as with a "web-friendly version". Hopefully, this can be read in the mail archive. Constantine Gavrilov wrote: > Updated version of the patch uses MAX_ADDR_LEN from netdevice.h as the > maximal length of MAC address. > > Constantine Gavrilov wrote: >> While working with OFED infiniband stack that uses 20 byte long HW >> addresses for IP over IB, I have paid attention to the following >> arp_ioctl problem. >> >> The ioctl uses a data structure that limits a length of HW address to >> 14 bytes. The IP stack and the arp cache code do not have that >> limitation. 
This leads to the following problems: >> >> * arp_ioctl cannot be used to set, get, or delete arp entries for >> those adapters that have HW addresses longer than 14 bytes >> * arp_ioctl will corrupt the kernel and user memory when this ioctl >> is used on the adapters that have HW addresses longer that 14 bytes. >> This is because when copying the HW address, the arp_ioctl code >> copies dev->addr_len bytes without checking that addr_len is not >> above 14 bytes. This is done both for copy_to_user() and memcpy() >> calls on kernel data structures allocated on stack. The memcpy() call >> in particular, will corrupt kernel stack. >> >> Attached please find the patch that fixes both problems. In addition, >> the patch changes the maximal number of bytes for HW address that >> will be seen in /proc/net/arp from ~10 to ~30. Without the last >> change, output of /proc/net/arp truncates the the large MAC entries, >> which makes the arp utility useless. >> >> The patch does not change the existing ABI but extends it. The >> kernel structure used in arp_ioctl calls is changed to support larger >> addresses, while the user-space structure is extended by appending >> extra-space to the end of the structure if ATF_NEWARPCTL -- a new >> flag -- is set in arp_flags of existing user-space structure. This >> allows avoiding big changes to the existing code while preserving the >> ABI compatibility. >> > -- ---------------------------------------- Constantine Gavrilov Kernel Developer Platform Group XIV, an IBM global brand 1 Azrieli Center, Tel-Aviv Phone: +972-3-6074672 Fax: +972-3-6959749 ---------------------------------------- -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: arp_ioctl.patch.txt URL: From rdreier at cisco.com Mon Nov 3 09:39:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 03 Nov 2008 09:39:30 -0800 Subject: [ofa-general] Re: patch: support long (above 14 bytes) HW addresses in arp_ioctl In-Reply-To: <490EDBDD.1030104@gmail.com> (Constantine Gavrilov's message of "Mon, 03 Nov 2008 13:09:17 +0200") References: <490EDBDD.1030104@gmail.com> Message-ID: > * arp_ioctl will corrupt the kernel and user memory when this ioctl is > used on the adapters that have HW addresses longer that 14 bytes. > This is because when copying the HW address, the arp_ioctl code copies > dev->addr_len bytes without checking that addr_len is not above 14 > bytes. This is done both for copy_to_user() and memcpy() calls on > kernel data structures allocated on stack. The memcpy() call in > particular, will corrupt kernel stack. It's not obvious to me after a quick glance where this kernel memory corruption occurs, but clearly we should at least fix this bug. > The patch does not change the existing ABI but extends it. The kernel > structure used in arp_ioctl calls is changed to support larger > addresses, while the user-space structure is extended by appending > extra-space to the end of the structure if ATF_NEWARPCTL -- a new flag > -- is set in arp_flags of existing user-space structure. This allows > avoiding big changes to the existing code while preserving the ABI > compatibility. However, given that applications need to be changed to use this, wouldn't it make more sense just to change those applications to use rtnetlink, which already supports large hardware addresses? ie is there much point to extending a legacy ABI to add a feature that the preferred modern interface already has? - R. 
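A rough user-space sketch of the rtnetlink route suggested above, for illustration only. It assumes a Linux system and IPv4 neighbours; format_lladdr() and dump_neighbours() are hypothetical helper names, while the socket call and the NETLINK_ROUTE, RTM_GETNEIGH, NDA_LLADDR, NLMSG_* and RTA_* names come from the Linux uapi headers. The hardware address arrives as a length-tagged attribute, so nothing in the interface ties it to 14 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Hypothetical helper: format a link-layer address of any length as
 * colon-separated hex. Returns the string length, or -1 if out is too small. */
int format_lladdr(const unsigned char *addr, size_t len, char *out, size_t outlen)
{
    size_t i, pos = 0;

    assert(out != NULL && (addr != NULL || len == 0));
    if (outlen < (len ? len * 3 : 1))   /* 2 hex digits + separator or NUL per byte */
        return -1;
    for (i = 0; i < len; i++)
        pos += (size_t)sprintf(out + pos, i ? ":%02x" : "%02x", addr[i]);
    out[pos] = '\0';
    return (int)pos;
}

/* Hypothetical helper: dump the IPv4 neighbour (ARP) table over rtnetlink
 * and print every link-layer address. Returns 0 on success, -1 on error. */
int dump_neighbours(void)
{
    struct {
        struct nlmsghdr nlh;
        struct ndmsg ndm;
    } req;
    _Alignas(struct nlmsghdr) char buf[8192];
    int fd, done = 0;
    ssize_t n;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
    req.nlh.nlmsg_type = RTM_GETNEIGH;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.ndm.ndm_family = AF_INET;

    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    while (!done && (n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        struct nlmsghdr *nlh;
        int len = (int)n;

        for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
             nlh = NLMSG_NEXT(nlh, len)) {
            struct ndmsg *ndm;
            struct rtattr *rta;
            int alen;

            if (nlh->nlmsg_type == NLMSG_DONE || nlh->nlmsg_type == NLMSG_ERROR) {
                done = 1;
                break;
            }
            if (nlh->nlmsg_type != RTM_NEWNEIGH)
                continue;

            /* Attributes follow the fixed ndmsg header. */
            ndm = NLMSG_DATA(nlh);
            rta = (struct rtattr *)((char *)ndm + NLMSG_ALIGN(sizeof(*ndm)));
            alen = (int)(nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ndm)));

            for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                char text[3 * 32];  /* room for a 32-byte address */

                if (rta->rta_type != NDA_LLADDR)
                    continue;
                if (format_lladdr(RTA_DATA(rta), (size_t)RTA_PAYLOAD(rta),
                                  text, sizeof(text)) >= 0)
                    printf("ifindex %d lladdr %s\n", ndm->ndm_ifindex, text);
            }
        }
    }
    close(fd);
    return 0;
}
```

Calling dump_neighbours() from a small main() should list a 20-byte IPoIB address in the same way as a 6-byte Ethernet one, since the length is taken from the attribute payload rather than from a fixed-size field.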
From john.russo at qlogic.com Mon Nov 3 09:53:57 2008 From: john.russo at qlogic.com (John Russo) Date: Mon, 3 Nov 2008 11:53:57 -0600 Subject: [ofa-general] BOF Slides for WinOF (Resend to correct list) Message-ID: <99863D2ED484D449811D97A4C44C9CBD96905A@EPEXCH2.qlogic.org> Here are some slides to use for WinOF in the BOF presentation. __________________________ John F. Russo Manager, Engineering QLogic Corporation 780 Fifth Avenue, Suite 140 King of Prussia, PA 19406 Direct: 610-233-4866 Main: 610-233-4800 Fax: 610-233-4777 Cell: 610-246-9903 Email: John.Russo at qlogic.com www.qlogic.com True success is the undeniable truth that we have proved ourselves. -Joe Luppino-Esposito -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 3677 bytes Desc: image001.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenFabrics BOF for WinOF.ppt Type: application/vnd.ms-powerpoint Size: 281600 bytes Desc: OpenFabrics BOF for WinOF.ppt URL: From dimitar.dimitrov at markit.com Mon Nov 3 10:01:50 2008 From: dimitar.dimitrov at markit.com (Dimitar Dimitrov) Date: Mon, 03 Nov 2008 19:01:50 +0100 Subject: [ofa-general] Re: Error during compile of ofed-1.3.1 In-Reply-To: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com> References: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com> Message-ID: <1225735310.15893.32.camel@wks-ubuntu.ops.marketxs.com> After some trial and error installed OFED-1.4-20081103-0630 which worked fine. However still not clear what was wrong with the official 1.3.1 release? 
Dimitar On Mon, 2008-11-03 at 17:12 +0100, Dimitar Dimitrov wrote: > Installing OFED 1.3.1 on RHEL AS 4 (update 7), kernel > 2.6.9-78.0.1.ELsmp it ends up with the following error: > > make[1]: Entering directory > `/usr/src/kernels/2.6.9-78.0.1.EL-smp-x86_64' > mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/.tmp_versions > make -f scripts/Makefile.build > obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1 > make -f scripts/Makefile.build > obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband > make -f scripts/Makefile.build > obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core > gcc > -Wp,-MD,/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/.addr.o.d -nostdinc -iwithprefix include -D__KERNEL__ -include include/linux/autoconf.h -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/include/linux/autoconf.h -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/include -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/debug -I/usr/local/include/scst -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/ulp/srpt -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/net/cxgb3 -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -g -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -DMODULE -DKBUILD_BASENAME=addr -DKBUILD_MODNAME=ib_addr -c -o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/.tmp_addr.o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.c > In file included > from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.c:32: > include/linux/inetdevice.h:50: field `mr_gq_timer' has incomplete type > include/linux/inetdevice.h:51: field `mr_ifc_timer' has incomplete > type > include/linux/inetdevice.h:56: confused by earlier errors, bailing out > make[4]: *** > 
[/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core/addr.o] Error 1 > make[3]: *** > [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband/core] > Error 2 > make[2]: *** > [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1/drivers/infiniband] Error > 2 > make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3.1] > Error 2 > make[1]: Leaving directory > `/usr/src/kernels/2.6.9-78.0.1.EL-smp-x86_64' > make: *** [kernel] Error 2 > error: Bad exit status from /var/tmp/rpm-tmp.56254 (%build) > > > RPM build errors: > user vlad does not exist - using root > group vlad does not exist - using root > user vlad does not exist - using root > group vlad does not exist - using root > Bad exit status from /var/tmp/rpm-tmp.56254 (%build) > > Is there something that can be done about it? > > -- > Regards, > Dimitar -- Regards, Dimitar
From vlad at dev.mellanox.co.il Mon Nov 3 10:19:43 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 03 Nov 2008 20:19:43 +0200 Subject: [ofa-general] Re: Error during compile of ofed-1.3.1 In-Reply-To: <1225735310.15893.32.camel@wks-ubuntu.ops.marketxs.com> References: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com> <1225735310.15893.32.camel@wks-ubuntu.ops.marketxs.com> Message-ID: <490F40BF.1090605@dev.mellanox.co.il> Dimitar Dimitrov wrote: > After some trial and error installed OFED-1.4-20081103-0630 which worked > fine. However still not clear what was wrong with the official 1.3.1 > release? > > Dimitar > > Hi Dimitar, OFED-1.3.1 does not support RedHat EL4 up7. See OFED-1.3.1/docs/OFED_release_notes.txt for the list of supported platforms. Note, OFED-1.4-20081103-0630 has some IPoIB issues. Please use OFED-1.4-20081102-0630, or wait about 30 min for the new daily build. Regards, Vladimir From constantine.gavrilov at gmail.com Mon Nov 3 10:56:44 2008 From: constantine.gavrilov at gmail.com (Constantine Gavrilov) Date: Mon, 03 Nov 2008 20:56:44 +0200 Subject: [ofa-general] Re: patch: support long (above 14 bytes) HW addresses in arp_ioctl In-Reply-To: References: <490EDBDD.1030104@gmail.com> Message-ID: <490F496C.2010608@gmail.com> In arp_req_get() in net/ipv4/arp.c, there is code: memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len); dev->addr_len can be larger than the size of r->arp_ha.sa_data. Initially, I thought it would corrupt kernel stack. I was wrong, since r still has enough space not to overflow even for the largest HW address (32 bytes). It would corrupt the data structure though, and that corrupted reply would be propagated to user. There is a similar situation in arp_req_set(), where a "junk" arp entry will be set if dev->addr_len is larger than 14 bytes. At the very minimum, both arp_req_set() and arp_req_get() should return an error (-EINVAL), and not return junk or set junk.
Truncated /proc/net/arp output should also be fixed. I was not aware that rtnetlink is capable of doing things like arp table or interface manipulation (like netdevice ioctls). My applications needs to be able to manipulate arp cache for large macs, and I do not mind recompiling by adding a flag. I do not mind fixing arp cli to use this either (venerable arp does use arp_ioctl). And there are many many legacy solutions that use arp_ioctl() in programs and arp utility in scripts. Consider porting those to infiniband. Will rtnetlink work for any net_device (like netdevice ioctls do) for ARP and interface configurations calls or does it require special support in net_device itself? Any possible problems with rtnetlink? Roland Dreier wrote: > > * arp_ioctl will corrupt the kernel and user memory when this ioctl is > > used on the adapters that have HW addresses longer that 14 bytes. > > This is because when copying the HW address, the arp_ioctl code copies > > dev->addr_len bytes without checking that addr_len is not above 14 > > bytes. This is done both for copy_to_user() and memcpy() calls on > > kernel data structures allocated on stack. The memcpy() call in > > particular, will corrupt kernel stack. > > It's not obvious to me after a quick glance where this kernel memory > corruption occurs, but clearly we should at least fix this bug. > > > The patch does not change the existing ABI but extends it. The kernel > > structure used in arp_ioctl calls is changed to support larger > > addresses, while the user-space structure is extended by appending > > extra-space to the end of the structure if ATF_NEWARPCTL -- a new flag > > -- is set in arp_flags of existing user-space structure. This allows > > avoiding big changes to the existing code while preserving the ABI > > compatibility. 
> > However, given that applications need to be changed to use this, > wouldn't it make more sense just to change those applications to use > rtnetlink, which already supports large hardware addresses? ie is there > much point to extending a legacy ABI to add a feature that the preferred > modern interface already has? > > - R. > -- ---------------------------------------- Constantine Gavrilov Kernel Developer Platform Group XIV, an IBM global brand 1 Azrieli Center, Tel-Aviv Phone: +972-3-6074672 Fax: +972-3-6959749 ---------------------------------------- From chien.tin.tung at intel.com Mon Nov 3 12:05:24 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Mon, 3 Nov 2008 14:05:24 -0600 Subject: [ofa-general] [PATCH] RDMA/nes: Initialize limit_maxrdreqsz to 0 Message-ID: <20081103200524.GA7140@ctung-MOBL> From: Chien Tung RDMA/nes: Initialize limit_maxrdreqsz to 0 Initialize limit_maxrdreqsz to 0 so the workaround is off by default. Signed-off-by: Chien Tung -- Left out initialization from previous patch (commit 633693660045b3e46a63ed618eb38a54339fbcc0). Don't know how easy it would be to fix the previous patch. 
 drivers/infiniband/hw/nes/nes.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c
index aa1dc41..b60572e 100644
--- a/drivers/infiniband/hw/nes/nes.c
+++ b/drivers/infiniband/hw/nes/nes.c
@@ -95,7 +95,7 @@ unsigned int wqm_quanta = 0x10000;
 module_param(wqm_quanta, int, 0644);
 MODULE_PARM_DESC(wqm_quanta, "WQM quanta");
 
-static unsigned int limit_maxrdreqsz;
+static unsigned int limit_maxrdreqsz = 0;
 module_param(limit_maxrdreqsz, bool, 0644);
 MODULE_PARM_DESC(limit_maxrdreqsz, "Limit max read request size to 256 Bytes");
 
From chien.tin.tung at intel.com Mon Nov 3 12:05:27 2008
From: chien.tin.tung at intel.com (Chien Tung)
Date: Mon, 3 Nov 2008 14:05:27 -0600
Subject: [ofa-general] [PATCH] RDMA/nes: Check cm_node before using it
Message-ID: <20081103200527.GA7408@ctung-MOBL>

From: Chien Tung

RDMA/nes: Check cm_node before using it

Moved cm_core assignment after cm_node check.

Signed-off-by: Chien Tung
--
 drivers/infiniband/hw/nes/nes_cm.c | 5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 2caf9da..31341fa 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -376,13 +376,16 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 		int close_when_complete)
 {
 	unsigned long flags;
-	struct nes_cm_core *cm_core = cm_node->cm_core;
+	struct nes_cm_core *cm_core;
 	struct nes_timer_entry *new_send;
 	int ret = 0;
 	u32 was_timer_set;
 
 	if (!cm_node)
 		return -EINVAL;
+
+	cm_core = cm_node->cm_core;
+
 	new_send = kzalloc(sizeof(*new_send), GFP_ATOMIC);
 	if (!new_send)
 		return -1;

From rdreier at cisco.com Mon Nov 3 15:47:07 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 03 Nov 2008 15:47:07 -0800
Subject: [ofa-general] poll CQ failed -2 with connectX
In-Reply-To: <200811031139.28122.jackm@dev.mellanox.co.il> (Jack Morgenstein's message
of "Mon, 3 Nov 2008 11:39:27 +0200") References: <200810271838.48510.ricklist@microway.com> <200811031139.28122.jackm@dev.mellanox.co.il> Message-ID: > However, the userspace drivers used were indeed from OFED 1.3.1 > and/or OFED 1.4, resulting in a mismatch between kernel-space and > userspace. > > Specifically, ConnectX cards support XRC (Extended RC) in OFED 1.3.1 > and OFED 1.4 (XRC was not present in OFED 1.2.5). The 1.3.1 / 1.4 > userspace libraries identified some of the QPs created by the OFED > 1.2.5 kernel modules as XRC QPs and returned an error as a result > (correctly indicating that these "XRC" qp's did not exist as XRC > qp's). I think we need newer userspace to continue to work with old kernels; it's a huge pain if someone needs to roll back userspace just to test an older kernel (eg if bisecting a regression or something like that). The simplest thing would be for libmlx4 to check if the kernel driver reports the XRC capability, say when creating the first QP for a given process, and treat the QPN bits appropriately depending on whether the kernel supports XRC or not. - R. From rdreier at cisco.com Mon Nov 3 15:53:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 03 Nov 2008 15:53:26 -0800 Subject: [ofa-general] Re: patch: support long (above 14 bytes) HW addresses in arp_ioctl In-Reply-To: <490F496C.2010608@gmail.com> (Constantine Gavrilov's message of "Mon, 03 Nov 2008 20:56:44 +0200") References: <490EDBDD.1030104@gmail.com> <490F496C.2010608@gmail.com> Message-ID: [netdev added to cc list] > In arp_req_get() in net/arp.c, there is code: > > memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len); > > dev->addr_len can be larger than size of > r->arp_ha.sa_data. Inititally, I thought it would corrupt kernel > stack. I was wrong, since r still has enough space not to overflow > even for the largest HW address (32 bytes). It would corrupt the data > structure though, and that corrupted reply would be propagated to > user. 
> > There is a similar situation in arp_req_set(), where a "junk" arp > entry will be set if dev->addr_len is larger that 14 bytes. > > At the very minimum, both arp_req_set() and arp_req_get() should > return error (-EINVAL), and not return junk or set junk. Truncated > /proc/net/arp output should also be fixed. The EINVAL return makes sense; I'm not sure /proc/net/arp is important enough to fix. I guess it depends on the impact of the fix. > I was not aware that rtnetlink is capable of doing things like arp > table or interface manipulation (like netdevice ioctls). My > applications needs to be able to manipulate arp cache for large macs, > and I do not mind recompiling by adding a flag. I do not mind fixing > arp cli to use this either (venerable arp does use arp_ioctl). And > there are many many legacy solutions that use arp_ioctl() in programs > and arp utility in scripts. Consider porting those to infiniband. > > Will rtnetlink work for any net_device (like netdevice ioctls do) for > ARP and interface configurations calls or does it require special > support in net_device itself? Any possible problems with rtnetlink? rtnetlink is the preferred modern interface between userspace and kernel for networking information. There is also the "iproute2" package that provides a good command line interface that is capable of handling IPoIB addresses. For example: $ ip addr show dev ib1 5: ib1: mtu 2044 qdisc pfifo_fast state UP qlen 256 link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:01:65 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff inet 192.168.145.74/24 brd 192.168.145.255 scope global ib1 inet6 fe80::202:c903:0:165/64 scope link valid_lft forever preferred_lft forever $ ip neigh 192.168.145.73 dev ib1 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:01:30 STALE 172.29.224.1 dev eth0 lladdr 00:00:0c:07:ac:e0 REACHABLE and so on. - R. 
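The -EINVAL behaviour agreed on above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code and not the attached patch: SA_DATA_LEN and copy_hw_addr() are hypothetical names standing in for the 14-byte sa_data field of struct sockaddr and for the unchecked memcpy() quoted from arp_req_get().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the 14-byte sa_data field of struct sockaddr. */
#define SA_DATA_LEN 14

/*
 * Copy a hardware address into a fixed 14-byte buffer.
 * Returns 0 on success, or -EINVAL when the address does not fit,
 * instead of writing past the end of dst or handing back junk.
 */
int copy_hw_addr(unsigned char *dst, const unsigned char *ha, size_t addr_len)
{
    assert(dst != NULL && ha != NULL);

    if (addr_len > SA_DATA_LEN)
        return -EINVAL;             /* e.g. a 20-byte IPoIB address */

    memset(dst, 0, SA_DATA_LEN);    /* no stale bytes after a short address */
    memcpy(dst, ha, addr_len);
    return 0;
}
```

With a check of this kind a 6-byte Ethernet address is copied as before, while a 20-byte IPoIB address makes the call fail cleanly and the caller can fall back to rtnetlink.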
From chu11 at llnl.gov Mon Nov 3 16:39:51 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 03 Nov 2008 16:39:51 -0800 Subject: [ofa-general] [opensm patch] support dump_conf command in opensm console Message-ID: <1225759191.7307.9.camel@cardanus.llnl.gov> Hey Sasha, When config files are rescanned and loaded, there's no way to know if the right configuration was actually reloaded or not. A console command to dump the current config is a useful way to verify the loading of new configs or not. This patch assumes the fixes from my "fix qos config parsing bugs" is accepted. Al -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-support-dump_conf-console-command.patch Type: text/x-patch Size: 8850 bytes Desc: not available URL: From rdreier at cisco.com Mon Nov 3 21:47:45 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 03 Nov 2008 21:47:45 -0800 Subject: [ofa-general] Re: [PATCH] RDMA/nes: Initialize limit_maxrdreqsz to 0 In-Reply-To: <20081103200524.GA7140@ctung-MOBL> (Chien Tung's message of "Mon, 3 Nov 2008 14:05:24 -0600") References: <20081103200524.GA7140@ctung-MOBL> Message-ID: > Left out initialization from previous patch (commit > 633693660045b3e46a63ed618eb38a54339fbcc0). Don't know how easy > it would be to fix the previous patch. In general, if I haven't asked Linus to pull a given patch yet, it's easy to go back and amend it, and if I have asked him to pull, it's too late to change things (we just add the fix later). However, in this case: > -static unsigned int limit_maxrdreqsz; > +static unsigned int limit_maxrdreqsz = 0; the "fix" is bogus -- unless I'm very confused, limit_maxrdreqsz is a static variable which is already in BSS and hence initialized to zero. And the kernel style is to leave off the superfluous initializer. 
Running your patch through checkpatch.pl would have shown the clue "ERROR: do not initialise statics to 0 or NULL" as well. So I think the original patch is fine. - R. From rdreier at cisco.com Mon Nov 3 21:54:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 03 Nov 2008 21:54:10 -0800 Subject: [ofa-general] [PATCH] RDMA/nes: Check cm_node before using it In-Reply-To: <20081103200527.GA7408@ctung-MOBL> (Chien Tung's message of "Mon, 3 Nov 2008 14:05:27 -0600") References: <20081103200527.GA7408@ctung-MOBL> Message-ID: > RDMA/nes: Check cm_node before using it > > Moved cm_core assignment after cm_node check. This patch is fine -- but I've never seen the oops the current code would cause, and I'm guessing you haven't either. Is there any way that schedule_nes_timer() gets passed a NULL cm_node? - R. From ogerlitz at voltaire.com Tue Nov 4 01:23:46 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 4 Nov 2008 11:23:46 +0200 (IST) Subject: [ofa-general] [PATCH] perftest: don't attach the sender QP Message-ID: don't attach the sender QP to the MGID Signed-off-by: Or Gerlitz Index: perftest-1.2/send_bw.c =================================================================== --- perftest-1.2.orig/send_bw.c +++ perftest-1.2/send_bw.c @@ -421,7 +421,7 @@ static struct pingpong_context *pp_init_ return NULL; } - if ((user_parm->connection_type==UD) && (user_parm->use_mcg)) { + if ((user_parm->connection_type==UD) && (user_parm->use_mcg) && !user_parm->servername) { union ibv_gid gid; uint8_t mcg_gid[16] = MCG_GID; From jackm at dev.mellanox.co.il Tue Nov 4 01:26:56 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 4 Nov 2008 11:26:56 +0200 Subject: [ofa-general] poll CQ failed -2 with connectX In-Reply-To: References: <200810271838.48510.ricklist@microway.com> <200811031139.28122.jackm@dev.mellanox.co.il> Message-ID: <200811041126.56200.jackm@dev.mellanox.co.il> On Tuesday 04 November 2008 01:47, Roland Dreier wrote: > The simplest thing 
would be for libmlx4 to check if the kernel driver > reports the XRC capability, say when creating the first QP for a given > process, and treat the QPN bits appropriately depending on whether the > kernel supports XRC or not. > Actually, I already have a patch which does query-device when allocating a new user context. Since we have an device-capability XRC flag (bit 20), we can save that in the user context. I submitted the patch to the list last October (2007): http://lists.openfabrics.org/pipermail/general/2007-October/042351.html (This XRC capability issue is another reason for having this patch -- need to save the device flags as well, and add a flags word to the user context). When creating a CQ, we can then add a "kernel-supports-xrc" flag to the cq context, and test for that during cq_poll_one when testing if the QPN is an XRC qpn or not. I'll prepare a patch for libmlx4. It won't be in time for ofed 1.4-rc4 since that is going out already (possibly even today). - Jack From dorons at Voltaire.COM Tue Nov 4 01:59:59 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Tue, 04 Nov 2008 11:59:59 +0200 Subject: [ofa-general] [PATCH] change log_max_size to MB Message-ID: <49101D1F.4040605@Voltaire.COM> fixes a bug that log-limit in opensm.conf is in bytes while opensm '-L' option accept the size in MB Signed-off-by: Doron Shoham --- opensm/opensm/osm_subnet.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 0422d0f..8406232 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -1278,6 +1278,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) opts_unpack_uint32("log_max_size", p_key, p_val, (void *) & p_opts->log_max_size); + p_opts->log_max_size * 1024 *1024; /* convert to MB */ opts_unpack_charp("partition_config_file", p_key, p_val, &p_opts->partition_config_file); -- 1.5.4 From jackm at dev.mellanox.co.il Tue Nov 4 02:14:38 
2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 4 Nov 2008 12:14:38 +0200 Subject: [ofa-general] mlx4: Allow resetting capability mask to defaults with SET_PORT Message-ID: <200811041214.39085.jackm@dev.mellanox.co.il> mlx4: Allow resetting capability mask to defaults with SET_PORT Commit 7ff93f8b7... introduced support for different port types. As part of that support, SET_PORT is invoked to set the port type during driver startup. However, as a side-effect, for IB ports the invocation of this command also sets the port capability mask to zero (losing the default configuration values set by FW). This fix introduces use of the new rcm (reset capability mask) bit in the SET_PORT command (bit 30 of first mailbox dword) which resets the capability mask to the FW default value for that port (ignoring the value included in the command mailbox). The fix is to set the rcm bit when first setting the port-type to IB, thus also restoring the capability mask to its default value (rather than to zero). (The fix also sets the rqk bit to reset the Qkey violations counter). The fix requires ConnectX fw 2.5.927 or later to operate properly; it will do no harm, however, if the driver runs over earlier FW -- the problem simply will still occur. This patch fixes Bugzilla 1183 (which occurred because the IsTrapSupported bit in the capability mask was zeroed). 
Signed-off-by: Jack Morgenstein

diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c
index e2fdab4..145d6e1 100644
--- a/drivers/net/mlx4/port.c
+++ b/drivers/net/mlx4/port.c
@@ -273,7 +273,8 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
 		((u8 *) mailbox->buf)[3] = 6;
 		((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15);
 		((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15);
-	}
+	} else
+		((u8 *) mailbox->buf)[3] = 3;
 	err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT,
 		       MLX4_CMD_TIME_CLASS_B);
 
From vlad at dev.mellanox.co.il Tue Nov 4 02:13:56 2008
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 04 Nov 2008 12:13:56 +0200
Subject: [ofa-general] OFED Nov 3 2008 meeting summary on OFED 1.4 status
Message-ID: <49102064.7080004@dev.mellanox.co.il>

Meeting minutes on the web:
http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/

Meeting Summary:
==============
RC4 is delayed - will be released on Thursday Nov 6.

Details:
=======
Bugs to be fixed in RC4:
1283  blocker   P1  RHEL 4/5 column: RHEL 5  yannick.cote at qlogic.com     NEW       Intel MPI fails on Qlogc HCA
1326  blocker   P1  RHEL 4  yannick.cote at qlogic.com     NEW       ipath driver fails to build on IA64 in the 10/28/08 daily build
1335  major     P3  Other   monis at voltaire.com          NEW       Bonding: packet lost during failover
1301  major     P3  RHEL 4  olgas at voltaire.com          NEW       Can not load rds module on RH4 up7
1323  blocker   P1  All     stefan.roscher at de.ibm.com   REOPENED  IB/ehca: possibillity of kernel panic under certain circumstances
1242  critical  P2  RHEL 4  yannick.cote at qlogic.com     NEW       kernel panic while running mpi2007 against ofed1.4 -- ib_ipath: ipath_sdma_verbs_send
1336  critical  P1  RHEL 5  bugzilla at openib.org         NEW       Can't to unloading the mlx4_ib module on ppc64

Regards,
Vladimir

From dimitar.dimitrov at markit.com Tue Nov 4 02:32:24 2008
From: dimitar.dimitrov at markit.com (Dimitar Dimitrov)
Date: Tue, 04 Nov 2008 11:32:24 +0100
Subject: [ofa-general] Re: Error during compile of ofed-1.3.1
In-Reply-To: <490F40BF.1090605@dev.mellanox.co.il> References: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com> <1225735310.15893.32.camel@wks-ubuntu.ops.marketxs.com> <490F40BF.1090605@dev.mellanox.co.il> Message-ID: <1225794744.26563.6.camel@wks-ubuntu.ops.marketxs.com> Hi Vladimir, Thanks for your reply. I am compiling OFED-1.4-20081104-0127 right now and hope it works ok. I thought RedHat packages were using 1.3.1 (official release) as a base, but now I see the source rpm package is numbered 1.3.2. So in this case would I be safer using some of the 1.3.2 releases? I reckon I should stick to the latest and wait patiently till the official 1.4 release (also in a testing phase here). Regards, Dimitar On Mon, 2008-11-03 at 20:19 +0200, Vladimir Sokolovsky wrote: > Dimitar Dimitrov wrote: > > After some trial and error installed OFED-1.4-20081103-0630 which worked > > fine. However still not clear what was wrong with the official 1.3.1 > > release? > > > > Dimitar > > > > > Hi Dimitar, > OFED-1.3.1 does not support RedHat EL4 up7. > See, OFED-1.3.1/docs/OFED_release_notes.txt for the list of supported > platforms. > > Note, OFED-1.4-20081103-0630 have some IPoIB issues. > Please use OFED-1.4-20081102-0630, or wait about 30 min for the new > daily build. > > Regards, > Vladimir > -- Regards, Dimitar
From vlad at lists.openfabrics.org Tue Nov 4 03:20:47 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 4 Nov 2008 03:20:47 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081104-0200 daily build status Message-ID: <20081104112047.A91B3E60CF9@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on
x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From hal.rosenstock at gmail.com Tue Nov 4 04:22:22 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 4 Nov 2008 07:22:22 -0500 Subject: ***SPAM*** Re: [ofa-general] mlx4: Allow resetting capability mask to defaults with SET_PORT In-Reply-To: <200811041214.39085.jackm@dev.mellanox.co.il> References: <200811041214.39085.jackm@dev.mellanox.co.il> Message-ID: Jack, On Tue, Nov 4, 2008 at 5:14 AM, Jack Morgenstein wrote: > mlx4: Allow resetting capability mask to defaults with SET_PORT > > Commit 7ff93f8b7... introduced support for different port types. > As part of that support, SET_PORT is invoked to set the port type > during driver startup. However, as a side-effect, for IB ports > the invocation of this command also sets the port capability mask > to zero (losing the default configuration values set by FW). > > This fix introduces use of the new rcm (reset capability mask) bit > in the SET_PORT command (bit 30 of first mailbox dword) which resets > the capability mask to the FW default value for that port (ignoring > the value included in the command mailbox). > > The fix is to set the rcm bit when first setting the port-type to IB, > thus also restoring the capability mask to its default value (rather > than to zero). 
> (The fix also sets the rqk bit to reset the Qkey violations counter).
>
> The fix requires ConnectX fw 2.5.927 or later

Is this released firmware? If not, when is it to be released?

-- Hal

> to operate properly;
> it will do no harm, however, if the driver runs over earlier FW --
> the problem simply will still occur.
>
> This patch fixes Bugzilla 1183 (which occurred because the
> IsTrapSupported bit in the capability mask was zeroed).
>
> Signed-off-by: Jack Morgenstein
>
> diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c
> index e2fdab4..145d6e1 100644
> --- a/drivers/net/mlx4/port.c
> +++ b/drivers/net/mlx4/port.c
> @@ -273,7 +273,8 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
>  		((u8 *) mailbox->buf)[3] = 6;
>  		((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15);
>  		((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15);
> -	}
> +	} else
> +		((u8 *) mailbox->buf)[3] = 3;
>
>  	err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT,
>  		       MLX4_CMD_TIME_CLASS_B);
>

From jackm at dev.mellanox.co.il Tue Nov 4 04:44:03 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 4 Nov 2008 14:44:03 +0200
Subject: [ofa-general] mlx4: Allow resetting capability mask to defaults with SET_PORT
In-Reply-To:
References: <200811041214.39085.jackm@dev.mellanox.co.il>
Message-ID: <200811041444.03355.jackm@dev.mellanox.co.il>

On Tuesday 04 November 2008 14:22, Hal Rosenstock wrote:
> Is this released firmware ? If not, when is it to be released ?

This FW has not yet been released. The next ConnectX FW release
(which will include this change) is scheduled for the end of this month.
- Jack

From jackm at dev.mellanox.co.il Tue Nov 4 05:10:00 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 4 Nov 2008 15:10:00 +0200
Subject: [ofa-general] Error during compile of ofed-1.3.1
In-Reply-To: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com>
References: <1225728774.15893.17.camel@wks-ubuntu.ops.marketxs.com>
Message-ID: <200811041510.00177.jackm@dev.mellanox.co.il>

On Monday 03 November 2008 18:12, Dimitar Dimitrov wrote:
> Installing OFED 1.3.1 on RHEL AS 4 (update 7), kernel 2.6.9-78.0.1.ELsmp
> it ends up with the following error:
>

OFED 1.3.1 (released in June 2008) does not support update 7 (which was
released in July 2008).

The upcoming OFED 1.4 does support RHEL AS 4 (update 7).
(Release candidates are already available. The most recent is rc3; rc4
should be out this week.)

- Jack

From kelly at tradebotsystems.com Tue Nov 4 06:51:37 2008
From: kelly at tradebotsystems.com (Kelly Burkhart)
Date: Tue, 4 Nov 2008 08:51:37 -0600
Subject: [ofa-general] infiniband multicast (libibverbs)
Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E343482@tbmail2.tradebot.com>

I'm experimenting with multicast and am having an interesting issue.
The setup is ripped mostly from ib_send_lat.c. I have a client which
sends and a server which reads. All sends/receives use a 2048 byte
message.

The client can send any number of messages at any message rate. The
client spins in a tight loop while sending to reduce bursts of messages
(1000 messages/sec are spread out over the second). The client embeds a
sequence number in the message.

After setup, the server does this:

  post 2048 recvs
  for(;;) {
    ibv_poll_cq in loop, waiting for completion
    post recv
    check sequence number
  }

If I specify more than about 6500 messages/sec, I skip some sequences
and receive others multiple times. I always receive the same number of
messages the client sent. It appears as though all of the messages come
through, but I'm missing some and reading others twice.

I suspect that there is some trick to more reliable multicast messaging
that I don't know about. Does anyone have hints for multicasting high
message rates with a small percentage of drops or misses?

Thanks,

-K

From chien.tin.tung at intel.com Tue Nov 4 07:12:24 2008
From: chien.tin.tung at intel.com (Tung, Chien Tin)
Date: Tue, 4 Nov 2008 08:12:24 -0700
Subject: [ofa-general] [PATCH] RDMA/nes: Check cm_node before using it
In-Reply-To:
References: <20081103200527.GA7408@ctung-MOBL>
Message-ID: <60BEFF3FBD4C6047B0F13F205CAFA383030F148BD8@azsmsx501.amr.corp.intel.com>

>This patch is fine -- but I've never seen the oops the current code
>would cause, and I'm guessing you haven't either. Is there any way that
>schedule_nes_timer() gets passed a NULL cm_node?

I checked all callers of schedule_nes_timer() and didn't see any instances
of cm_node being NULL. We have a few cm related patches coming; I will
roll this change into one of them.

Thanks,

Chien

From rdreier at cisco.com Tue Nov 4 09:14:02 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 04 Nov 2008 09:14:02 -0800
Subject: [ofa-general] infiniband multicast (libibverbs)
In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E343482@tbmail2.tradebot.com> (Kelly Burkhart's message of "Tue, 4 Nov 2008 08:51:37 -0600")
References: <98B0CDCB28A5EE4CB3678CD99406644E343482@tbmail2.tradebot.com>
Message-ID:

> If I specify more than about 6500 messages/sec, I skip some sequences
> and receive others multiple times. I always receive the same number of
> messages the client sent. It appears as though all of the messages come
> through, but I'm missing some and reading others twice.

Sounds like a bug in your code -- I don't know why you would see
duplicate messages unless you are somehow processing the same receive
buffer twice or something like that.

 - R.

From weiny2 at llnl.gov Tue Nov 4 09:57:44 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 4 Nov 2008 09:57:44 -0800 Subject: [ofa-general] [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. Message-ID: <20081104095744.35893d4a.weiny2@llnl.gov> >From 567c3893f24f4dc25ef5f4e74ef9deeb8ae541ad Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Mon, 3 Nov 2008 14:47:50 -0800 Subject: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. There are times when PortInfo fails which leaves osm_node_t with invalid osm_physp_t pointers. In this case do not use an invalid pointer. Signed-off-by: Ira Weiny --- opensm/opensm/osm_state_mgr.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index ba3b6bf..841438c 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -542,6 +542,12 @@ static void __osm_state_mgr_get_node_desc(IN cl_map_item_t * const p_object, /* get a physp to request from. */ p_physp = osm_node_get_any_physp_ptr(p_node); + if (!osm_physp_is_valid(p_physp)) { + OSM_LOG(sm->p_log, OSM_LOG_ERROR, + "__osm_state_mgr_get_node_desc: ERR 331C: " + "Failed to get valid physical port object\n"); + goto exit; + } mad_context.nd_context.node_guid = osm_node_get_node_guid(p_node); -- 1.5.4.5 From weiny2 at llnl.gov Tue Nov 4 09:58:12 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 4 Nov 2008 09:58:12 -0800 Subject: [ofa-general] [PATCH] Add check for previous versions of plugins. Message-ID: <20081104095812.2ff5920c.weiny2@llnl.gov> >From 0db0d6667ed8baede1093a95127e2ce9c81959bd Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Mon, 3 Nov 2008 15:50:15 -0800 Subject: [PATCH] Add check for previous versions of plugins. If old interface plugins are available to OpenSM they will cause a crash. Check for this old version and error out gracefully. 
Signed-off-by: Ira Weiny --- opensm/include/opensm/osm_event_plugin.h | 1 + opensm/opensm/osm_event_plugin.c | 10 ++++++++++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h index b2deeba..0b80b63 100644 --- a/opensm/include/opensm/osm_event_plugin.h +++ b/opensm/include/opensm/osm_event_plugin.h @@ -150,6 +150,7 @@ typedef struct osm_epi_trap_event { #define OSM_EVENT_PLUGIN_IMPL_NAME "osm_event_plugin" #define OSM_EVENT_PLUGIN_INTERFACE_VER 2 typedef struct osm_event_plugin { + int interface_version; const char *osm_version; void *(*create) (struct osm_opensm *osm); void (*delete) (void *plugin_data); diff --git a/opensm/opensm/osm_event_plugin.c b/opensm/opensm/osm_event_plugin.c index c6999f5..86cabf0 100644 --- a/opensm/opensm/osm_event_plugin.c +++ b/opensm/opensm/osm_event_plugin.c @@ -96,6 +96,16 @@ osm_epi_plugin_t *osm_epi_construct(osm_opensm_t *osm, char *plugin_name) goto Exit; } + /* check for new interface */ + if (rc->impl->interface_version < OSM_EVENT_PLUGIN_INTERFACE_VER) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin" + "\'%s\': is the wrong interface version (%d); " + "OpenSM expected %d\n", + plugin_name, rc->impl->interface_version, + OSM_EVENT_PLUGIN_INTERFACE_VER); + goto Exit; + } + /* Check the version to make sure this module will work with us */ if (strcmp(rc->impl->osm_version, osm->osm_version)) { OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin" -- 1.5.4.5 From rdreier at cisco.com Tue Nov 4 10:52:52 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 04 Nov 2008 10:52:52 -0800 Subject: [ofa-general] Re: mlx4: Allow resetting capability mask to defaults with SET_PORT In-Reply-To: <200811041214.39085.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 4 Nov 2008 12:14:38 +0200") References: <200811041214.39085.jackm@dev.mellanox.co.il> Message-ID: > The fix requires ConnectX fw 2.5.927 or later to 
operate properly; > it will do no harm, however, if the driver runs over earlier FW -- > the problem simply will still occur. This doesn't seem like an acceptable solution -- this means that anyone using a new kernel with older firmware has a broken system. Can't we just keep track of the current capability mask and make sure to set it properly when doing the SET_PORT command? Actually, looking at the code, it seems we really should unify the multiple mlx4_SET_PORT implementations anyway. - R. From rdreier at cisco.com Tue Nov 4 10:57:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 04 Nov 2008 10:57:26 -0800 Subject: [ofa-general] Re: [PATCH] libibverbs: Update Dotan's email in all of the files In-Reply-To: <200810180435.00292.dotanba@gmail.com> (Dotan Barak's message of "Sat, 18 Oct 2008 04:35:00 +0200") References: <200810180435.00292.dotanba@gmail.com> Message-ID: thanks, applied From rdreier at cisco.com Tue Nov 4 11:00:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 04 Nov 2008 11:00:57 -0800 Subject: [ofa-general] [PATCH] ipoib: fix crash in path_rec_completion In-Reply-To: <490B01C6.7020302@Voltaire.COM> (Yossi Etigin's message of "Fri, 31 Oct 2008 15:01:58 +0200") References: <490B01C6.7020302@Voltaire.COM> Message-ID: thanks, applied From rdreier at cisco.com Tue Nov 4 11:17:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 04 Nov 2008 11:17:21 -0800 Subject: [ofa-general] Re: [PATCH] mlx4/profile.c: fix warning res_name defined but not used In-Reply-To: <20081023233255.GB14519@orion> (Alexander Beregalov's message of "Fri, 24 Oct 2008 03:32:55 +0400") References: <20081023233255.GB14519@orion> Message-ID: Thanks. What if we fix this like the following instead -- change mlx4_dbg so it always looks to the compiler like it uses all its parameters? This generates the same code for me, and looks cleaner in that it actually reduces the amount of #ifdef'ed stuff. 
--- drivers/net/mlx4/mlx4.h | 9 +++------ 1 files changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index fa431fa..56a2e21 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -87,6 +87,9 @@ enum { #ifdef CONFIG_MLX4_DEBUG extern int mlx4_debug_level; +#else /* CONFIG_MLX4_DEBUG */ +#define mlx4_debug_level (0) +#endif /* CONFIG_MLX4_DEBUG */ #define mlx4_dbg(mdev, format, arg...) \ do { \ @@ -94,12 +97,6 @@ extern int mlx4_debug_level; dev_printk(KERN_DEBUG, &mdev->pdev->dev, format, ## arg); \ } while (0) -#else /* CONFIG_MLX4_DEBUG */ - -#define mlx4_dbg(mdev, format, arg...) do { (void) mdev; } while (0) - -#endif /* CONFIG_MLX4_DEBUG */ - #define mlx4_err(mdev, format, arg...) \ dev_err(&mdev->pdev->dev, format, ## arg) #define mlx4_info(mdev, format, arg...) \ From dotanba at gmail.com Tue Nov 4 11:33:31 2008 From: dotanba at gmail.com (Dotan Barak) Date: Tue, 04 Nov 2008 21:33:31 +0200 Subject: [ofa-general] infiniband multicast (libibverbs) In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E343482@tbmail2.tradebot.com> References: <98B0CDCB28A5EE4CB3678CD99406644E343482@tbmail2.tradebot.com> Message-ID: <4910A38B.60900@gmail.com> Kelly Burkhart wrote: > I'm experimenting with multicast and am having an interesting issue. > The setup is ripped mostly from ib_send_lat.c. I have a client which > sends and a server which reads. All sends/recieves use a 2048 byte > message. > > The client can send any number of messages at any message rate. The > client spins in a tight loop while sending to reduce bursts of messages > (1000 messgages/sec are spread out over the sec). The client embeds a > sequence number in the message. > > After setup, the server does this: > > post 2048 recvs > for(;;) { > ibv_poll_cq in loop, waiting for completion > post recv > check sequence number > } > > If I specify more than about 6500 messages/sec, I skip some sequences > and receive others multiple times. 
> I always receive the same number of
> messages the client sent. It appears as though all of the messages come
> through, but I'm missing some and reading others twice.
>
Do you use "volatile" when you access the pointed-to memory buffer?
> I suspect that there is some trick to more reliable multicast messaging
> that I don't know about. Does anyone have hints for multicasting high
> message rates with a small percentage of drops or misses?
>
Do you have worse results than ib_send_bw.c?
Can you try to send unicast messages (with minimum changes) to see if the
issue is related to multicast send?

Anyway, you should remember that multicast messages are being sent over
UD QPs and messages can be dropped.

Dotan

From cameron at harr.org Tue Nov 4 11:38:03 2008
From: cameron at harr.org (Cameron Harr)
Date: Tue, 04 Nov 2008 12:38:03 -0700
Subject: [ofa-general] SRP/mlx4 interrupts throttling performance
In-Reply-To: <490B45B0.7030208@vlnb.net>
References: <48E386F6.5040502@fusionio.com> <48E38BAF.5000801@harr.org>
 <48E6498A.3070002@mellanox.com> <48E65FE0.2060602@harr.org>
 <48E67ACC.1020903@harr.org> <48E695F9.80703@harr.org>
 <48E9E681.8090600@vlnb.net> <48EA2F42.80008@harr.org>
 <48EB8CBC.30303@harr.org> <48EB96C5.2060202@vlnb.net>
 <48EBA581.4040301@mellanox.com> <48EBA72B.4000909@harr.org>
 <48EBBDB1.1080203@harr.org> <48EBE6B6.4060804@mellanox.com>
 <48ECEA4D.7080504@harr.org> <48F79CA9.8090806@vlnb.net>
 <49022438.9030903@harr.org> <490B45B0.7030208@vlnb.net>
Message-ID: <4910A49B.1050004@harr.org>

Vladislav Bolkhovitin wrote:
> Cameron Harr wrote:
>> Vladislav Bolkhovitin wrote:
>>>> ** Sometimes the benchmark "zombied" (process doing no work, but
>>>> process can't be killed) after running a certain amount of time.
>>>> However, it wasn't repeatable in a reliable way, so I mark that
>>>> this particular run has zombied before.
>>> That means that there is a bug somewhere.
Usually such bugs are >>> found in few hours of code auditing (srpt driver is pretty simple) >>> or by using kernel debug facilities (example diff to .config >>> attached). I personally always prefer put my effort on fixing real >>> things, not inventing various workarounds, like srpt_thread in this >>> case. >>> >>> So I would: >>> >>> 1. Completely remove srpt thread and all related code. It doesn't do >>> anything, which can't be done in SIRQ context (tasklet) >>> >>> 2. Audit the code to check if it does any action, which it >>> shouldn't do on SIRQ and fix it. This step isn't required, but >>> usually it saves a lot of time of puzzled debugging in the future. >>> >>> 3. Change in srpt_handle_rdma_comp() and srpt_handle_new_iu() >>> SCST_CONTEXT_THREAD to SCST_CONTEXT_DIRECT_ATOMIC. >> I'm assuming you didn't want me to implement this change this time, correct? >> I also changed it in srpt_handle_err_comp() >>> Then I would run the problematic tests (heavy tpc-h workload, e.g.) >>> on debug kernel and fix found problems. >>> >>> Anyway, Cameron, can you get the latest code from SCST trunk and try >>> with it? It was recently updated. Also please add the case with >>> changes from (3) above. >> This is all with version 1.0.1 of SCST (v532). >> In my fio test, I do runs with srpt thread=1 and then =0. When it was >> set to zero during the test, I got many errors printed out by FIO, >> and the target eventually crashed. This is the first part of a long >> call trace. 
>> >> NMI Watchdog detected LOCKUP on CPU 0 >> CPU 0 >> Modules linked in: ib_srpt(U) scst_vdisk(U) scst(U) fio_driver(PU) >> fio_port(PU) autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_ipoib >> mlx4_ib ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api >> nls_utf8 hfsplus dm_mirror dm_multipath dm_mod video sbs backlight >> i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp >> parport i2c_i801 shpchp i2c_core e1000e mlx4_core i5000_edac edac_mc >> pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd >> ehci_hcd >> Pid: 25732, comm: scsi_tgt0 Tainted: P 2.6.18-92.1.13.el5 #1 >> RIP: 0010:[] [] >> .text.lock.spinlock+0x29/0x30 >> RSP: 0018:ffffffff80418a88 EFLAGS: 00000086 >> RAX: ffff810785307fd8 RBX: ffffffff884e68a0 RCX: 0000000000000000 >> RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff884e68a0 >> RBP: ffffffff884e62a0 R08: ffff810790926900 R09: ffff8107909268e8 >> R10: 0000000000000018 R11: ffffffff884fcab3 R12: 0000000000000001 >> R13: 0000000000000001 R14: 0000000000000000 R15: ffff8107f0f374c0 >> FS: 0000000000000000(0000) GS:ffffffff803a0000(0000) >> knlGS:0000000000000000 >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >> CR2: 00000037bc0986d0 CR3: 0000000000201000 CR4: 00000000000006e0 >> Process scsi_tgt0 (pid: 25732, threadinfo ffff810785306000, task >> ffff810810852100) >> Stack: 0000000000000000 ffffffff884c509d ffff8107909268e8 >> ffff810790926900 >> 00000002071dd688 0000020000000220 0000000000000200 00000000da984c08 >> 0000000000000000 ffff8107909267f0 ffff810806ceee20 0000000000000001 >> Call Trace: >> [] :scst:sgv_pool_alloc+0x10c/0x5d3 >> [] :scst:scst_alloc_space+0x5b/0x106 >> [] :scst:scst_process_active_cmd+0x4fc/0x131c >> [] :scst:scst_cmd_init_done+0x17f/0x3ef >> [] :ib_srpt:srpt_handle_new_iu+0x281/0x4e7 >> [] :mlx4_ib:mlx4_ib_free_srq_wqe+0x27/0x4f >> [] :mlx4_ib:get_sw_cqe+0x12/0x30 >> [] :mlx4_ib:mlx4_ib_poll_cq+0x432/0x48f >> [] :ib_srpt:srpt_completion+0x190/0x250 >> [] 
:mlx4_core:mlx4_eq_int+0x3b/0x26f
>> [] :mlx4_core:mlx4_msi_x_interrupt+0xf/0x17
>
> According to this trace, Vu was incorrect when he wrote that
> srpt_handle_new_iu called on tasklet context. It at least sometimes
> called from IRQ context. Try with the attached patch. It's against the
> latest trunk.

I tried with the latest scst and srpt as of this morning. Previously, I
had used srpt-1.0.0. The following results are with BLOCKIO, and I'll
have a NULLIO in a bit. You can see from here that I don't hang any
more, but the srpt thread=0 are a little lower.

As before this run was done with ioengine=libaio and iodepth=16. I
pretty much always get significantly better performance with libaio than
with sync or other engines. Also, the iodepth setting tended to give me
better results.
----------------------------------------------
type=randwrite bs=512 drives=1 scst_threads=1 srptthread=1 iops=67073.48
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=54876.82
type=randwrite bs=512 drives=2 scst_threads=1 srptthread=1 iops=74858.00
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=75357.15
type=randwrite bs=512 drives=3 scst_threads=1 srptthread=1 iops=83257.72
type=randwrite bs=4k drives=3 scst_threads=1 srptthread=1 iops=82186.79
type=randwrite bs=512 drives=1 scst_threads=2 srptthread=1 iops=59908.06
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=50982.91
type=randwrite bs=512 drives=2 scst_threads=2 srptthread=1 iops=99243.07
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=79670.62
type=randwrite bs=512 drives=3 scst_threads=2 srptthread=1 iops=102898.37
type=randwrite bs=4k drives=3 scst_threads=2 srptthread=1 iops=92248.25
type=randwrite bs=512 drives=1 scst_threads=3 srptthread=1 iops=63086.77
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=53020.41
type=randwrite bs=512 drives=2 scst_threads=3 srptthread=1 iops=95990.06
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=77487.26
type=randwrite bs=512 drives=3 scst_threads=3 srptthread=1 iops=105945.85
type=randwrite bs=4k drives=3 scst_threads=3 srptthread=1 iops=95389.01
type=randwrite bs=512 drives=1 scst_threads=1 srptthread=0 iops=50299.36
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=48070.11
type=randwrite bs=512 drives=2 scst_threads=1 srptthread=0 iops=54017.21
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=50407.20
type=randwrite bs=512 drives=3 scst_threads=1 srptthread=0 iops=55822.11
type=randwrite bs=4k drives=3 scst_threads=1 srptthread=0 iops=50447.82
type=randwrite bs=512 drives=1 scst_threads=2 srptthread=0 iops=60672.48
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=48811.93
type=randwrite bs=512 drives=2 scst_threads=2 srptthread=0 iops=81919.87
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=72912.99
type=randwrite bs=512 drives=3 scst_threads=2 srptthread=0 iops=91036.45
type=randwrite bs=4k drives=3 scst_threads=2 srptthread=0 iops=88994.63
type=randwrite bs=512 drives=1 scst_threads=3 srptthread=0 iops=58929.21
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=48698.90
type=randwrite bs=512 drives=2 scst_threads=3 srptthread=0 iops=83967.58
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=73932.36
type=randwrite bs=512 drives=3 scst_threads=3 srptthread=0 iops=96686.46
type=randwrite bs=4k drives=3 scst_threads=3 srptthread=0 iops=88689.27

From a.beregalov at gmail.com Tue Nov 4 12:43:57 2008
From: a.beregalov at gmail.com (Alexander Beregalov)
Date: Tue, 4 Nov 2008 23:43:57 +0300
Subject: [ofa-general] Re: [PATCH] mlx4/profile.c: fix warning res_name defined but not used
In-Reply-To:
References: <20081023233255.GB14519@orion>
Message-ID:

2008/11/4 Roland Dreier :
> Thanks. What if we fix this like the following instead -- change
> mlx4_dbg so it always looks to the compiler like it uses all its
> parameters?
This generates the same code for me, and looks cleaner in > that it actually reduces the amount of #ifdef'ed stuff. Yes, it looks better. > --- > drivers/net/mlx4/mlx4.h | 9 +++------ > 1 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h > index fa431fa..56a2e21 100644 > --- a/drivers/net/mlx4/mlx4.h > +++ b/drivers/net/mlx4/mlx4.h > @@ -87,6 +87,9 @@ enum { > > #ifdef CONFIG_MLX4_DEBUG > extern int mlx4_debug_level; > +#else /* CONFIG_MLX4_DEBUG */ > +#define mlx4_debug_level (0) > +#endif /* CONFIG_MLX4_DEBUG */ > > #define mlx4_dbg(mdev, format, arg...) \ > do { \ > @@ -94,12 +97,6 @@ extern int mlx4_debug_level; > dev_printk(KERN_DEBUG, &mdev->pdev->dev, format, ## arg); \ > } while (0) > > -#else /* CONFIG_MLX4_DEBUG */ > - > -#define mlx4_dbg(mdev, format, arg...) do { (void) mdev; } while (0) > - > -#endif /* CONFIG_MLX4_DEBUG */ > - > #define mlx4_err(mdev, format, arg...) \ > dev_err(&mdev->pdev->dev, format, ## arg) > #define mlx4_info(mdev, format, arg...) \ > From cameron at harr.org Tue Nov 4 13:01:32 2008 From: cameron at harr.org (Cameron Harr) Date: Tue, 04 Nov 2008 14:01:32 -0700 Subject: [ofa-general] SRP/mlx4 interrupts throttling performance In-Reply-To: <4910A49B.1050004@harr.org> References: <48E386F6.5040502@fusionio.com> <48E38BAF.5000801@harr.org> <48E6498A.3070002@mellanox.com> <48E65FE0.2060602@harr.org> <48E67ACC.1020903@harr.org> <48E695F9.80703@harr.org> <48E9E681.8090600@vlnb.net> <48EA2F42.80008@harr.org> <48EB8CBC.30303@harr.org> <48EB96C5.2060202@vlnb.net> <48EBA581.4040301@mellanox.com> <48EBA72B.4000909@harr.org> <48EBBDB1.1080203@harr.org> <48EBE6B6.4060804@mellanox.com> <48ECEA4D.7080504@harr.org> <48F79CA9.8090806@vlnb.net> <49022438.9030903@harr.org> <490B45B0.7030208@vlnb.net> <4910A49B.1050004@harr.org> Message-ID: <4910B82C.6070904@harr.org> Cameron Harr wrote: > I tried with the latest scst and srpt as of this morning. Previously, > I had used srpt-1.0.0. 
The following results are with BLOCKIO, and > I'll have a NULLIO in a bit. You can see from here that I don't hang > any more, but the srpt thread=0 are a little lower. > > As before this run was done with ioengine=libaio and iodepth=16. I > pretty much always get significantly better performance with libaio > than with sync or other engines. Also, the iodepth setting tended to > give me better results. > ---------------------------------------------- > type=randwrite bs=512 drives=1 scst_threads=1 srptthread=1 > iops=67073.48 > type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 > iops=54876.82 > type=randwrite bs=512 drives=2 scst_threads=1 srptthread=1 > iops=74858.00 > type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 > iops=75357.15 > type=randwrite bs=512 drives=3 scst_threads=1 srptthread=1 > iops=83257.72 > type=randwrite bs=4k drives=3 scst_threads=1 srptthread=1 > iops=82186.79 > type=randwrite bs=512 drives=1 scst_threads=2 srptthread=1 > iops=59908.06 > type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 > iops=50982.91 > type=randwrite bs=512 drives=2 scst_threads=2 srptthread=1 > iops=99243.07 > type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 > iops=79670.62 > type=randwrite bs=512 drives=3 scst_threads=2 srptthread=1 > iops=102898.37 > type=randwrite bs=4k drives=3 scst_threads=2 srptthread=1 > iops=92248.25 > type=randwrite bs=512 drives=1 scst_threads=3 srptthread=1 > iops=63086.77 > type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 > iops=53020.41 > type=randwrite bs=512 drives=2 scst_threads=3 srptthread=1 > iops=95990.06 > type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 > iops=77487.26 > type=randwrite bs=512 drives=3 scst_threads=3 srptthread=1 > iops=105945.85 > type=randwrite bs=4k drives=3 scst_threads=3 srptthread=1 > iops=95389.01 > type=randwrite bs=512 drives=1 scst_threads=1 srptthread=0 > iops=50299.36 > type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 > iops=48070.11 > 
type=randwrite bs=512 drives=2 scst_threads=1 srptthread=0 > iops=54017.21 > type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 > iops=50407.20 > type=randwrite bs=512 drives=3 scst_threads=1 srptthread=0 > iops=55822.11 > type=randwrite bs=4k drives=3 scst_threads=1 srptthread=0 > iops=50447.82 > type=randwrite bs=512 drives=1 scst_threads=2 srptthread=0 > iops=60672.48 > type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 > iops=48811.93 > type=randwrite bs=512 drives=2 scst_threads=2 srptthread=0 > iops=81919.87 > type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 > iops=72912.99 > type=randwrite bs=512 drives=3 scst_threads=2 srptthread=0 > iops=91036.45 > type=randwrite bs=4k drives=3 scst_threads=2 srptthread=0 > iops=88994.63 > type=randwrite bs=512 drives=1 scst_threads=3 srptthread=0 > iops=58929.21 > type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 > iops=48698.90 > type=randwrite bs=512 drives=2 scst_threads=3 srptthread=0 > iops=83967.58 > type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 > iops=73932.36 > type=randwrite bs=512 drives=3 scst_threads=3 srptthread=0 > iops=96686.46 > type=randwrite bs=4k drives=3 scst_threads=3 srptthread=0 > iops=88689.27 > And here are the results with NULLIO, sorted by block size. 
Having the SRPT thread=0 actually shows some benefit here:
-------------------------------------------
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=140700.40
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=89167.67
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=125166.68
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=136699.05
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=127363.18
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=91205.03
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=94412.46
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=84354.34
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=155053.30
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=102480.27
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=141045.50
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=99681.15
type=randwrite bs=4k drives=3 scst_threads=1 srptthread=0 iops=173182.91
type=randwrite bs=4k drives=3 scst_threads=1 srptthread=1 iops=117629.27
type=randwrite bs=4k drives=3 scst_threads=2 srptthread=0 iops=99960.51
type=randwrite bs=4k drives=3 scst_threads=2 srptthread=1 iops=103412.00
type=randwrite bs=4k drives=3 scst_threads=3 srptthread=0 iops=120926.77
type=randwrite bs=4k drives=3 scst_threads=3 srptthread=1 iops=100368.32
type=randwrite bs=512 drives=1 scst_threads=1 srptthread=0 iops=102232.77
type=randwrite bs=512 drives=1 scst_threads=1 srptthread=1 iops=139095.94
type=randwrite bs=512 drives=1 scst_threads=2 srptthread=0 iops=130327.29
type=randwrite bs=512 drives=1 scst_threads=2 srptthread=1 iops=159158.20
type=randwrite bs=512 drives=1 scst_threads=3 srptthread=0 iops=136153.84
type=randwrite bs=512 drives=1 scst_threads=3 srptthread=1 iops=92417.19
type=randwrite bs=512 drives=2 scst_threads=1 srptthread=0 iops=126892.60
type=randwrite bs=512 drives=2 scst_threads=1 srptthread=1 iops=99436.74
type=randwrite bs=512 drives=2 scst_threads=2 srptthread=0 iops=101566.13
type=randwrite bs=512 drives=2 scst_threads=2 srptthread=1 iops=142292.97
type=randwrite bs=512 drives=2 scst_threads=3 srptthread=0 iops=166114.78
type=randwrite bs=512 drives=2 scst_threads=3 srptthread=1 iops=155634.89
type=randwrite bs=512 drives=3 scst_threads=1 srptthread=0 iops=131368.01
type=randwrite bs=512 drives=3 scst_threads=1 srptthread=1 iops=186550.24
type=randwrite bs=512 drives=3 scst_threads=2 srptthread=0 iops=139813.79
type=randwrite bs=512 drives=3 scst_threads=2 srptthread=1 iops=162499.08
type=randwrite bs=512 drives=3 scst_threads=3 srptthread=0 iops=154777.28
type=randwrite bs=512 drives=3 scst_threads=3 srptthread=1 iops=187425.87

From kelly at tradebotsystems.com Tue Nov 4 13:24:31 2008
From: kelly at tradebotsystems.com (Kelly Burkhart)
Date: Tue, 4 Nov 2008 15:24:31 -0600
Subject: [ofa-general] infiniband multicast (libibverbs)
Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E343488@tbmail2.tradebot.com>

> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com]
>
> > If I specify more than about 6500 messages/sec, I skip some sequences
> > and receive others multiple times. I always receive the same number of
> > messages the client sent. It appears as though all of the messages come
> > through, but I'm missing some and reading others twice.
>
> Sounds like a bug in your code -- I don't know why you would see
> duplicate messages unless you are somehow processing the same receive
> buffer twice or something like that.

I am (or was) processing the same buffer over and over. I ripped
from ib_send_lat which does the same thing. The difference is
send_lat waits for a reply before sending a second message. I'm
sending rapidly without waiting for a reply. The surprising thing
to me was that my recv buffer received data ahead of me waiting on
the cq.
I modified my code to read into a circular list of buffers which
appears to have solved the problem at the cost of more memory usage.

Thanks,

-K

From rdreier at cisco.com Tue Nov 4 13:37:00 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 04 Nov 2008 13:37:00 -0800
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID:

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get the following changes; mostly low-level hardware changes
plus a couple of small patches that fix IPoIB crashes.

Chien Tung (2):
      RDMA/nes: Correct handling of PBL resources
      RDMA/nes: Mitigate compatibility issue regarding PCIe write credits

Ilpo Järvinen (1):
      RDMA/nes: Reindent mis-indented spinlocks

Ralph Campbell (1):
      IB/ipath: Fix RDMA write with immediate copy of last packet

Roland Dreier (3):
      RDMA/cxgb3: Fix too-big reserved field zeroing in iwch_post_zb_read()
      mlx4_core: Fix unused variable warning
      Merge branches 'cxgb3', 'ehca', 'ipath', 'ipoib', 'mlx4' and 'nes' into for-next

Stefan Roscher (1):
      IB/ehca: Remove reference to special QP in case of port activation failure

Vadim Makhervaks (1):
      RDMA/nes: Fix CQ allocation scheme for multicast receive queue apps

Yossi Etigin (3):
      IPoIB: Don't enable napi when it's already enabled
      IPoIB: Fix hang in ipoib_flush_paths()
      IPoIB: Fix crash in path_rec_completion()

 drivers/infiniband/hw/cxgb3/iwch_qp.c     |    1 -
 drivers/infiniband/hw/ehca/ehca_irq.c     |    7 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c      |    5 ++
 drivers/infiniband/hw/ipath/ipath_ruc.c   |   10 ++--
 drivers/infiniband/hw/nes/nes.c           |   16 +++++++
 drivers/infiniband/hw/nes/nes_hw.h        |    1 +
 drivers/infiniband/hw/nes/nes_verbs.c     |   64 +++++++++++++++++++---------
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    6 ++-
 drivers/net/mlx4/mlx4.h                   |    9 +---
 9 files changed, 82 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 3e4585c..19661b2 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -745,7 +745,6 @@ int iwch_post_zb_read(struct iwch_qp *qhp)
 	wqe->read.rdmaop = T3_READ_REQ;
 	wqe->read.reserved[0] = 0;
 	wqe->read.reserved[1] = 0;
-	wqe->read.reserved[2] = 0;
 	wqe->read.rem_stag = cpu_to_be32(1);
 	wqe->read.rem_to = cpu_to_be64(1);
 	wqe->read.local_stag = cpu_to_be32(1);
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index cb55be0..9e43459 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -370,6 +370,10 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 	switch (ec) {
 	case 0x30: /* port availability change */
 		if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) {
+			/* only for autodetect mode important */
+			if (ehca_nr_ports >= 0)
+				break;
+
 			int suppress_event;
 			/* replay modify_qp for sqps */
 			spin_lock_irqsave(&sport->mod_sqp_lock, flags);
@@ -387,8 +391,7 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 			sport->port_state = IB_PORT_ACTIVE;
 			dispatch_port_event(shca, port, IB_EVENT_PORT_ACTIVE,
 					    "is active");
-			ehca_query_sma_attr(shca, port,
-					    &sport->saved_attr);
+			ehca_query_sma_attr(shca, port, &sport->saved_attr);
 		} else {
 			sport->port_state = IB_PORT_DOWN;
 			dispatch_port_event(shca, port, IB_EVENT_PORT_ERR,
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 4d54b9f..9e05ee2 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -860,6 +860,11 @@ static struct ehca_qp *internal_create_qp(
 	if (qp_type == IB_QPT_GSI) {
 		h_ret = ehca_define_sqp(shca, my_qp, init_attr);
 		if (h_ret != H_SUCCESS) {
+			kfree(my_qp->mod_qp_parm);
+			my_qp->mod_qp_parm = NULL;
+			/* the QP pointer is no longer valid */
+			shca->sport[init_attr->port_num - 1].ibqp_sqp[qp_type] =
+				NULL;
 			ret = ehca2ib_return_code(h_ret);
 			goto create_qp_exit6;
 		}
diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index fc0f6d9..2296832 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -156,7 +156,7 @@ bail:
 /**
  * ipath_get_rwqe - copy the next RWQE into the QP's RWQE
  * @qp: the QP
- * @wr_id_only: update wr_id only, not SGEs
+ * @wr_id_only: update qp->r_wr_id only, not qp->r_sge
  *
  * Return 0 if no RWQE is available, otherwise return 1.
  *
@@ -173,8 +173,6 @@ int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only)
 	u32 tail;
 	int ret;
 
-	qp->r_sge.sg_list = qp->r_sg_list;
-
 	if (qp->ibqp.srq) {
 		srq = to_isrq(qp->ibqp.srq);
 		handler = srq->ibsrq.event_handler;
@@ -206,8 +204,10 @@ int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only)
 		wqe = get_rwqe_ptr(rq, tail);
 		if (++tail >= rq->size)
 			tail = 0;
-	} while (!wr_id_only && !ipath_init_sge(qp, wqe, &qp->r_len,
-						&qp->r_sge));
+		if (wr_id_only)
+			break;
+		qp->r_sge.sg_list = qp->r_sg_list;
+	} while (!ipath_init_sge(qp, wqe, &qp->r_len, &qp->r_sge));
 	qp->r_wr_id = wqe->wr_id;
 	wq->tail = tail;
 
diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c
index a2b04d6..aa1dc41 100644
--- a/drivers/infiniband/hw/nes/nes.c
+++ b/drivers/infiniband/hw/nes/nes.c
@@ -95,6 +95,10 @@ unsigned int wqm_quanta = 0x10000;
 module_param(wqm_quanta, int, 0644);
 MODULE_PARM_DESC(wqm_quanta, "WQM quanta");
 
+static unsigned int limit_maxrdreqsz;
+module_param(limit_maxrdreqsz, bool, 0644);
+MODULE_PARM_DESC(limit_maxrdreqsz, "Limit max read request size to 256 Bytes");
+
 LIST_HEAD(nes_adapter_list);
 static LIST_HEAD(nes_dev_list);
 
@@ -588,6 +592,18 @@ static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_i
 			nesdev->nesadapter->port_count;
 	}
 
+	if ((limit_maxrdreqsz ||
+	     ((nesdev->nesadapter->phy_type[0] == NES_PHY_TYPE_GLADIUS) &&
+	      (hw_rev == NE020_REV1))) &&
+	    (pcie_get_readrq(pcidev) > 256)) {
+		if (pcie_set_readrq(pcidev, 256))
+			printk(KERN_ERR PFX "Unable to set max read request"
+				" to 256 bytes\n");
+		else
+			nes_debug(NES_DBG_INIT, "Max read request size set"
+				" to 256 bytes\n");
+	}
+
 	tasklet_init(&nesdev->dpc_tasklet, nes_dpc, (unsigned long)nesdev);
 
 	/* bring up the Control QP */
diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index 610b9d8..bc0b4de 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -40,6 +40,7 @@
 #define NES_PHY_TYPE_ARGUS 4
 #define NES_PHY_TYPE_PUMA_1G 5
 #define NES_PHY_TYPE_PUMA_10G 6
+#define NES_PHY_TYPE_GLADIUS 7
 
 #define NES_MULTICAST_PF_MAX 8
 
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 932e56f..d36c9a0 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -220,14 +220,14 @@ static int nes_bind_mw(struct ib_qp *ibqp, struct ib_mw *ibmw,
 	if (nesqp->ibqp_state > IB_QPS_RTS)
 		return -EINVAL;
 
-		spin_lock_irqsave(&nesqp->lock, flags);
+	spin_lock_irqsave(&nesqp->lock, flags);
 
 	head = nesqp->hwqp.sq_head;
 	qsize = nesqp->hwqp.sq_tail;
 
 	/* Check for SQ overflow */
 	if (((head + (2 * qsize) - nesqp->hwqp.sq_tail) % qsize) == (qsize - 1)) {
-			spin_unlock_irqrestore(&nesqp->lock, flags);
+		spin_unlock_irqrestore(&nesqp->lock, flags);
 		return -EINVAL;
 	}
 
@@ -269,7 +269,7 @@ static int nes_bind_mw(struct ib_qp *ibqp, struct ib_mw *ibmw,
 	nes_write32(nesdev->regs+NES_WQE_ALLOC,
 			(1 << 24) | 0x00800000 | nesqp->hwqp.qp_id);
 
-		spin_unlock_irqrestore(&nesqp->lock, flags);
+	spin_unlock_irqrestore(&nesqp->lock, flags);
 
 	return 0;
 }
@@ -349,7 +349,7 @@ static struct ib_fmr *nes_alloc_fmr(struct ib_pd *ibpd,
 		if (nesfmr->nesmr.pbls_used > nesadapter->free_4kpbl) {
 			spin_unlock_irqrestore(&nesadapter->pbl_lock, flags);
 			ret = -ENOMEM;
-			goto failed_vpbl_alloc;
+			goto failed_vpbl_avail;
 		} else {
 			nesadapter->free_4kpbl -= nesfmr->nesmr.pbls_used;
 		}
@@ -357,7 +357,7 @@ static struct ib_fmr *nes_alloc_fmr(struct ib_pd *ibpd,
 		if (nesfmr->nesmr.pbls_used > nesadapter->free_256pbl) {
 			spin_unlock_irqrestore(&nesadapter->pbl_lock, flags);
 			ret = -ENOMEM;
-			goto failed_vpbl_alloc;
+			goto failed_vpbl_avail;
 		} else {
 			nesadapter->free_256pbl -= nesfmr->nesmr.pbls_used;
 		}
@@ -391,14 +391,14 @@ static struct ib_fmr *nes_alloc_fmr(struct ib_pd *ibpd,
 			goto failed_vpbl_alloc;
 		}
 
-		nesfmr->root_vpbl.leaf_vpbl = kzalloc(sizeof(*nesfmr->root_vpbl.leaf_vpbl)*1024, GFP_KERNEL);
+		nesfmr->leaf_pbl_cnt = nesfmr->nesmr.pbls_used-1;
+		nesfmr->root_vpbl.leaf_vpbl = kzalloc(sizeof(*nesfmr->root_vpbl.leaf_vpbl)*1024, GFP_ATOMIC);
 		if (!nesfmr->root_vpbl.leaf_vpbl) {
 			spin_unlock_irqrestore(&nesadapter->pbl_lock, flags);
 			ret = -ENOMEM;
 			goto failed_leaf_vpbl_alloc;
 		}
 
-		nesfmr->leaf_pbl_cnt = nesfmr->nesmr.pbls_used-1;
 		nes_debug(NES_DBG_MR, "two level pbl, root_vpbl.pbl_vbase=%p"
 				" leaf_pbl_cnt=%d root_vpbl.leaf_vpbl=%p\n",
 				nesfmr->root_vpbl.pbl_vbase, nesfmr->leaf_pbl_cnt, nesfmr->root_vpbl.leaf_vpbl);
@@ -519,6 +519,16 @@ static struct ib_fmr *nes_alloc_fmr(struct ib_pd *ibpd,
 			nesfmr->root_vpbl.pbl_pbase);
 
 failed_vpbl_alloc:
+	if (nesfmr->nesmr.pbls_used != 0) {
+		spin_lock_irqsave(&nesadapter->pbl_lock, flags);
+		if (nesfmr->nesmr.pbl_4k)
+			nesadapter->free_4kpbl += nesfmr->nesmr.pbls_used;
+		else
+			nesadapter->free_256pbl += nesfmr->nesmr.pbls_used;
+		spin_unlock_irqrestore(&nesadapter->pbl_lock, flags);
+	}
+
+failed_vpbl_avail:
 	kfree(nesfmr);
 
 failed_fmr_alloc:
@@ -534,18 +544,14 @@ static struct ib_fmr *nes_alloc_fmr(struct ib_pd *ibpd,
  */
 static int nes_dealloc_fmr(struct ib_fmr *ibfmr)
 {
+	unsigned long flags;
 	struct nes_mr *nesmr = to_nesmr_from_ibfmr(ibfmr);
 	struct nes_fmr *nesfmr = to_nesfmr(nesmr);
 	struct nes_vnic *nesvnic = to_nesvnic(ibfmr->device);
 	struct nes_device *nesdev = nesvnic->nesdev;
-	struct nes_mr temp_nesmr = *nesmr;
+	struct nes_adapter *nesadapter = nesdev->nesadapter;
 	int i = 0;
 
-	temp_nesmr.ibmw.device = ibfmr->device;
-	temp_nesmr.ibmw.pd = ibfmr->pd;
-	temp_nesmr.ibmw.rkey = ibfmr->rkey;
-	temp_nesmr.ibmw.uobject = NULL;
-
 	/* free the resources */
 	if (nesfmr->leaf_pbl_cnt == 0) {
 		/* single PBL case */
@@ -561,8 +567,24 @@ static int nes_dealloc_fmr(struct ib_fmr *ibfmr)
 		pci_free_consistent(nesdev->pcidev, 8192, nesfmr->root_vpbl.pbl_vbase,
 				nesfmr->root_vpbl.pbl_pbase);
 	}
+	nesmr->ibmw.device = ibfmr->device;
+	nesmr->ibmw.pd = ibfmr->pd;
+	nesmr->ibmw.rkey = ibfmr->rkey;
+	nesmr->ibmw.uobject = NULL;
 
-	return nes_dealloc_mw(&temp_nesmr.ibmw);
+	if (nesfmr->nesmr.pbls_used != 0) {
+		spin_lock_irqsave(&nesadapter->pbl_lock, flags);
+		if (nesfmr->nesmr.pbl_4k) {
+			nesadapter->free_4kpbl += nesfmr->nesmr.pbls_used;
+			WARN_ON(nesadapter->free_4kpbl > nesadapter->max_4kpbl);
+		} else {
+			nesadapter->free_256pbl += nesfmr->nesmr.pbls_used;
+			WARN_ON(nesadapter->free_256pbl > nesadapter->max_256pbl);
+		}
+		spin_unlock_irqrestore(&nesadapter->pbl_lock, flags);
+	}
+
+	return nes_dealloc_mw(&nesmr->ibmw);
 }
 
 
@@ -1595,7 +1617,7 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
 			nes_ucontext->mcrqf = req.mcrqf;
 			if (nes_ucontext->mcrqf) {
 				if (nes_ucontext->mcrqf & 0x80000000)
-					nescq->hw_cq.cq_number = nesvnic->nic.qp_id + 12 + (nes_ucontext->mcrqf & 0xf) - 1;
+					nescq->hw_cq.cq_number = nesvnic->nic.qp_id + 28 + 2 * ((nes_ucontext->mcrqf & 0xf) - 1);
 				else if (nes_ucontext->mcrqf & 0x40000000)
 					nescq->hw_cq.cq_number = nes_ucontext->mcrqf & 0xffff;
 				else
@@ -3212,7 +3234,7 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
 	if (nesqp->ibqp_state > IB_QPS_RTS)
 		return -EINVAL;
 
-		spin_lock_irqsave(&nesqp->lock, flags);
+	spin_lock_irqsave(&nesqp->lock, flags);
 
 	head = nesqp->hwqp.sq_head;
 
@@ -3337,7 +3359,7 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
 				(counter << 24) | 0x00800000 | nesqp->hwqp.qp_id);
 	}
 
-		spin_unlock_irqrestore(&nesqp->lock, flags);
+	spin_unlock_irqrestore(&nesqp->lock, flags);
 
 	if (err)
 		*bad_wr = ib_wr;
@@ -3368,7 +3390,7 @@ static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr,
 	if (nesqp->ibqp_state > IB_QPS_RTS)
 		return -EINVAL;
 
-		spin_lock_irqsave(&nesqp->lock, flags);
+	spin_lock_irqsave(&nesqp->lock, flags);
 
 	head = nesqp->hwqp.rq_head;
 
@@ -3421,7 +3443,7 @@ static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr,
 		nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | nesqp->hwqp.qp_id);
 	}
 
-		spin_unlock_irqrestore(&nesqp->lock, flags);
+	spin_unlock_irqrestore(&nesqp->lock, flags);
 
 	if (err)
 		*bad_wr = ib_wr;
@@ -3453,7 +3475,7 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 
 	nes_debug(NES_DBG_CQ, "\n");
 
-		spin_lock_irqsave(&nescq->lock, flags);
+	spin_lock_irqsave(&nescq->lock, flags);
 
 	head = nescq->hw_cq.cq_head;
 	cq_size = nescq->hw_cq.cq_size;
@@ -3562,7 +3584,7 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 	nes_debug(NES_DBG_CQ, "Reporting %u completions for CQ%u.\n",
 			cqe_count, nescq->hw_cq.cq_number);
 
-		spin_unlock_irqrestore(&nescq->lock, flags);
+	spin_unlock_irqrestore(&nescq->lock, flags);
 
 	return cqe_count;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index fddded7..85257f6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -106,12 +106,13 @@ int ipoib_open(struct net_device *dev)
 
 	ipoib_dbg(priv, "bringing up interface\n");
 
-	napi_enable(&priv->napi);
 	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
 
 	if (ipoib_pkey_dev_delay_open(dev))
 		return 0;
 
+	napi_enable(&priv->napi);
+
 	if (ipoib_ib_dev_open(dev)) {
 		napi_disable(&priv->napi);
 		return -EINVAL;
@@ -546,6 +547,7 @@ static int path_rec_start(struct net_device *dev,
 	if (path->query_id < 0) {
 		ipoib_warn(priv, "ib_sa_path_rec_get failed: %d\n", path->query_id);
 		path->query = NULL;
+		complete(&path->done);
 		return path->query_id;
 	}
 
@@ -662,7 +664,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 			skb_push(skb, sizeof *phdr);
 			__skb_queue_tail(&path->queue, skb);
 
-			if (path_rec_start(dev, path)) {
+			if (!path->query && path_rec_start(dev, path)) {
 				spin_unlock_irqrestore(&priv->lock, flags);
 				path_free(dev, path);
 				return;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index fa431fa..56a2e21 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -87,6 +87,9 @@ enum {
 
 #ifdef CONFIG_MLX4_DEBUG
 extern int mlx4_debug_level;
+#else /* CONFIG_MLX4_DEBUG */
+#define mlx4_debug_level (0)
+#endif /* CONFIG_MLX4_DEBUG */
 
 #define mlx4_dbg(mdev, format, arg...) \
 	do { \
@@ -94,12 +97,6 @@ extern int mlx4_debug_level;
 			dev_printk(KERN_DEBUG, &mdev->pdev->dev, format, ## arg); \
 	} while (0)
 
-#else /* CONFIG_MLX4_DEBUG */
-
-#define mlx4_dbg(mdev, format, arg...) do { (void) mdev; } while (0)
-
-#endif /* CONFIG_MLX4_DEBUG */
-
 #define mlx4_err(mdev, format, arg...) \
 	dev_err(&mdev->pdev->dev, format, ## arg)
 #define mlx4_info(mdev, format, arg...) \

From kelly at tradebotsystems.com Tue Nov 4 13:37:36 2008
From: kelly at tradebotsystems.com (Kelly Burkhart)
Date: Tue, 4 Nov 2008 15:37:36 -0600
Subject: [ofa-general] infiniband multicast (libibverbs)
Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E343489@tbmail2.tradebot.com>

> -----Original Message-----
> From: Dotan Barak [mailto:dotanba at gmail.com]
>
> Kelly Burkhart wrote:
> > If I specify more than about 6500 messages/sec, I skip some sequences
> > and receive others multiple times. I always receive the same number of
> > messages the client sent. It appears as though all of the messages come
> > through, but I'm missing some and reading others twice.
> >
> Do you use the "volatile" when you access the pointed memory buffer?

I do not. I noticed this with the post_buf and poll_buf variables in
pingpong_context, but they're not used in send_lat. I assumed they
only replied to RDMA. Do I need to be using volatile anywhere with UD send?

> > I suspect that there is some trick to more reliable multicast messaging
> > that I don't know about. Does anyone have hints for multicasting high
> > message rates with a small percentage of drops or misses?
> >
> Do you have worst results than the ib_send_bw.c?
> Can you try to send unicast messages (with minimum changes) to see if
> the issue is related to multicast send?
>
> Anyway, you should remember that multicast messages are being sent over
> UD QPs and messages can be dropped.

I solved (or hid) my problem by recv-ing into multiple buffers. I do
realize that multicast messages can be dropped, but I want to know what
level of one-way reliability and message rate I can achieve. Since I
was receiving all messages before, I don't think my results were
different than ib_send_bw. My problem was not realizing that my buffer
could be clobbered prior to me polling the cq for the work completion.

Thanks,

-K

From devesh28 at gmail.com Tue Nov 4 21:46:36 2008
From: devesh28 at gmail.com (Devesh Sharma)
Date: Wed, 5 Nov 2008 11:16:36 +0530
Subject: [ofa-general] infiniband multicast (libibverbs)
In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E343488@tbmail2.tradebot.com>
References: <98B0CDCB28A5EE4CB3678CD99406644E343488@tbmail2.tradebot.com>
Message-ID: <309a667c0811042146m56c1a1d4od0e03a823c4ff098@mail.gmail.com>

are you taking care that ibv_poll_cq is not a blocking call, I mean you are
not considering it as blocking call and just going ahead with the sequence
number check?

On 11/5/08, Kelly Burkhart wrote:
>
> > -----Original Message-----
> > From: Roland Dreier [mailto:rdreier at cisco.com]
> >
> > > If I specify more than about 6500 messages/sec, I skip some sequences
> > > and receive others multiple times. I always receive the same number of
> > > messages the client sent. It appears as though all of the messages come
> > > through, but I'm missing some and reading others twice.
> >
> > Sounds like a bug in your code -- I don't know why you would see
> > duplicate messages unless you are somehow processing the same receive
> > buffer twice or something like that.
>
> I am (or was) processing the same buffer over and over. I ripped
> from ib_send_lat which does the same thing. The difference is
> send_lat waits for a reply before sending a second message. I'm
> sending rapidly without waiting for a reply. The surprising thing
> to me was that my recv buffer received data ahead of me waiting on
> the cq.
>
> I modified my code to read into a circular list of buffers which
> appears to have solved the problem at the cost of more memory usage.
>
> Thanks,
>
> -K

From devesh28 at gmail.com Tue Nov 4 21:49:19 2008
From: devesh28 at gmail.com (Devesh Sharma)
Date: Wed, 5 Nov 2008 11:19:19 +0530
Subject: [ofa-general] infiniband multicast (libibverbs)
In-Reply-To: <309a667c0811042146m56c1a1d4od0e03a823c4ff098@mail.gmail.com>
References: <98B0CDCB28A5EE4CB3678CD99406644E343488@tbmail2.tradebot.com> <309a667c0811042146m56c1a1d4od0e03a823c4ff098@mail.gmail.com>
Message-ID: <309a667c0811042149x498f4f9fhfa74330a94ce59ea@mail.gmail.com>

Correction in my post : I mean you are not considering it as non-blocking
call (not taking care of this behaviour) and just going ahead with the
sequence number check?

On 11/5/08, Devesh Sharma wrote:
>
> are you taking care that ibv_poll_cq is not a blocking call, I mean you are
> not considering it as blocking call and just going ahead with the sequence
> number check?

From vlad at lists.openfabrics.org Wed Nov 5 03:20:53 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Wed, 5 Nov 2008 03:20:53 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081105-0200 daily build status
Message-ID: <20081105112053.5EAACE60CB2@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.27
Passed on i686 with linux-2.6.26
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From jackm at dev.mellanox.co.il Wed Nov 5 04:44:01 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Wed, 5 Nov 2008 14:44:01 +0200
Subject: [ofa-general] [PATCH V2] mlx4: save default port ib capabilities, and use when setting port type to IB.
In-Reply-To:
References: <200811041214.39085.jackm@dev.mellanox.co.il>
Message-ID: <200811051444.02306.jackm@dev.mellanox.co.il>

mlx4: save default port ib capabilities, and use when setting port type to IB.

Commit 7ff93f8b7... introduced support for different port types.
As part of that support, SET_PORT is invoked to set the port type during
driver startup. However, as a side-effect, for IB ports the invocation of
this command also sets the port capability mask to zero (losing the default
configuration values set by FW).

To fix this, get the default ib port capabilities (via a MAD_IFC Port Info
query) during driver startup, and save them for use in the mlx4_SET_PORT
command when setting the port-type to Infiniband.

This patch does not require a firmware modification to the ConnectX
SET_PORT command (per Roland's feedback on previous proposed fix).

This patch fixes bugzilla 1183 (which occurred because the IsTrapSupported
bit in the capability mask was zeroed).

Signed-off-by: Jack Morgenstein

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 468921b..90a0281 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -753,6 +753,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int err;
 	int port;
+	__be32 ib_port_default_caps;
 
 	err = mlx4_init_uar_table(dev);
 	if (err) {
@@ -852,6 +853,13 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	}
 
 	for (port = 1; port <= dev->caps.num_ports; port++) {
+		ib_port_default_caps = 0;
+		err = mlx4_get_port_ib_caps(dev, port, &ib_port_default_caps);
+		if (err)
+			mlx4_warn(dev, "failed to get port %d default "
+				  "ib capabilities (%d). Continuing with "
+				  "caps = 0\n", port, err);
+		dev->caps.ib_port_def_cap[port] = ib_port_default_caps;
 		err = mlx4_SET_PORT(dev, port);
 		if (err) {
 			mlx4_err(dev, "Failed to set port %d, aborting\n",
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index fa431fa..183ab9d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -388,5 +388,6 @@ void mlx4_init_mac_table(struct mlx4_dev *dev, struct mlx4_mac_table *table);
 void mlx4_init_vlan_table(struct mlx4_dev *dev, struct mlx4_vlan_table *table);
 
 int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port);
+int mlx4_get_port_ib_caps(struct mlx4_dev *dev, u8 port, __be32 *caps);
 
 #endif /* MLX4_H */
diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c
index e2fdab4..0a057e5 100644
--- a/drivers/net/mlx4/port.c
+++ b/drivers/net/mlx4/port.c
@@ -258,6 +258,42 @@ out:
 }
 EXPORT_SYMBOL_GPL(mlx4_unregister_vlan);
 
+int mlx4_get_port_ib_caps(struct mlx4_dev *dev, u8 port, __be32 *caps)
+{
+	struct mlx4_cmd_mailbox *inmailbox, *outmailbox;
+	u8 *inbuf, *outbuf;
+	int err;
+
+	inmailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(inmailbox))
+		return PTR_ERR(inmailbox);
+
+	outmailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(outmailbox)) {
+		mlx4_free_cmd_mailbox(dev, inmailbox);
+		return PTR_ERR(outmailbox);
+	}
+
+	inbuf = inmailbox->buf;
+	outbuf = outmailbox->buf;
+	memset(inbuf, 0, 256);
+	memset(outbuf, 0, 256);
+	inbuf[0] = 1;
+	inbuf[1] = 1;
+	inbuf[2] = 1;
+	inbuf[3] = 1;
+	*(__be16 *) (&inbuf[16]) = cpu_to_be16(0x0015);
+	*(__be32 *) (&inbuf[20]) = cpu_to_be32(port);
+
+	err = mlx4_cmd_box(dev, inmailbox->dma, outmailbox->dma, port, 3,
+			   MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C);
+	if (!err)
+		*caps = *(__be32 *) (outbuf + 84);
+	mlx4_free_cmd_mailbox(dev, inmailbox);
+	mlx4_free_cmd_mailbox(dev, outmailbox);
+	return err;
+}
+
 int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
 {
 	struct mlx4_cmd_mailbox *mailbox;
@@ -273,7 +309,8 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
 		((u8 *) mailbox->buf)[3] = 6;
 		((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15);
 		((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15);
-	}
+	} else
+		((__be32 *) mailbox->buf)[1] = dev->caps.ib_port_def_cap[port];
 	err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT,
 		       MLX4_CMD_TIME_CLASS_B);
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index bd9977b..371086f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -179,6 +179,7 @@ struct mlx4_caps {
 	int num_ports;
 	int vl_cap[MLX4_MAX_PORTS + 1];
 	int ib_mtu_cap[MLX4_MAX_PORTS + 1];
+	__be32 ib_port_def_cap[MLX4_MAX_PORTS + 1];
 	u64 def_mac[MLX4_MAX_PORTS + 1];
 	int eth_mtu_cap[MLX4_MAX_PORTS + 1];
 	int gid_table_len[MLX4_MAX_PORTS + 1];

From vlad at mellanox.co.il Wed Nov 5 04:57:34 2008
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 5 Nov 2008 14:57:34 +0200
Subject: [ofa-general] [PATCH] IB/mlx4: Set umem field to NULL in mlx4_ib_alloc_fast_reg_mr to avoid oops
Message-ID: <20081105125734.GA23862@mellanox.co.il>

Set mr->umem to NULL in mlx4_ib_alloc_fast_reg_mr, to avoid invoking
ib_umem_release() during ib_dereg_mr() and getting a kernel oops.
Signed-off-by: Vladimir Sokolovsky --- drivers/infiniband/hw/mlx4/mr.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 87f5c5a..8e4d26d 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -205,6 +205,7 @@ struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd, goto err_mr; mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key; + mr->umem = NULL; return &mr->ibmr; -- 1.5.6.3 From kelly at tradebotsystems.com Wed Nov 5 05:53:15 2008 From: kelly at tradebotsystems.com (Kelly Burkhart) Date: Wed, 5 Nov 2008 07:53:15 -0600 Subject: [ofa-general] infiniband multicast (libibverbs) Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E34348A@tbmail2.tradebot.com> It is non-blocking. I spin, calling ibv_poll_cq until it returns a non-zero. ________________________________ From: Devesh Sharma [mailto:devesh28 at gmail.com] Sent: Tuesday, November 04, 2008 11:49 PM To: Kelly Burkhart Cc: Roland Dreier; general at lists.openfabrics.org Subject: Re: [ofa-general] infiniband multicast (libibverbs) Correction in my post : I mean you are not considering it as non-blocking call (not taking care of this behaviour) and just going ahead with the sequence number check? On 11/5/08, Devesh Sharma wrote: are you taking care that ibv_poll_cq is not a blocking call, I mean you are not considering it as blocking call and just going ahead with the sequence number check? From yevgenyp at mellanox.co.il Wed Nov 5 06:48:36 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 05 Nov 2008 16:48:36 +0200 Subject: [ofa-general] [PATCH] mlx4_en: Pause parameters per port Message-ID: <4911B244.30205@mellanox.co.il> Before the change the driver reported the same pause parameters for all the ports, even only one of them was modified.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_netdev.c | 8 ++++---- drivers/net/mlx4/en_params.c | 30 ++++++++++++++++-------------- drivers/net/mlx4/mlx4_en.h | 8 ++++---- 3 files changed, 24 insertions(+), 22 deletions(-) diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c index a339afb..12d736a 100644 --- a/drivers/net/mlx4/en_netdev.c +++ b/drivers/net/mlx4/en_netdev.c @@ -656,10 +656,10 @@ static int mlx4_en_start_port(struct net_device *dev) /* Configure port */ err = mlx4_SET_PORT_general(mdev->dev, priv->port, priv->rx_skb_size + ETH_FCS_LEN, - mdev->profile.tx_pause, - mdev->profile.tx_ppp, - mdev->profile.rx_pause, - mdev->profile.rx_ppp); + priv->prof->tx_pause, + priv->prof->tx_ppp, + priv->prof->rx_pause, + priv->prof->rx_ppp); if (err) { mlx4_err(mdev, "Failed setting port general configurations" " for port %d, with error %d\n", priv->port, err); diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index c2e69b1..95706ee 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -90,6 +90,7 @@ MLX4_EN_PARM_INT(rx_ring_size2, MLX4_EN_AUTO_CONF, "Rx ring size for port 2"); int mlx4_en_get_profile(struct mlx4_en_dev *mdev) { struct mlx4_en_profile *params = &mdev->profile; + int i; params->rx_moder_cnt = min_t(int, rx_moder_cnt, MLX4_EN_AUTO_CONF); params->rx_moder_time = min_t(int, rx_moder_time, MLX4_EN_AUTO_CONF); @@ -97,11 +98,13 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->rss_xor = (rss_xor != 0); params->rss_mask = rss_mask & 0x1f; params->num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS); - params->rx_pause = pprx; - params->rx_ppp = pfcrx; - params->tx_pause = pptx; - params->tx_ppp = pfctx; - if (params->rx_ppp || params->tx_ppp) { + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + params->prof[i].rx_pause = pprx; + params->prof[i].rx_ppp = pfcrx; + params->prof[i].tx_pause = pptx; + params->prof[i].tx_ppp = pfctx; + } + if (pfcrx || pfctx) { 
params->prof[1].tx_ring_num = MLX4_EN_TX_RING_NUM; params->prof[2].tx_ring_num = MLX4_EN_TX_RING_NUM; } else { @@ -407,14 +410,14 @@ static int mlx4_en_set_pauseparam(struct net_device *dev, struct mlx4_en_dev *mdev = priv->mdev; int err; - mdev->profile.tx_pause = pause->tx_pause != 0; - mdev->profile.rx_pause = pause->rx_pause != 0; + priv->prof->tx_pause = pause->tx_pause != 0; + priv->prof->rx_pause = pause->rx_pause != 0; err = mlx4_SET_PORT_general(mdev->dev, priv->port, priv->rx_skb_size + ETH_FCS_LEN, - mdev->profile.tx_pause, - mdev->profile.tx_ppp, - mdev->profile.rx_pause, - mdev->profile.rx_ppp); + priv->prof->tx_pause, + priv->prof->tx_ppp, + priv->prof->rx_pause, + priv->prof->rx_ppp); if (err) mlx4_err(mdev, "Failed setting pause params to\n"); @@ -425,10 +428,9 @@ static void mlx4_en_get_pauseparam(struct net_device *dev, struct ethtool_pauseparam *pause) { struct mlx4_en_priv *priv = netdev_priv(dev); - struct mlx4_en_dev *mdev = priv->mdev; - pause->tx_pause = mdev->profile.tx_pause; - pause->rx_pause = mdev->profile.rx_pause; + pause->tx_pause = priv->prof->tx_pause; + pause->rx_pause = priv->prof->rx_pause; } static void mlx4_en_get_ringparam(struct net_device *dev, diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h index 11fb17c..98ddc08 100644 --- a/drivers/net/mlx4/mlx4_en.h +++ b/drivers/net/mlx4/mlx4_en.h @@ -322,6 +322,10 @@ struct mlx4_en_port_profile { u32 rx_ring_num; u32 tx_ring_size; u32 rx_ring_size; + u8 rx_pause; + u8 rx_ppp; + u8 tx_pause; + u8 tx_ppp; }; struct mlx4_en_profile { @@ -333,10 +337,6 @@ struct mlx4_en_profile { int rx_moder_cnt; int rx_moder_time; int auto_moder; - u8 rx_pause; - u8 rx_ppp; - u8 tx_pause; - u8 tx_ppp; u8 no_reset; struct mlx4_en_port_profile prof[MLX4_MAX_PORTS + 1]; }; -- 1.5.4 From yevgenyp at mellanox.co.il Wed Nov 5 06:53:50 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 05 Nov 2008 16:53:50 +0200 Subject: [ofa-general] [PATCH] mlx4_en: Start port error flow 
bug fix Message-ID: <4911B37E.3020900@mellanox.co.il> Tried to deactivate rx ring that wasn't activated, used wrong index. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_netdev.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c index 12d736a..96e709d 100644 --- a/drivers/net/mlx4/en_netdev.c +++ b/drivers/net/mlx4/en_netdev.c @@ -706,7 +706,7 @@ tx_err: mlx4_en_release_rss_steer(priv); rx_err: for (i = 0; i < priv->rx_ring_num; i++) - mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[rx_index]); + mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[i]); cq_err: while (rx_index--) mlx4_en_deactivate_cq(priv, &priv->rx_cq[rx_index]); -- 1.5.4 From rdreier at cisco.com Wed Nov 5 09:58:58 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 05 Nov 2008 09:58:58 -0800 Subject: [ofa-general] Re: [PATCH] mlx4_en: Pause parameters per port In-Reply-To: <4911B244.30205@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 05 Nov 2008 16:48:36 +0200") References: <4911B244.30205@mellanox.co.il> Message-ID: Jeff, please go ahead and merge both of these mlx4_en patches. Yevgeny, I think it would be helpful for you to say (after the --- line, so the git tools strip it automatically) whether I should merge the patch or if it's something for Jeff, just to make things smoother and clearer. - R. 
From rdreier at cisco.com Wed Nov 5 10:57:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 05 Nov 2008 10:57:22 -0800 Subject: [ofa-general] Re: [PATCH] IB/mlx4: Set umem field to NULL in mlx4_ib_alloc_fast_reg_mr to avoid oops In-Reply-To: <20081105125734.GA23862@mellanox.co.il> (Vladimir Sokolovsky's message of "Wed, 5 Nov 2008 14:57:34 +0200") References: <20081105125734.GA23862@mellanox.co.il> Message-ID: thanks, applied From kelly at tradebotsystems.com Wed Nov 5 11:16:56 2008 From: kelly at tradebotsystems.com (Kelly Burkhart) Date: Wed, 5 Nov 2008 13:16:56 -0600 Subject: [ofa-general] Managing work completions (libibverbs) Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E343497@tbmail2.tradebot.com> I'm now trying to work out the best approach for managing completions without spinning on ibv_poll_cq. I think what I want to do is create a completion channel and operate similarly to the last example in the ibv_ack_cq_events man page. The man page states that ibv_ack_cq_events is mandatory, however, the examples in perftest don't ack when in event mode. Is this a bug in the perftest programs or a bug in the man page? Is it possible use epoll to block on struct ibv_comp_channel::fd then use ibv_poll_cq to grab completions when epoll wakes up the process? Then (I hope) it would be unnecessary to call ibv_get/ack_cq_event(s). Or is it necessary to call these functions in place of ibv_poll_cq when a completion channel is used? 
Again, thanks for your advice, -K From rdreier at cisco.com Wed Nov 5 11:35:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 05 Nov 2008 11:35:57 -0800 Subject: [ofa-general] Managing work completions (libibverbs) In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E343497@tbmail2.tradebot.com> (Kelly Burkhart's message of "Wed, 5 Nov 2008 13:16:56 -0600") References: <98B0CDCB28A5EE4CB3678CD99406644E343497@tbmail2.tradebot.com> Message-ID: > The man page states that ibv_ack_cq_events is mandatory, however, the > examples in perftest don't ack when in event mode. Is this a bug in > the perftest programs or a bug in the man page? I guess it would be a bug in the perftest programs, but the only need to call ibv_ack_cq_events() is when destroying a CQ -- ibv_destroy_cq() will wait until all CQ events are ACKed before returning. > Is it possible use epoll to block on struct ibv_comp_channel::fd then > use ibv_poll_cq to grab completions when epoll wakes up the process? > Then (I hope) it would be unnecessary to call > ibv_get/ack_cq_event(s). Or is it necessary to call these functions > in place of ibv_poll_cq when a completion channel is used? You can use epoll to get comp channel events, but you'll need to collect the event with ibv_get_cq_event() to rearm things. epoll tells you when the fd becomes readable, but you'll need to actually read all the events queued on the fd before waiting again. The overhead of ibv_get_cq_event() should not be too high compared to the overhead of sleeping and getting woken up again by an interrupt, and you can always amortize ibv_ack_cq_events() by just keeping a counter of the number of events you read and only calling ibv_ack_cq_events() occasionally. - R. 
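The counting scheme suggested in the reply above can be sketched roughly as follows. This is a minimal illustration only, not libibverbs code: struct mock_cq, the mock_* helpers and struct ack_batcher are hypothetical stand-ins invented for this sketch, and they model nothing beyond the get/ack bookkeeping. In a real program the calls being stood in for would be ibv_get_cq_event() and ibv_ack_cq_events(), and, as stated in the reply, ibv_destroy_cq() waits until all CQ events are ACKed.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a CQ. It models only the event bookkeeping
 * discussed in the thread; it is NOT the libibverbs struct ibv_cq.
 */
struct mock_cq {
    unsigned int events_delivered; /* events handed to the consumer */
    unsigned int events_acked;     /* events the consumer has acknowledged */
    unsigned int ack_calls;        /* how often the ack path was taken */
};

/* Stand-in for ibv_get_cq_event(): one event is read from the channel. */
static void mock_get_cq_event(struct mock_cq *cq)
{
    cq->events_delivered++;
}

/* Stand-in for ibv_ack_cq_events(cq, nevents). */
static void mock_ack_cq_events(struct mock_cq *cq, unsigned int nevents)
{
    cq->events_acked += nevents;
    cq->ack_calls++;
}

/* The CQ may only be torn down once every delivered event is acked. */
static int mock_cq_can_destroy(const struct mock_cq *cq)
{
    return cq->events_acked == cq->events_delivered;
}

/* Consumer-side count of events read but not yet acknowledged. */
struct ack_batcher {
    unsigned int pending; /* events read since the last ack */
    unsigned int batch;   /* ack once this many events are pending */
};

/* Read one event and acknowledge only when a full batch has built up. */
static void batcher_on_event(struct ack_batcher *b, struct mock_cq *cq)
{
    assert(b->batch > 0);
    mock_get_cq_event(cq);
    if (++b->pending >= b->batch) {
        mock_ack_cq_events(cq, b->pending);
        b->pending = 0;
    }
}

/* Acknowledge whatever is still pending, e.g. right before CQ teardown. */
static void batcher_flush(struct ack_batcher *b, struct mock_cq *cq)
{
    if (b->pending) {
        mock_ack_cq_events(cq, b->pending);
        b->pending = 0;
    }
}
```

With a batch size of 4, ten events cost two ack calls plus one final flush instead of ten separate calls. The flush before teardown is the part that must not be skipped, since unacknowledged events would leave the destroy call waiting.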
From kelly at tradebotsystems.com Wed Nov 5 13:04:00 2008 From: kelly at tradebotsystems.com (Kelly Burkhart) Date: Wed, 5 Nov 2008 15:04:00 -0600 Subject: [ofa-general] Managing work completions (libibverbs) Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E343499@tbmail2.tradebot.com> > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > > You can use epoll to get comp channel events, but you'll need > to collect > the event with ibv_get_cq_event() to rearm things. epoll > tells you when > the fd becomes readable, but you'll need to actually read all the > events queued on the fd before waiting again. The overhead of > ibv_get_cq_event() should not be too high compared to the overhead of > sleeping and getting woken up again by an interrupt, and you > can always > amortize ibv_ack_cq_events() by just keeping a counter of the > number of > events you read and only calling ibv_ack_cq_events() occasionally. Digging through the code to see what resource I hog if I don't ack frequently enough: It appears that ibv_ack_cq_events only increments an integer in the CQ (and doesn't free or return some resource). So I could just count gets and ack them all immediately prior to destructing the CQ. Why be so picky about matching acks with gets? -K From roland.list at gmail.com Wed Nov 5 13:17:10 2008 From: roland.list at gmail.com (Roland Dreier) Date: Wed, 5 Nov 2008 13:17:10 -0800 Subject: [ofa-general] Managing work completions (libibverbs) In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E343499@tbmail2.tradebot.com> References: <98B0CDCB28A5EE4CB3678CD99406644E343499@tbmail2.tradebot.com> Message-ID: Yes, exactly: by keeping your own count, you avoid the pthread lock overhead in ack_events. The acking of events is required to avoid a race where a consumer gets an event for a CQ after destroying that CQ. - R. 
From John.Marshall at ec.gc.ca Wed Nov 5 14:34:47 2008 From: John.Marshall at ec.gc.ca (John Marshall) Date: Wed, 05 Nov 2008 22:34:47 +0000 Subject: [ofa-general] OOM problem with ib_ipoib? In-Reply-To: References: <48FF6DFA.9080409@ec.gc.ca> <48FFA62D.3030305@ec.gc.ca> <490083D0.5000807@ec.gc.ca> <490876DF.2020705@ec.gc.ca> Message-ID: <49121F87.6010204@ec.gc.ca> Roland Dreier wrote: >> The curious thing is that the OOM occurs even when the ib interfaces >> are _not even UP_, although the ib_ipoib module is loaded. So, I cannot >> see how it can be an allocation issue in such a case related to usage.
Am I >> missing something here? >> > > The IPoIB CM code allocates receive buffers even before the interface is brought > up. Maybe the wrong thing to do, but that's how the code is now at least. > > >> As well, shouldn't the OS handle this transparently via the pdflush which >> will write out the data and free up memory? Or does the pdflush not >> distinguish between total memory and low memory so that a problem >> occurs (yet the OOM happens even when the interfaces are not UP!)? >> > > You may really have no free lowmem... keep in mind that the linux mm really > does not behave well with 32G of RAM and a 32-bit kernel. It's fundamentally > an insane config and so no one tunes for it. > Progress! 1) I have done further tests and am comfortable that the OOMs do not happen on the x86-64 platform. 2) I ran more tests on the same equipment, again with bigmem, and, given your pointer on lowmem, have found that if I tweak the system with the sysctl setting vm.lowmem_reserve_ratio=128 128 32, things seem to work well. I do this on _both_ the server and the client sides (lowmem issues also pop up on the client side when using nfs). Thanks, John From John.Marshall at ec.gc.ca Wed Nov 5 14:47:37 2008 From: John.Marshall at ec.gc.ca (John Marshall) Date: Wed, 05 Nov 2008 22:47:37 +0000 Subject: [ofa-general] nfs/rdma slow with uncached data Message-ID: <49122289.5030107@ec.gc.ca> Hi, I have done some nfs/rdma tests and found impressive transfer rates, but only when the file data is in cache. On the other hand, using straight nfs over ipoib I am able to get decent transfer rates on first read (not cached) when I tweak: echo 128 > /proc/fs/nfsd/pool_threads echo 1024 > /sys/block/sd?/queue/nr_requests echo 16384 > /sys/block/sd?/queue/read_ahead_kb (or using blockdev with 8192) My question: Does the nfs/rdma setup bypass the readahead mechanism in the kernel?
If it does, this may 1) account for the major difference described above and 2) explain why transfers are _very_ fast only on the second go-around: because the data is cached (assuming it fits in the cache). For both cases, I am using a 2.6.26 bigmem kernel with the necessary tweaks. Thanks, John From akepner at sgi.com Wed Nov 5 17:23:07 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 5 Nov 2008 17:23:07 -0800 Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free Message-ID: <20081106012307.GP31163@sgi.com> Way back in: http://lists.openfabrics.org/pipermail/general/2008-May/050196.html I described an IPoIB-related panic we were seeing on large clusters. The signature was a backtrace like this: skb_over_panic :ib_ipoib:ipoib_ib_handle_rx_wc :ib_ipoib:ipoib_poll net_rx_action ..... The bug is difficult to reproduce, but we finally got a crashdump, and the problem appears to be that stale skb pointers on the tx_ring were left pointing to skbs that had been since reused, so that the skb's data region was now unexpectedly short, etc. Recently LLNL reported something similar: http://lists.openfabrics.org/pipermail/general/2008-October/054824.html A patch similar to the following seems to fix thing up. Ira, Al, if this looks OK, can you please sign off on it?
Signed-off-by: Arthur Kepner --- ipoib_cm.c | 5 +++++ ipoib_ib.c | 4 ++++ 2 files changed, 9 insertions(+) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 7b14c2c..8f8650b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -200,6 +200,7 @@ static void ipoib_cm_free_rx_ring(struct net_device *dev, ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, rx_ring[i].mapping); dev_kfree_skb_any(rx_ring[i].skb); + rx_ring[i].skb = NULL; } vfree(rx_ring); @@ -736,6 +737,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { ++dev->stats.tx_errors; dev_kfree_skb_any(skb); + tx_req->skb = NULL; return; } @@ -747,6 +749,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ ++dev->stats.tx_errors; ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); dev_kfree_skb_any(skb); + tx_req->skb = NULL; } else { dev->trans_start = jiffies; ++tx->tx_head; @@ -785,6 +788,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) dev->stats.tx_bytes += tx_req->skb->len; dev_kfree_skb_any(tx_req->skb); + tx_req->skb = NULL; netif_tx_lock(dev); @@ -1179,6 +1183,7 @@ timeout: ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, DMA_TO_DEVICE); dev_kfree_skb_any(tx_req->skb); + tx_req->skb = NULL; ++p->tx_tail; netif_tx_lock_bh(p->dev); if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 28eb6f0..f7e3497 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -383,6 +383,7 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) dev->stats.tx_bytes += tx_req->skb->len; dev_kfree_skb_any(tx_req->skb); + tx_req->skb = NULL; ++priv->tx_tail; if 
(unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && @@ -572,6 +573,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { ++dev->stats.tx_errors; dev_kfree_skb_any(skb); + tx_req->skb = NULL; return; } @@ -594,6 +596,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + tx_req->skb = NULL; if (netif_queue_stopped(dev)) netif_wake_queue(dev); } else { @@ -833,6 +836,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) (ipoib_sendq_size - 1)]; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(tx_req->skb); + tx_req->skb = NULL; ++priv->tx_tail; --priv->tx_outstanding; } From chu11 at llnl.gov Wed Nov 5 17:46:03 2008 From: chu11 at llnl.gov (Al Chu) Date: Wed, 05 Nov 2008 17:46:03 -0800 Subject: [ofa-general] Re: [PATCH] ipoib: null tx/rx_ring skb pointers on free In-Reply-To: <20081106012307.GP31163@sgi.com> References: <20081106012307.GP31163@sgi.com> Message-ID: <1225935964.13371.5.camel@cardanus.llnl.gov> Hey Arthur, On Wed, 2008-11-05 at 17:23 -0800, akepner at sgi.com wrote: > Way back in: > > http:// lists.openfabrics.org/pipermail/general/2008-May/050196.html > > I described an IPoIB-related panic we were seeing on large > clusters. The signature was a backtrace like this: > > skb_over_panic > :ib_ipoib:ipoib_ib_handle_rx_wc > :ib_ipoib:ipoib_poll > net_rx_action > ..... > > The bug is difficult to reproduce, but we finally got a crashdump, > and the problem appears to be that stale skb pointers on the tx_ring > were left pointing to skbs that had been since reused, so that the > skb's data region was now unexpectedly short, etc. > > Recently LLNL reported something similar: > > http:// lists.openfabrics.org/pipermail/general/2008-October/054824.html > > A patch similar to the following seems to fix thing up. > > Ira, Al, if this looks OK, can you please sign off on it? 
Looks good to me. Al > Signed-off-by: Arthur Kepner > > --- > > ipoib_cm.c | 5 +++++ > ipoib_ib.c | 4 ++++ > 2 files changed, 9 insertions(+) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > index 7b14c2c..8f8650b 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -200,6 +200,7 @@ static void ipoib_cm_free_rx_ring(struct net_device *dev, > ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, > rx_ring[i].mapping); > dev_kfree_skb_any(rx_ring[i].skb); > + rx_ring[i].skb = NULL; > } > > vfree(rx_ring); > @@ -736,6 +737,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > return; > } > > @@ -747,6 +749,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > ++dev->stats.tx_errors; > ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > } else { > dev->trans_start = jiffies; > ++tx->tx_head; > @@ -785,6 +788,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > > netif_tx_lock(dev); > > @@ -1179,6 +1183,7 @@ timeout: > ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, > DMA_TO_DEVICE); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > ++p->tx_tail; > netif_tx_lock_bh(p->dev); > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 28eb6f0..f7e3497 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -383,6 +383,7 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > 
dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > > ++priv->tx_tail; > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > @@ -572,6 +573,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > return; > } > > @@ -594,6 +596,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > --priv->tx_outstanding; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > if (netif_queue_stopped(dev)) > netif_wake_queue(dev); > } else { > @@ -833,6 +836,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) > (ipoib_sendq_size - 1)]; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > ++priv->tx_tail; > --priv->tx_outstanding; > } > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory
From weiny2 at llnl.gov Wed Nov 5 17:53:52 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 5 Nov 2008 17:53:52 -0800 Subject: [ofa-general] Re: [PATCH] ipoib: null tx/rx_ring skb pointers on free In-Reply-To: <20081106012307.GP31163@sgi.com> References: <20081106012307.GP31163@sgi.com> Message-ID: <20081105175352.476ac69e.weiny2@llnl.gov> On Wed, 5 Nov 2008 17:23:07 -0800 akepner at sgi.com wrote: > > Way back in: >
> http:// lists.openfabrics.org/pipermail/general/2008-May/050196.html > > I described an IPoIB-related panic we were seeing on large > clusters. The signature was a backtrace like this: > > skb_over_panic > :ib_ipoib:ipoib_ib_handle_rx_wc > :ib_ipoib:ipoib_poll > net_rx_action > ..... > > The bug is difficult to reproduce, but we finally got a crashdump, > and the problem appears to be that stale skb pointers on the tx_ring > were left pointing to skbs that had been since reused, so that the > skb's data region was now unexpectedly short, etc. > > Recently LLNL reported something similar: > > http:// lists.openfabrics.org/pipermail/general/2008-October/054824.html > > A patch similar to the following seems to fix thing up. > > Ira, Al, if this looks OK, can you please sign off on it? Yep, looks good. Ira > > Signed-off-by: Arthur Kepner > > --- > > ipoib_cm.c | 5 +++++ > ipoib_ib.c | 4 ++++ > 2 files changed, 9 insertions(+) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > index 7b14c2c..8f8650b 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -200,6 +200,7 @@ static void ipoib_cm_free_rx_ring(struct net_device *dev, > ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, > rx_ring[i].mapping); > dev_kfree_skb_any(rx_ring[i].skb); > + rx_ring[i].skb = NULL; > } > > vfree(rx_ring); > @@ -736,6 +737,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > return; > } > > @@ -747,6 +749,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > ++dev->stats.tx_errors; > ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > } else { > dev->trans_start = jiffies; > ++tx->tx_head; > @@ -785,6 +788,7 @@ void 
ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > > netif_tx_lock(dev); > > @@ -1179,6 +1183,7 @@ timeout: > ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, > DMA_TO_DEVICE); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > ++p->tx_tail; > netif_tx_lock_bh(p->dev); > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 28eb6f0..f7e3497 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -383,6 +383,7 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > > ++priv->tx_tail; > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > @@ -572,6 +573,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > return; > } > > @@ -594,6 +596,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > --priv->tx_outstanding; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > if (netif_queue_stopped(dev)) > netif_wake_queue(dev); > } else { > @@ -833,6 +836,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) > (ipoib_sendq_size - 1)]; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > ++priv->tx_tail; > --priv->tx_outstanding; > } > From jgarzik at pobox.com Wed Nov 5 21:43:31 2008 From: jgarzik at pobox.com (Jeff Garzik) Date: Thu, 06 Nov 2008 00:43:31 -0500 Subject: [ofa-general] Re: [PATCH] mlx4_en: Start port error flow bug fix In-Reply-To: <4911B37E.3020900@mellanox.co.il> 
References: <4911B37E.3020900@mellanox.co.il> Message-ID: <49128403.6000205@pobox.com> Yevgeny Petrilin wrote: > Tried to deactivate rx ring that wasn't activated, > used wrong index. > > Signed-off-by: Yevgeny Petrilin > --- > drivers/net/mlx4/en_netdev.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c > index 12d736a..96e709d 100644 > --- a/drivers/net/mlx4/en_netdev.c > +++ b/drivers/net/mlx4/en_netdev.c > @@ -706,7 +706,7 @@ tx_err: > mlx4_en_release_rss_steer(priv); > rx_err: > for (i = 0; i < priv->rx_ring_num; i++) > - mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[rx_index]); > + mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[i]); > cq_err: applied From jgarzik at pobox.com Wed Nov 5 21:45:22 2008 From: jgarzik at pobox.com (Jeff Garzik) Date: Thu, 06 Nov 2008 00:45:22 -0500 Subject: [ofa-general] Re: [PATCH] mlx4_en: Pause parameters per port In-Reply-To: <4911B244.30205@mellanox.co.il> References: <4911B244.30205@mellanox.co.il> Message-ID: <49128472.7080607@pobox.com> Yevgeny Petrilin wrote: > Before the change the driver reported the same pause parameters > for all the ports, even only one of them was modified. > > Signed-off-by: Yevgeny Petrilin > --- > drivers/net/mlx4/en_netdev.c | 8 ++++---- > drivers/net/mlx4/en_params.c | 30 ++++++++++++++++-------------- > drivers/net/mlx4/mlx4_en.h | 8 ++++---- > 3 files changed, 24 insertions(+), 22 deletions(-) Is this a regression fix? It doesn't look like one to me, so I am planning to hold this for 2.6.29 (davem/net-next-2.6.git), unless there are problems with this plan? 
Jeff From yevgenyp at mellanox.co.il Wed Nov 5 22:53:40 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Thu, 06 Nov 2008 08:53:40 +0200 Subject: [ofa-general] Re: Re: [PATCH] mlx4_en: Pause parameters per port In-Reply-To: <49128472.7080607@pobox.com> References: <4911B244.30205@mellanox.co.il> <49128472.7080607@pobox.com> Message-ID: <49129474.1030607@mellanox.co.il> Jeff Garzik wrote: > Is this a regression fix? It doesn't look like one to me, so I am > planning to hold this for 2.6.29 (davem/net-next-2.6.git), unless there > are problems with this plan? > This is a regression fix. When pause parameters are set for one port, they change only for that port, but both ports report the same parameters. This means that the second port reports wrong pause parameters. Yevgeny From eli at dev.mellanox.co.il Thu Nov 6 00:40:32 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Thu, 6 Nov 2008 10:40:32 +0200 Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free In-Reply-To: <20081106012307.GP31163@sgi.com> References: <20081106012307.GP31163@sgi.com> Message-ID: <20081106084031.GA25354@mtls03> On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote: Hi Arthur, looking at the patch I don't understand why it should fix the problem you're seeing. I suspect we may be hiding the problem. > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > index 7b14c2c..8f8650b 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -200,6 +200,7 @@ static void ipoib_cm_free_rx_ring(struct net_device *dev, > ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, > rx_ring[i].mapping); > dev_kfree_skb_any(rx_ring[i].skb); > + rx_ring[i].skb = NULL; > } > > vfree(rx_ring); This is not needed since the ring is being freed. 
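The idiom the patch applies at each of these sites, clearing the ring slot's pointer right after the buffer is freed, can be sketched in user-space C. This is only an analogy under stated assumptions: struct buf, struct tx_slot, slot_post() and slot_complete() are hypothetical names, malloc()/free() stand in for skb allocation and dev_kfree_skb_any(), and none of it is the IPoIB driver code. It shows what the NULL store buys: a second completion on the same slot is detected instead of touching freed memory. It does not explain why such a completion would arrive, which is the question raised in this reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a driver ring entry; not the IPoIB types. */
struct buf { size_t len; };
struct tx_slot { struct buf *skb; };

#define RING_SIZE 4 /* power of two, so (index & (RING_SIZE - 1)) wraps */

/* Attach a freshly allocated buffer to the slot at 'head'.
 * Returns 0 on success, -1 if allocation fails. */
static int slot_post(struct tx_slot *ring, unsigned int head, size_t len)
{
    struct tx_slot *slot = &ring[head & (RING_SIZE - 1)];

    assert(slot->skb == NULL); /* the slot must have been completed first */
    slot->skb = malloc(sizeof(*slot->skb));
    if (slot->skb == NULL)
        return -1;
    slot->skb->len = len;
    return 0;
}

/* Complete the slot at 'tail'. Returns the buffer length, or -1 if the
 * slot holds no buffer (a stale or duplicate completion). */
static long slot_complete(struct tx_slot *ring, unsigned int tail)
{
    struct tx_slot *slot = &ring[tail & (RING_SIZE - 1)];
    long len;

    if (slot->skb == NULL)
        return -1; /* detected, rather than dereferencing freed memory */

    len = (long)slot->skb->len;
    free(slot->skb);
    slot->skb = NULL; /* the store the patch adds after each free */
    return len;
}
```

Without the final store in slot_complete(), a repeated completion on the same index would read len from memory that has already been freed and possibly reused, which is the kind of symptom described in the original report.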
> @@ -736,6 +737,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > return; > } Here we will never get a completion so why do we need this? > > @@ -747,6 +749,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_ > ++dev->stats.tx_errors; > ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; Also here, we don't get a completion. > } else { > dev->trans_start = jiffies; > ++tx->tx_head; > @@ -785,6 +788,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; And here we already got the completion so we shouldn't expect another free of the SKB. > > netif_tx_lock(dev); > > @@ -1179,6 +1183,7 @@ timeout: > ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, > DMA_TO_DEVICE); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; and here we're freeing the ring > ++p->tx_tail; > netif_tx_lock_bh(p->dev); > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 28eb6f0..f7e3497 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -383,6 +383,7 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) > dev->stats.tx_bytes += tx_req->skb->len; > > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > > ++priv->tx_tail; > if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && > @@ -572,6 +573,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { > ++dev->stats.tx_errors; > dev_kfree_skb_any(skb); > + tx_req->skb = 
NULL; > return; > } > > @@ -594,6 +596,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, > --priv->tx_outstanding; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(skb); > + tx_req->skb = NULL; > if (netif_queue_stopped(dev)) > netif_wake_queue(dev); > } else { > @@ -833,6 +836,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) > (ipoib_sendq_size - 1)]; > ipoib_dma_unmap_tx(priv->ca, tx_req); > dev_kfree_skb_any(tx_req->skb); > + tx_req->skb = NULL; > ++priv->tx_tail; > --priv->tx_outstanding; > } > From devesh28 at gmail.com Thu Nov 6 01:09:01 2008 From: devesh28 at gmail.com (Devesh Sharma) Date: Thu, 6 Nov 2008 14:39:01 +0530 Subject: Re: [ofa-general] infiniband multicast (libibverbs) In-Reply-To: <98B0CDCB28A5EE4CB3678CD99406644E34348A@tbmail2.tradebot.com> References: <98B0CDCB28A5EE4CB3678CD99406644E34348A@tbmail2.tradebot.com> Message-ID: <309a667c0811060109g7b75d7dan28ee098c34c6c18d@mail.gmail.com> OK, try doing the sequence number check after a slight delay, say 100ns. Is it possible that DMA latencies are coming into the picture? Roland or Dotan can comment on this! On 11/5/08, Kelly Burkhart wrote: > > It is non-blocking. I spin, calling ibv_poll_cq until it returns a > non-zero. > > > ------------------------------ > *From:* Devesh Sharma [mailto:devesh28 at gmail.com] > *Sent:* Tuesday, November 04, 2008 11:49 PM > *To:* Kelly Burkhart > *Cc:* Roland Dreier; general at lists.openfabrics.org > *Subject:* Re: [ofa-general] infiniband multicast (libibverbs) > > > Correction in my post : I mean you are not considering it as non-blocking > call (not taking care of this behaviour) and just going ahead with the > sequence number check? 
> > On 11/5/08, Devesh Sharma wrote: >> >> are you taking care that ibv_poll_cq is not a blocking call, I mean you >> are not considering it as blocking call and just going ahead with the >> sequence number check? > > From dorons at Voltaire.COM Thu Nov 6 01:21:29 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:21:29 +0200 Subject: [ofa-general] [PATCH 0/2] add and install default configuration files Message-ID: <4912B719.3040907@Voltaire.COM> The following patches will add default configuration files and install them by opensm rpm. From dorons at Voltaire.COM Thu Nov 6 01:24:26 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:24:26 +0200 Subject: [ofa-general] [PATCH 1/2] add default configuration files In-Reply-To: <4912B719.3040907@Voltaire.COM> References: <4912B719.3040907@Voltaire.COM> Message-ID: <4912B7CA.9080508@Voltaire.COM> add default configuration files: opensm.conf partitions.conf qos-policy.conf root-nodes.conf Signed-off-by: Doron Shoham --- opensm/scripts/opensm.conf | 331 ++++++++++++++++++++++++++++++++++++++++ opensm/scripts/partitions.conf | 100 ++++++++++++ opensm/scripts/qos-policy.conf | 2 + opensm/scripts/root-nodes.conf | 3 + 4 files changed, 436 insertions(+), 0 deletions(-) create mode 100644 opensm/scripts/opensm.conf create mode 100644 opensm/scripts/partitions.conf create mode 100644 opensm/scripts/qos-policy.conf create mode 100644 opensm/scripts/root-nodes.conf diff --git a/opensm/scripts/opensm.conf b/opensm/scripts/opensm.conf new file mode 100644 index 0000000..89e4145 --- /dev/null +++ b/opensm/scripts/opensm.conf @@ -0,0 +1,331 @@ +# +# DEVICE ATTRIBUTES OPTIONS +# +# The port GUID on which the OpenSM is running +guid 0x0000000000000000 + +# M_Key value sent to all ports qualifying all Set(PortInfo) +m_key 0x0000000000000000 + +# The lease period used for the M_Key on this subnet in [sec] 
+m_key_lease_period 0 + +# SM_Key value of the SM used for SM authentication +sm_key 0x0000000000000001 + +# SM_Key value to qualify rcv SA queries as 'trusted' +sa_key 0x0000000000000001 + +# Note that for both values above (sm_key and sa_key) +# OpenSM version 3.2.1 and below used the default value '1' +# in a host byte order, it is fixed now but you may need to +# change the values to interoperate with old OpenSM running +# on a little endian machine. + +# Subnet prefix used on this subnet +subnet_prefix 0xfe80000000000000 + +# The LMC value used on this subnet +lmc 0 + +# lmc_esp0 determines whether LMC value used on subnet is used for +# enhanced switch port 0. If TRUE, LMC value for subnet is used for +# ESP0. Otherwise, LMC value for ESP0s is 0. +lmc_esp0 FALSE + +# The code of maximal time a packet can live in a switch +# The actual time is 4.096usec * 2^ +# The value 0x14 disables this mechanism +packet_life_time 0x12 + +# The number of sequential packets dropped that cause the port +# to enter the VLStalled state. The result of setting this value to +# zero is undefined. +vl_stall_count 0x07 + +# The number of sequential packets dropped that cause the port +# to enter the VLStalled state. This value is for switch ports +# driving a CA or router port. The result of setting this value +# to zero is undefined. +leaf_vl_stall_count 0x07 + +# The code of maximal time a packet can wait at the head of +# transmission queue. 
+# The actual time is 4.096usec * 2^ +# The value 0x14 disables this mechanism +head_of_queue_lifetime 0x12 + +# The maximal time a packet can wait at the head of queue on +# switch port connected to a CA or router port +leaf_head_of_queue_lifetime 0x10 + +# Limit the maximal operational VLs +max_op_vls 5 + +# Force PortInfo:LinkSpeedEnabled on switch ports +# If 0, don't modify PortInfo:LinkSpeedEnabled on switch port +# Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port +# Values are (IB Spec 1.2.1, 14.2.5.6 Table 146 "PortInfo") +# 1: 2.5 Gbps +# 3: 2.5 or 5.0 Gbps +# 5: 2.5 or 10.0 Gbps +# 7: 2.5 or 5.0 or 10.0 Gbps +# 2,4,6,8-14 Reserved +# Default 15: set to PortInfo:LinkSpeedSupported +force_link_speed 15 + +# The subnet_timeout code that will be set for all the ports +# The actual timeout is 4.096usec * 2^ +subnet_timeout 18 + +# Threshold of local phy errors for sending Trap 129 +local_phy_errors_threshold 0x08 + +# Threshold of credit overrun errors for sending Trap 130 +overrun_errors_threshold 0x08 + +# +# PARTITIONING OPTIONS +# +# Partition configuration file to be used +partition_config_file /etc/opensm/partitions.conf + +# Disable partition enforcement by switches +no_partition_enforcement FALSE + +# +# SWEEP OPTIONS +# +# The number of seconds between subnet sweeps (0 disables it) +sweep_interval 10 + +# If TRUE cause all lids to be reassigned +reassign_lids FALSE + +# If TRUE forces every sweep to be a heavy sweep +force_heavy_sweep FALSE + +# If TRUE every trap will cause a heavy sweep. 
+# NOTE: successive identical traps (>10) are suppressed +sweep_on_trap TRUE + +# +# ROUTING OPTIONS +# +# If TRUE count switches as link subscriptions +port_profile_switch_nodes FALSE + +# Name of file with port guids to be ignored by port profiling +port_prof_ignore_file (null) + +# Routing engine +# Multiple routing engines can be specified separated by +# commas so that specific ordering of routing algorithms will +# be tried if earlier routing engines fail. +# Supported engines: minhop, updn, file, ftree, lash, dor +routing_engine minhop + +# Connect roots (use FALSE if unsure) +connect_roots FALSE + +# Use unicast routing cache (use FALSE if unsure) +use_ucast_cache FALSE + +# Lid matrix dump file name +lid_matrix_dump_file (null) + +# LFTs file name +lfts_file (null) + +# The file holding the root node guids (for fat-tree or Up/Down) +# One guid in each line +root_guid_file (null) + +# The file holding the fat-tree compute node guids +# One guid in each line +cn_guid_file (null) + +# The file holding the node ids which will be used by Up/Down algorithm instead +# of GUIDs (one guid and id in each line) +ids_guid_file (null) + +# The file holding guid routing order guids (for MinHop and Up/Down) +guid_routing_order_file (null) + +# SA database file name +sa_db_file (null) + +# +# HANDOVER - MULTIPLE SMs OPTIONS +# +# SM priority used for deciding who is the master +# Range goes from 0 (lowest priority) to 15 (highest). 
+sm_priority 14 + +# If TRUE other SMs on the subnet should be ignored +ignore_other_sm FALSE + +# Timeout in [msec] between two polls of active master SM +sminfo_polling_timeout 10000 + +# Number of failing polls of remote SM that declares it dead +polling_retry_number 4 + +# If TRUE honor the guid2lid file when coming out of standby +# state, if such file exists and is valid +honor_guid2lid_file FALSE + +# +# TIMING AND THREADING OPTIONS +# +# Maximum number of SMPs sent in parallel +max_wire_smps 4 + +# The maximum time in [msec] allowed for a transaction to complete +transaction_timeout 200 + +# Maximal time in [msec] a message can stay in the incoming message queue. +# If there is more than one message in the queue and the last message +# stayed in the queue more than this value, any SA request will be +# immediately returned with a BUSY status. +max_msg_fifo_timeout 10000 + +# Use a single thread for handling SA queries +single_thread FALSE + +# +# MISC OPTIONS +# +# Daemon mode +daemon FALSE + +# SM Inactive +sm_inactive FALSE + +# Babbling Port Policy +babbling_port_policy FALSE + +# +# Performance Manager Options +# +# perfmgr enable +perfmgr FALSE + +# perfmgr redirection enable +perfmgr_redir TRUE + +# sweep time in seconds +perfmgr_sweep_time_s 180 + +# Max outstanding queries +perfmgr_max_outstanding_queries 500 + +# +# Event DB Options +# +# Dump file to dump the events to +event_db_dump_file (null) + +# +# Event Plugin Options +# +event_plugin_name (null) + +# +# Node name map for mapping node's to more descriptive node descriptions +# (man ibnetdiscover for more information) +# +node_name_map_name (null) + +# +# DEBUG FEATURES +# +# The log flags used +log_flags 0x03 + +# Force flush of the log file after each log message +force_log_flush FALSE + +# Log file to be used +log_file /var/log/opensm.log + +# Limit the size(MB) of the log file. 
If overrun, log is restarted +log_max_size 4096 + +# If TRUE will accumulate the log over multiple OpenSM sessions +accum_log_file TRUE + +# The directory to hold the file OpenSM dumps +dump_files_dir /var/log/ + +# If TRUE enables new high risk options and hardware specific quirks +enable_quirks FALSE + +# If TRUE disables client reregistration +no_clients_rereg FALSE + +# If TRUE OpenSM should disable multicast support and +# no multicast routing is performed if TRUE +disable_multicast FALSE + +# If TRUE opensm will exit on fatal initialization issues +exit_on_fatal TRUE + +# console [off|local] +console off + +# Telnet port for console (default 10000) +console_port 10000 + +# +# QoS OPTIONS +# +# Enable QoS setup +qos FALSE + +# QoS policy file to be used +qos_policy_file /etc/opensm/qos-policy.conf + +# QoS default options +qos_max_vls 15 +qos_high_limit 0 +qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 +qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 +qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +# QoS CA options +qos_ca_max_vls 15 +qos_ca_high_limit 0 +qos_ca_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 +qos_ca_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 +qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +# QoS Switch Port 0 options +qos_sw0_max_vls 15 +qos_sw0_high_limit 0 +qos_sw0_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 +qos_sw0_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 +qos_sw0_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +# QoS Switch external ports options +qos_swe_max_vls 15 +qos_swe_high_limit 0 +qos_swe_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 +qos_swe_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 +qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +# QoS Router ports options 
+qos_rtr_max_vls 15 +qos_rtr_high_limit 0 +qos_rtr_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 +qos_rtr_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 +qos_rtr_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +# Prefix routes file name +prefix_routes_file /etc/opensm/prefix-routes.conf + +# +# IPv6 Solicited Node Multicast (SNM) Options +# +consolidate_ipv6_snm_req FALSE + diff --git a/opensm/scripts/partitions.conf b/opensm/scripts/partitions.conf new file mode 100644 index 0000000..868a26a --- /dev/null +++ b/opensm/scripts/partitions.conf @@ -0,0 +1,100 @@ +# Default partition configuration file for OpenSM +# +# The default name of OpenSM partitions configuration file is /etc/opensm/partitions.conf. The default may be changed by using --Pconfig (-P) +# option with OpenSM. +# +# The default partition will be created by OpenSM unconditionally even when partition configuration file does not exist or cannot be accessed. +# +# The default partition has P_Key value 0x7fff. OpenSM´s port will have full membership in default partition. All other end ports will have par‐ +# tial membership. +# +# File Format +# +# Comments: +# +# Line content followed after ´#´ character is comment and ignored by parser. +# +# General file format: +# +# : ; +# +# Partition Definition: +# +# [PartitionName][=PKey][,flag[=value]][,defmember=full|limited] +# +# PartitionName - string, will be used with logging. When omitted +# empty string will be used. +# PKey - P_Key value for this partition. Only low 15 bits will +# be used. When omitted will be autogenerated. +# flag - used to indicate IPoIB capability of this partition. +# defmember=full|limited - specifies default membership for port guid +# list. Default is limited. +# +# Currently recognized flags are: +# +# ipoib - indicates that this partition may be used for IPoIB, as +# result IPoIB capable MC group will be created. 
+# rate= - specifies rate for this IPoIB MC group +# (default is 3 (10GBps)) +# mtu= - specifies MTU for this IPoIB MC group +# (default is 4 (2048)) +# sl= - specifies SL for this IPoIB MC group +# (default is 0) +# scope= - specifies scope for this IPoIB MC group +# (default is 2 (link local)). Multiple scope settings +# are permitted for a partition. +# +# Note that values for rate, mtu, and scope should be specified as defined in the IBTA specification (for example, mtu=4 for 2048). +# +# PortGUIDs list: +# +# PortGUID - GUID of partition member EndPort. Hexadecimal +# numbers should start from 0x, decimal numbers +# are accepted too. +# full or limited - indicates full or limited membership for this +# port. When omitted (or unrecognized) limited +# membership is assumed. +# +# There are two useful keywords for PortGUID definition: +# +# - 'ALL' means all end ports in this subnet. +# - 'SELF' means subnet manager's port. +# +# Empty list means no ports in this partition. +# +# Notes: +# +# White space is permitted between delimiters ('=', ',',':',';'). +# +# The line can be wrapped after ':' followed after Partition Definition and between. +# +# PartitionName does not need to be unique, PKey does need to be unique. If PKey is repeated then those partition configurations will be merged +# and first PartitionName will be used (see also next note). +# +# It is possible to split partition configuration in more than one definition, but then PKey should be explicitly specified (otherwise different +# PKey values will be generated for those definitions). 
+# +# Examples: +# +# Default=0x7fff : ALL, SELF=full ; +# +# NewPartition , ipoib : 0x123456=full, 0x3456789034=limi, 0x2134af2306 ; +# +# YetAnotherOne = 0x300 : SELF=full ; +# YetAnotherOne = 0x300 : ALL=limited ; +# +# ShareIO = 0x80 , defmember=full : 0x123451, 0x123452; +# # 0x123453, 0x123454 will be limited +# ShareIO = 0x80 : 0x123453, 0x123454, 0x123455=full; +# # 0x123456, 0x123457 will be limited +# ShareIO = 0x80 : defmember=limited : 0x123456, 0x123457, 0x123458=full; +# ShareIO = 0x80 , defmember=full : 0x123459, 0x12345a; +# ShareIO = 0x80 , defmember=full : 0x12345b, 0x12345c=limited, 0x12345d; +# +# +# Note: +# +# The following rule is equivalent to how OpenSM used to run prior to the partition manager: +# + Default=0x7fff,ipoib:ALL=full; +# diff --git a/opensm/scripts/qos-policy.conf b/opensm/scripts/qos-policy.conf new file mode 100644 index 0000000..42a88c0 --- /dev/null +++ b/opensm/scripts/qos-policy.conf @@ -0,0 +1,2 @@ +# Default Quality of Service policy configuration file +# For further details see /usr/share/doc/opensm-/QoS_management_in_OpenSM.txt diff --git a/opensm/scripts/root-nodes.conf b/opensm/scripts/root-nodes.conf new file mode 100644 index 0000000..d84d732 --- /dev/null +++ b/opensm/scripts/root-nodes.conf @@ -0,0 +1,3 @@ +# Default root node GUIDs configuration file for OpenSM +# List of GUIDs in hex, one per line +# 0x8f10002322134567 -- 1.5.3.8 From dorons at Voltaire.COM Thu Nov 6 01:25:19 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:25:19 +0200 Subject: [ofa-general] [PATCH 2/2] install the configuration files by the rpm In-Reply-To: <4912B719.3040907@Voltaire.COM> References: <4912B719.3040907@Voltaire.COM> Message-ID: <4912B7FF.5030900@Voltaire.COM> install the configuration files by the rpm Signed-off-by: Doron Shoham --- opensm/opensm.spec.in | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index f8cecf1..b5d5b2c 100644 
--- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -98,6 +98,10 @@ mkdir -p $etc/{init.d,logrotate.d} $etc/@OPENSM_CONFIG_SUB_DIR@ install -m 755 scripts/${REDHAT}opensm.init $etc/init.d/opensmd install -D -m 644 scripts/opensm.logrotate $etc/logrotate.d/opensm install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh +install -m 644 scripts/opensm.conf $etc/opensm/ +install -m 644 scripts/partitions.conf $etc/opensm/ +install -m 644 scripts/qos-policy.conf $etc/opensm/ +install -m 644 scripts/root-nodes.conf $etc/opensm/ %clean rm -rf $RPM_BUILD_ROOT @@ -130,6 +134,7 @@ fi %config(noreplace) %{_sysconfdir}/logrotate.d/opensm %dir /var/cache/opensm %dir %{_sysconfdir}/@OPENSM_CONFIG_SUB_DIR@ +%{_sysconfdir}/opensm/* %files libs %defattr(-,root,root,-) -- 1.5.3.8 From dorons at Voltaire.COM Thu Nov 6 01:46:36 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:46:36 +0200 Subject: [ofa-general] [PATCH 0/2] update and install QoS_management_in_OpenSM.txt Message-ID: <4912BCFC.8030407@Voltaire.COM> The following patches will fix the default configuration files path in QoS_management_in_OpenSM.txt and install the file via the rpm. 
From dorons at Voltaire.COM Thu Nov 6 01:48:44 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:48:44 +0200 Subject: [ofa-general] [PATCH 1/2] fix default configuration files path In-Reply-To: <4912BCFC.8030407@Voltaire.COM> References: <4912BCFC.8030407@Voltaire.COM> Message-ID: <4912BD7C.1030603@Voltaire.COM> fix default configuration files path in QoS_management_in_OpenSM.txt file from /usr/local/etc/opensm/ to /etc/opensm/ Signed-off-by: Doron Shoham --- opensm/doc/QoS_management_in_OpenSM.txt | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/opensm/doc/QoS_management_in_OpenSM.txt b/opensm/doc/QoS_management_in_OpenSM.txt index ba1b4b1..1a48b1a 100644 --- a/opensm/doc/QoS_management_in_OpenSM.txt +++ b/opensm/doc/QoS_management_in_OpenSM.txt @@ -20,7 +20,7 @@ When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file. The default name of OpenSM QoS policy file is -/usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y +/etc/opensm/qos-policy.conf. The default may be changed by using -Y or --qos_policy_file option with OpenSM. During fabric initialization and at every heavy sweep OpenSM parses the QoS @@ -67,7 +67,7 @@ This section describes how to set up SL2VL and VL Arbitration tables on various nodes in the fabric. However, this is not supported in OpenSM currently. SL2VL and VLArb tables should be configured in the OpenSM options file -(default location - /usr/local/etc/opensm/opensm.conf). +(default location - /etc/opensm/opensm.conf). III) QoS Levels (denoted by qos-levels). Each QoS Level defines Service Level (SL) and a few optional fields: @@ -205,7 +205,7 @@ policy file and their syntax: # Arbitration tables on various nodes in the fabric. # However, this is not supported in OpenSM currently - the section is # parsed and ignored. SL2VL and VLArb tables should be configured in the - # OpenSM options file (by default - /usr/local/etc/opensm/opensm.conf). 
+ # OpenSM options file (by default - /etc/opensm/opensm.conf). end-qos-setup qos-levels -- 1.5.3.8 From dorons at Voltaire.COM Thu Nov 6 01:49:31 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 11:49:31 +0200 Subject: [ofa-general] [PATCH 2/2] install QoS_management_in_OpenSM.txt In-Reply-To: <4912BCFC.8030407@Voltaire.COM> References: <4912BCFC.8030407@Voltaire.COM> Message-ID: <4912BDAB.5040704@Voltaire.COM> install QoS_management_in_OpenSM.txt via the rpm Signed-off-by: Doron Shoham --- opensm/opensm.spec.in | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index f8cecf1..da07a73 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -124,7 +124,7 @@ fi %{_sbindir}/opensm %{_sbindir}/osmtest %{_mandir}/man8/* -%doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt +%doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt %{_sysconfdir}/init.d/opensmd %{_sbindir}/sldd.sh %config(noreplace) %{_sysconfdir}/logrotate.d/opensm -- 1.5.3.8 From jackm at dev.mellanox.co.il Thu Nov 6 01:54:01 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 6 Nov 2008 11:54:01 +0200 Subject: [ofa-general] ib_mthca catastrophic error detected In-Reply-To: <490763D0.5020002@ucla.edu> References: <4906645D.6010101@ucla.edu> <4907054E.9080205@mellanox.co.il> <490763D0.5020002@ucla.edu> Message-ID: <200811061154.02260.jackm@dev.mellanox.co.il> On Tuesday 28 October 2008 21:11, Scott A. Friedman wrote: > Hi > > This cluster has OFED 1.2.5.4 running on it. 
The ib_mthca kernel module > reports the following on startup: > > ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008) > > The cards in all (22) of the nodes we have seen this error on are as > follows: > > hca_id: mthca0 > fw_ver: 1.2.0 > vendor_id: 0x02c9 > vendor_part_id: 25204 > hw_ver: 0xA0 > board_id: MT_03B0140001 > phys_port_cnt: 1 > > It appears that when this happens the driver restarts (loads?) itself > however the job running at the time of the error is, of course, killed. > > Scott Scott, We are trying to reproduce this here. It would help if you could supply the following info: Host model for hosts which are experiencing the failure: Console output from the following linux commands: cat /etc/*rel* cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using grub) uname -a cat /proc/cpuinfo cat /proc/meminfo Also, what sort of job was running when the failure occurred: -- which MPI are you using? -- do you have a test example which we can run here to reproduce the problem? Thanks in advance for your help! 
Jack Morgenstein Senior Software Development Engineer Mellanox From vlad at lists.openfabrics.org Thu Nov 6 03:19:51 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 6 Nov 2008 03:19:51 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081106-0200 daily build status Message-ID: <20081106111951.40BE3E60D8C@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 
Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From dorons at Voltaire.COM Thu Nov 6 03:59:44 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 13:59:44 +0200 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: <49101D1F.4040605@Voltaire.COM> References: <49101D1F.4040605@Voltaire.COM> Message-ID: <4912DC30.40309@Voltaire.COM> export the osm_log_max in MB when using 'opensm -c Signed-off-by: Doron Shoham --- opensm/opensm/osm_subnet.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 0422d0f..c130c0d 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) "force_log_flush %s\n\n" "# Log file to be used\n" "log_file %s\n\n" - "# Limit the size of the log file. If overrun, log is restarted\n" + "# Limit the size of the log file in MB. If overrun, log is restarted\n" "log_max_size %lu\n\n" "# If TRUE will accumulate the log over multiple OpenSM sessions\n" "accum_log_file %s\n\n" @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) p_opts->log_flags, p_opts->force_log_flush ? "TRUE" : "FALSE", p_opts->log_file, - p_opts->log_max_size, + p_opts->log_max_size/1024/1024, p_opts->accum_log_file ? "TRUE" : "FALSE", p_opts->dump_files_dir, p_opts->enable_quirks ? 
"TRUE" : "FALSE", -- 1.5.3.8 From ogerlitz at voltaire.com Thu Nov 6 04:57:56 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 6 Nov 2008 14:57:56 +0200 (IST) Subject: [ofa-general] [PATCH] opensm: fix iser service-id used for SL assignment Message-ID: RFC3720 says: The well-known user TCP port number for iSCSI connections assigned by IANA is 3260 and this is the default iSCSI port. Implementations needing a system TCP port number may use port 860, the port assigned by IANA as the iSCSI system port; however in order to use port 860, it MUST be explicitly specified - implementations MUST NOT default to use of port 860, as 3260 is the only allowed default. Hence the SID used by iser is 0x0000000001060CBC and not 0x000000000106035C Signed-off-by: Or Gerlitz Signed-off-by: Eli Dorfman Index: opensm-3.2.2/doc/QoS_management_in_OpenSM.txt =================================================================== --- opensm-3.2.2.orig/doc/QoS_management_in_OpenSM.txt +++ opensm-3.2.2/doc/QoS_management_in_OpenSM.txt @@ -378,12 +378,12 @@ equivalent: 6.4 iSER Similar to RDS, iSER query is matched by Service ID, where the the Service ID -is also 0x000000000106PPPP. Default port number for iSER is 0x035C, which makes -a default Service-ID 0x000000000106035C. The following two match rules are +is also 0x000000000106PPPP. Default port number for iSER is 0x0CBC, which makes +a default Service-ID 0x0000000001060CBC. 
The following two match rules are equivalent: iser : - any, service-id 0x000000000106035C : + any, service-id 0x0000000001060CBC : 6.5 SRP Service ID for SRP varies from storage vendor to vendor, thus SRP query is Index: opensm-3.2.2/include/opensm/osm_qos_policy.h =================================================================== --- opensm-3.2.2.orig/include/opensm/osm_qos_policy.h +++ opensm-3.2.2/include/opensm/osm_qos_policy.h @@ -58,7 +58,7 @@ #define OSM_QOS_POLICY_ULP_RDS_SERVICE_ID 0x0000000001060000ULL #define OSM_QOS_POLICY_ULP_RDS_PORT 0x48CA #define OSM_QOS_POLICY_ULP_ISER_SERVICE_ID 0x0000000001060000ULL -#define OSM_QOS_POLICY_ULP_ISER_PORT 0x035C +#define OSM_QOS_POLICY_ULP_ISER_PORT 0x0CBC #define OSM_QOS_POLICY_NODE_TYPE_CA (((uint8_t)1)< References: Message-ID: On Thu, Nov 6, 2008 at 7:57 AM, Or Gerlitz wrote: > RFC3720 says: > > The well-known user TCP port number for iSCSI connections assigned by IANA is 3260 > and this is the default iSCSI port. Implementations needing a system TCP port number > may use port 860, the port assigned by IANA as the iSCSI system port; however in > order to use port 860, it MUST be explicitly specified - implementations MUST NOT > default to use of port 860, as 3260 is the only allowed default. > > Hence the SID used by iser is 0x0000000001060CBC and not 0x000000000106035C > > Signed-off-by: Or Gerlitz > Signed-off-by: Eli Dorfman > > Index: opensm-3.2.2/doc/QoS_management_in_OpenSM.txt > =================================================================== > --- opensm-3.2.2.orig/doc/QoS_management_in_OpenSM.txt > +++ opensm-3.2.2/doc/QoS_management_in_OpenSM.txt > @@ -378,12 +378,12 @@ equivalent: > > 6.4 iSER > Similar to RDS, iSER query is matched by Service ID, where the the Service ID > -is also 0x000000000106PPPP. Default port number for iSER is 0x035C, which makes > -a default Service-ID 0x000000000106035C. The following two match rules are > +is also 0x000000000106PPPP. 
Default port number for iSER is 0x0CBC, which makes > +a default Service-ID 0x0000000001060CBC. Should some mention of the prestandard port number be mentioned here for backward compatibility ? >The following two match rules are > equivalent: > > iser : > - any, service-id 0x000000000106035C : > + any, service-id 0x0000000001060CBC : > > 6.5 SRP > Service ID for SRP varies from storage vendor to vendor, thus SRP query is > Index: opensm-3.2.2/include/opensm/osm_qos_policy.h > =================================================================== > --- opensm-3.2.2.orig/include/opensm/osm_qos_policy.h > +++ opensm-3.2.2/include/opensm/osm_qos_policy.h > @@ -58,7 +58,7 @@ > #define OSM_QOS_POLICY_ULP_RDS_SERVICE_ID 0x0000000001060000ULL > #define OSM_QOS_POLICY_ULP_RDS_PORT 0x48CA > #define OSM_QOS_POLICY_ULP_ISER_SERVICE_ID 0x0000000001060000ULL > -#define OSM_QOS_POLICY_ULP_ISER_PORT 0x035C > +#define OSM_QOS_POLICY_ULP_ISER_PORT 0x0CBC > > #define OSM_QOS_POLICY_NODE_TYPE_CA (((uint8_t)1)< #define OSM_QOS_POLICY_NODE_TYPE_SWITCH (((uint8_t)1)< _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Thu Nov 6 05:31:34 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 6 Nov 2008 08:31:34 -0500 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: <4912DC30.40309@Voltaire.COM> References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> Message-ID: On Thu, Nov 6, 2008 at 6:59 AM, Doron Shoham wrote: > export the osm_log_max in MB when using 'opensm -c > > Signed-off-by: Doron Shoham > --- > opensm/opensm/osm_subnet.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 0422d0f..c130c0d 100644 > --- 
a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > "force_log_flush %s\n\n" > "# Log file to be used\n" > "log_file %s\n\n" > - "# Limit the size of the log file. If overrun, log is restarted\n" > + "# Limit the size of the log file in MB. If overrun, log is restarted\n" > "log_max_size %lu\n\n" > "# If TRUE will accumulate the log over multiple OpenSM sessions\n" > "accum_log_file %s\n\n" > @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > p_opts->log_flags, > p_opts->force_log_flush ? "TRUE" : "FALSE", > p_opts->log_file, > - p_opts->log_max_size, > + p_opts->log_max_size/1024/1024, > p_opts->accum_log_file ? "TRUE" : "FALSE", > p_opts->dump_files_dir, > p_opts->enable_quirks ? "TRUE" : "FALSE", Should your patch for adding opensm.conf to scripts should be updated to v2 ? -- Hal > -- > 1.5.3.8 > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Thu Nov 6 05:33:03 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 6 Nov 2008 08:33:03 -0500 Subject: [ofa-general] [PATCH 1/2] fix default configuration files path In-Reply-To: <4912BD7C.1030603@Voltaire.COM> References: <4912BCFC.8030407@Voltaire.COM> <4912BD7C.1030603@Voltaire.COM> Message-ID: On Thu, Nov 6, 2008 at 4:48 AM, Doron Shoham wrote: > fix default configuration files path in QoS_management_in_OpenSM.txt file > from /usr/local/etc/opensm/ to /etc/opensm/ > > Signed-off-by: Doron Shoham > --- > opensm/doc/QoS_management_in_OpenSM.txt | 6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/opensm/doc/QoS_management_in_OpenSM.txt 
b/opensm/doc/QoS_management_in_OpenSM.txt > index ba1b4b1..1a48b1a 100644 > --- a/opensm/doc/QoS_management_in_OpenSM.txt > +++ b/opensm/doc/QoS_management_in_OpenSM.txt > @@ -20,7 +20,7 @@ > > When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file. > The default name of OpenSM QoS policy file is > -/usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y > +/etc/opensm/qos-policy.conf. The default may be changed by using -Y > or --qos_policy_file option with OpenSM. > > During fabric initialization and at every heavy sweep OpenSM parses the QoS > @@ -67,7 +67,7 @@ This section describes how to set up SL2VL and VL Arbitration tables on > various nodes in the fabric. > However, this is not supported in OpenSM currently. > SL2VL and VLArb tables should be configured in the OpenSM options file > -(default location - /usr/local/etc/opensm/opensm.conf). > +(default location - /etc/opensm/opensm.conf). If this needs changing, aren't there similar changes needed in the opensm man page ? -- Hal > III) QoS Levels (denoted by qos-levels). > Each QoS Level defines Service Level (SL) and a few optional fields: > @@ -205,7 +205,7 @@ policy file and their syntax: > # Arbitration tables on various nodes in the fabric. > # However, this is not supported in OpenSM currently - the section is > # parsed and ignored. SL2VL and VLArb tables should be configured in the > - # OpenSM options file (by default - /usr/local/etc/opensm/opensm.conf). > + # OpenSM options file (by default - /etc/opensm/opensm.conf). 
> end-qos-setup > > qos-levels > -- > 1.5.3.8 > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Thu Nov 6 05:33:30 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 06 Nov 2008 15:33:30 +0200 Subject: [ofa-general] [PATCH] opensm: fix iser service-id used for SL assignment In-Reply-To: References: Message-ID: <4912F22A.4040802@voltaire.com> Hal Rosenstock wrote: >> --- opensm-3.2.2.orig/doc/QoS_management_in_OpenSM.txt >> +++ opensm-3.2.2/doc/QoS_management_in_OpenSM.txt >> @@ -378,12 +378,12 @@ equivalent: >> >> 6.4 iSER >> Similar to RDS, iSER query is matched by Service ID, where the the Service ID >> -is also 0x000000000106PPPP. Default port number for iSER is 0x035C, which makes >> -a default Service-ID 0x000000000106035C. The following two match rules are >> +is also 0x000000000106PPPP. Default port number for iSER is 0x0CBC, which makes >> +a default Service-ID 0x0000000001060CBC. > Should some mention of the prestandard port number be mentioned here for backward compatibility ? I don't think so as all the iser targets I'm aware too use port 3260, but if you want to, feel free to patch the patch Or. From ogerlitz at voltaire.com Thu Nov 6 05:38:04 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 6 Nov 2008 15:38:04 +0200 (IST) Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment In-Reply-To: References: Message-ID: On Thu, 6 Nov 2008, Or Gerlitz wrote: > The well-known user TCP port number for iSCSI connections assigned by IANA is 3260 > and this is the default iSCSI port. 
Implementations needing a system TCP port number
> may use port 860, the port assigned by IANA as the iSCSI system port; however in
> order to use port 860, it MUST be explicitly specified - implementations MUST NOT
> default to use of port 860, as 3260 is the only allowed default.
>
> Hence the SID used by iser is 0x0000000001060CBC and not 0x000000000106035C

> Index: opensm-3.2.2/include/opensm/osm_qos_policy.h
> ===================================================================
> --- opensm-3.2.2.orig/include/opensm/osm_qos_policy.h
> +++ opensm-3.2.2/include/opensm/osm_qos_policy.h
> @@ -58,7 +58,7 @@
> #define OSM_QOS_POLICY_ULP_RDS_SERVICE_ID 0x0000000001060000ULL
> #define OSM_QOS_POLICY_ULP_RDS_PORT 0x48CA
> #define OSM_QOS_POLICY_ULP_ISER_SERVICE_ID 0x0000000001060000ULL
> -#define OSM_QOS_POLICY_ULP_ISER_PORT 0x035C
> +#define OSM_QOS_POLICY_ULP_ISER_PORT 0x0CBC

BTW - while doing this fix, I noted that the port assumed by opensm for RDS is 18634 (0x48CA), which is the one used by the RDS code deployed in OFED 1.3.x, whereas the RDS code base deployed in OFED 1.4.y uses port 18635.

Andy, Rick, can you guys revert to 18634 to make things simpler wrt RDS/QoS configuration?

Or.

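For readers following the service-ID arithmetic in the message above, here is a minimal sketch, assuming (as the QoS document quoted in the patch describes) that the service ID is the 0x000000000106PPPP prefix with the 16-bit port number in the low bits. The names ULP_SERVICE_ID_PREFIX, ulp_service_id and ulp_service_id_selfcheck are hypothetical and are not part of OpenSM; only the numeric values come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Prefix shared by the RDS and iSER service IDs in the quoted header. */
#define ULP_SERVICE_ID_PREFIX 0x0000000001060000ULL

/* Hypothetical helper: place a 16-bit port in the low bits of the prefix. */
static uint64_t ulp_service_id(uint16_t port)
{
    return ULP_SERVICE_ID_PREFIX | (uint64_t)port;
}

/* Checks the values quoted in the thread against the helper. */
static void ulp_service_id_selfcheck(void)
{
    /* iSCSI default port 3260 is 0x0CBC, the value the patch switches to. */
    assert(ulp_service_id(3260) == 0x0000000001060CBCULL);
    /* iSCSI system port 860 is 0x035C, the value the patch removes. */
    assert(ulp_service_id(860) == 0x000000000106035CULL);
    /* RDS port 18634 is 0x48CA, the value OpenSM assumes. */
    assert(ulp_service_id(18634) == 0x00000000010648CAULL);
}
```

Under the same assumption, the RDS port 18635 mentioned for OFED 1.4.y would map to 0x00000000010648CB, which is why a QoS rule keyed on the 0x48CA default would not match it.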
From dorons at Voltaire.COM Thu Nov 6 05:54:43 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 15:54:43 +0200 Subject: [ofa-general] [PATCH 1/2] fix default configuration files path In-Reply-To: References: <4912BCFC.8030407@Voltaire.COM> <4912BD7C.1030603@Voltaire.COM> Message-ID: <4912F723.3090203@Voltaire.COM> Hal Rosenstock wrote: > On Thu, Nov 6, 2008 at 4:48 AM, Doron Shoham wrote: >> fix default configuration files path in QoS_management_in_OpenSM.txt file >> from /usr/local/etc/opensm/ to /etc/opensm/ >> >> Signed-off-by: Doron Shoham >> --- >> opensm/doc/QoS_management_in_OpenSM.txt | 6 +++--- >> 1 files changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/opensm/doc/QoS_management_in_OpenSM.txt b/opensm/doc/QoS_management_in_OpenSM.txt >> index ba1b4b1..1a48b1a 100644 >> --- a/opensm/doc/QoS_management_in_OpenSM.txt >> +++ b/opensm/doc/QoS_management_in_OpenSM.txt >> @@ -20,7 +20,7 @@ >> >> When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file. >> The default name of OpenSM QoS policy file is >> -/usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y >> +/etc/opensm/qos-policy.conf. The default may be changed by using -Y >> or --qos_policy_file option with OpenSM. >> >> During fabric initialization and at every heavy sweep OpenSM parses the QoS >> @@ -67,7 +67,7 @@ This section describes how to set up SL2VL and VL Arbitration tables on >> various nodes in the fabric. >> However, this is not supported in OpenSM currently. >> SL2VL and VLArb tables should be configured in the OpenSM options file >> -(default location - /usr/local/etc/opensm/opensm.conf). >> +(default location - /etc/opensm/opensm.conf). > > If this needs changing, aren't there similar changes needed in the > opensm man page ? > > -- Hal > No, in the man page: /etc/opensm/qos-policy.conf default QOS policy config file >> III) QoS Levels (denoted by qos-levels). 
>> Each QoS Level defines Service Level (SL) and a few optional fields: >> @@ -205,7 +205,7 @@ policy file and their syntax: >> # Arbitration tables on various nodes in the fabric. >> # However, this is not supported in OpenSM currently - the section is >> # parsed and ignored. SL2VL and VLArb tables should be configured in the >> - # OpenSM options file (by default - /usr/local/etc/opensm/opensm.conf). >> + # OpenSM options file (by default - /etc/opensm/opensm.conf). >> end-qos-setup >> >> qos-levels >> -- >> 1.5.3.8 >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> From dorons at Voltaire.COM Thu Nov 6 05:57:11 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 15:57:11 +0200 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> Message-ID: <4912F7B7.1000109@Voltaire.COM> Hal Rosenstock wrote: > On Thu, Nov 6, 2008 at 6:59 AM, Doron Shoham wrote: >> export the osm_log_max in MB when using 'opensm -c >> >> Signed-off-by: Doron Shoham >> --- >> opensm/opensm/osm_subnet.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c >> index 0422d0f..c130c0d 100644 >> --- a/opensm/opensm/osm_subnet.c >> +++ b/opensm/opensm/osm_subnet.c >> @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >> "force_log_flush %s\n\n" >> "# Log file to be used\n" >> "log_file %s\n\n" >> - "# Limit the size of the log file. If overrun, log is restarted\n" >> + "# Limit the size of the log file in MB. 
If overrun, log is restarted\n" >> "log_max_size %lu\n\n" >> "# If TRUE will accumulate the log over multiple OpenSM sessions\n" >> "accum_log_file %s\n\n" >> @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >> p_opts->log_flags, >> p_opts->force_log_flush ? "TRUE" : "FALSE", >> p_opts->log_file, >> - p_opts->log_max_size, >> + p_opts->log_max_size/1024/1024, >> p_opts->accum_log_file ? "TRUE" : "FALSE", >> p_opts->dump_files_dir, >> p_opts->enable_quirks ? "TRUE" : "FALSE", > > Should your patch for adding opensm.conf to scripts should be updated to v2 ? > > -- Hal > Can you please explain? Thanks, Doron >> -- >> 1.5.3.8 >> >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> From hal.rosenstock at gmail.com Thu Nov 6 06:03:03 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 6 Nov 2008 09:03:03 -0500 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: <4912F7B7.1000109@Voltaire.COM> References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> <4912F7B7.1000109@Voltaire.COM> Message-ID: On Thu, Nov 6, 2008 at 8:57 AM, Doron Shoham wrote: > Hal Rosenstock wrote: >> On Thu, Nov 6, 2008 at 6:59 AM, Doron Shoham wrote: >>> export the osm_log_max in MB when using 'opensm -c >>> >>> Signed-off-by: Doron Shoham >>> --- >>> opensm/opensm/osm_subnet.c | 4 ++-- >>> 1 files changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c >>> index 0422d0f..c130c0d 100644 >>> --- a/opensm/opensm/osm_subnet.c >>> +++ b/opensm/opensm/osm_subnet.c >>> @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >>> "force_log_flush %s\n\n" >>> "# Log file to be used\n" >>> 
"log_file %s\n\n" >>> - "# Limit the size of the log file. If overrun, log is restarted\n" >>> + "# Limit the size of the log file in MB. If overrun, log is restarted\n" >>> "log_max_size %lu\n\n" >>> "# If TRUE will accumulate the log over multiple OpenSM sessions\n" >>> "accum_log_file %s\n\n" >>> @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >>> p_opts->log_flags, >>> p_opts->force_log_flush ? "TRUE" : "FALSE", >>> p_opts->log_file, >>> - p_opts->log_max_size, >>> + p_opts->log_max_size/1024/1024, >>> p_opts->accum_log_file ? "TRUE" : "FALSE", >>> p_opts->dump_files_dir, >>> p_opts->enable_quirks ? "TRUE" : "FALSE", >> >> Should your patch for adding opensm.conf to scripts should be updated to v2 ? >> >> -- Hal >> > > Can you please explain? Doesn't this change these lines (a comment and the value of log_max_size) in the opensm.conf file which you are proposing to be added into scripts ? -- Hal > > Thanks, > Doron > >>> -- >>> 1.5.3.8 >>> >>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>> > > From dorons at voltaire.com Thu Nov 6 06:22:03 2008 From: dorons at voltaire.com (Doron Shoham) Date: Thu, 06 Nov 2008 16:22:03 +0200 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> <4912F7B7.1000109@Voltaire.COM> Message-ID: <4912FD8B.9070304@voltaire.com> Hal Rosenstock wrote: > On Thu, Nov 6, 2008 at 8:57 AM, Doron Shoham wrote: >> Hal Rosenstock wrote: >>> On Thu, Nov 6, 2008 at 6:59 AM, Doron Shoham wrote: >>>> export the osm_log_max in MB when using 'opensm -c >>>> >>>> Signed-off-by: Doron Shoham >>>> --- >>>> opensm/opensm/osm_subnet.c | 4 ++-- >>>> 1 files changed, 2 insertions(+), 2 deletions(-) 
>>>> >>>> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c >>>> index 0422d0f..c130c0d 100644 >>>> --- a/opensm/opensm/osm_subnet.c >>>> +++ b/opensm/opensm/osm_subnet.c >>>> @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >>>> "force_log_flush %s\n\n" >>>> "# Log file to be used\n" >>>> "log_file %s\n\n" >>>> - "# Limit the size of the log file. If overrun, log is restarted\n" >>>> + "# Limit the size of the log file in MB. If overrun, log is restarted\n" >>>> "log_max_size %lu\n\n" >>>> "# If TRUE will accumulate the log over multiple OpenSM sessions\n" >>>> "accum_log_file %s\n\n" >>>> @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) >>>> p_opts->log_flags, >>>> p_opts->force_log_flush ? "TRUE" : "FALSE", >>>> p_opts->log_file, >>>> - p_opts->log_max_size, >>>> + p_opts->log_max_size/1024/1024, >>>> p_opts->accum_log_file ? "TRUE" : "FALSE", >>>> p_opts->dump_files_dir, >>>> p_opts->enable_quirks ? "TRUE" : "FALSE", >>> Should your patch for adding opensm.conf to scripts should be updated to v2 ? >>> >>> -- Hal >>> >> Can you please explain? > > Doesn't this change these lines (a comment and the value of > log_max_size) in the opensm.conf file which you are proposing to be > added into scripts ? > > -- Hal > The first patch converts the log_size from opensm.conf to MB. The second one converts in the opposite direction when opensm dump its configuration. 
>> Thanks,
>> Doron
>>
>>>> --
>>>> 1.5.3.8
>>>>
>>

From kelly at tradebotsystems.com Thu Nov 6 06:24:57 2008
From: kelly at tradebotsystems.com (Kelly Burkhart)
Date: Thu, 6 Nov 2008 08:24:57 -0600
Subject: [ofa-general] infiniband multicast (libibverbs)
Message-ID: <98B0CDCB28A5EE4CB3678CD99406644E34349F@tbmail2.tradebot.com>

I believe that the problem was that, prior to receiving any messages, I was posting many recvs using the same buffer. I was expecting that the buffer wouldn't be filled until I polled the CQ for the completion. Instead, it appears that my buffer was being filled and then overwritten as fast as messages came in. So when I polled for the completion of the fifth message, the buffer may already have contained the tenth.

To resolve the issue, I created a larger memory region and treated it as a circular buffer. When I posted my recvs in advance, each WR pointed to a different portion of the MR. Now I should only have problems if I can't process messages fast enough and my buffer wraps.

Thanks,

-K

________________________________
From: Devesh Sharma [mailto:devesh28 at gmail.com]
Sent: Thursday, November 06, 2008 3:09 AM
To: Kelly Burkhart
Cc: Roland Dreier; general at lists.openfabrics.org
Subject: Re: [ofa-general] infiniband multicast (libibverbs)

OK, try doing the sequence number check after a slight delay, say 100 ns. Is it possible that DMA latencies are coming into the picture? Roland or Dotan can comment on this!

On 11/5/08, Kelly Burkhart wrote:

It is non-blocking. I spin, calling ibv_poll_cq until it returns a non-zero.

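The circular-buffer fix described in the message above can be sketched without any verbs calls. This is only an illustration of the slot arithmetic, not the poster's code: the sizes SLOT_SIZE and NUM_SLOTS and the helpers slot_index, slot_addr and slot_still_valid are all hypothetical, and the static array merely stands in for the buffer that would be registered as one memory region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the thread does not state the real ones. */
enum { SLOT_SIZE = 2048, NUM_SLOTS = 64 };

/* Stand-in for the single large buffer registered as one MR. */
static uint8_t region[NUM_SLOTS * SLOT_SIZE];

/* Slot used by the receive with sequence number seq (0, 1, 2, ...). */
static unsigned slot_index(uint64_t seq)
{
    return (unsigned)(seq % NUM_SLOTS);
}

/* Start address of that slot; each posted receive gets its own slot. */
static uint8_t *slot_addr(uint64_t seq)
{
    return region + (size_t)slot_index(seq) * SLOT_SIZE;
}

/*
 * Message seq shares its slot with message seq + NUM_SLOTS, so its data
 * is intact only while the newest received message is less than one
 * full lap ahead. This is the "buffer wraps" case from the message.
 */
static int slot_still_valid(uint64_t seq, uint64_t newest_received)
{
    assert(newest_received >= seq);
    return newest_received - seq < NUM_SLOTS;
}
```

With libibverbs, the address returned by slot_addr() would go into the addr field of the struct ibv_sge attached to each struct ibv_recv_wr before calling ibv_post_recv(), and wr_id is a convenient place to carry the slot index so that the completion identifies which slot to read.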
From dorons at Voltaire.COM Thu Nov 6 06:26:43 2008 From: dorons at Voltaire.COM (Doron Shoham) Date: Thu, 06 Nov 2008 16:26:43 +0200 Subject: [ofa-general] [PATCH] limit log records number and size Message-ID: <4912FEA3.3090409@Voltaire.COM> limit log records number and size Signed-off-by: Doron Shoham --- opensm/scripts/opensm.logrotate | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/opensm/scripts/opensm.logrotate b/opensm/scripts/opensm.logrotate index e16e227..e0f4125 100644 --- a/opensm/scripts/opensm.logrotate +++ b/opensm/scripts/opensm.logrotate @@ -4,4 +4,6 @@ copytruncate weekly compress + rotate 10 + size 100M } -- 1.5.3.8 From hal.rosenstock at gmail.com Thu Nov 6 06:29:09 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 6 Nov 2008 09:29:09 -0500 Subject: [ofa-general] [PATCH] export osm_log_max in MB In-Reply-To: <4912FD8B.9070304@voltaire.com> References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> <4912F7B7.1000109@Voltaire.COM> <4912FD8B.9070304@voltaire.com> Message-ID: On Thu, Nov 6, 2008 at 9:22 AM, Doron Shoham wrote: > Hal Rosenstock wrote: >> >> On Thu, Nov 6, 2008 at 8:57 AM, Doron Shoham wrote: >>> >>> Hal Rosenstock wrote: >>>> >>>> On Thu, Nov 6, 2008 at 6:59 AM, Doron Shoham >>>> wrote: >>>>> >>>>> export the osm_log_max in MB when using 'opensm -c >>>>> >>>>> Signed-off-by: Doron Shoham >>>>> --- >>>>> opensm/opensm/osm_subnet.c | 4 ++-- >>>>> 1 files changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c >>>>> index 0422d0f..c130c0d 100644 >>>>> --- a/opensm/opensm/osm_subnet.c >>>>> +++ b/opensm/opensm/osm_subnet.c >>>>> @@ -1668,7 +1668,7 @@ int osm_subn_write_conf_file(char *file_name, IN >>>>> osm_subn_opt_t *const p_opts) >>>>> "force_log_flush %s\n\n" >>>>> "# Log file to be used\n" >>>>> "log_file %s\n\n" >>>>> - "# Limit the size of the log file. 
If overrun, log is >>>>> restarted\n" >>>>> + "# Limit the size of the log file in MB. If overrun, >>>>> log is restarted\n" >>>>> "log_max_size %lu\n\n" >>>>> "# If TRUE will accumulate the log over multiple OpenSM >>>>> sessions\n" >>>>> "accum_log_file %s\n\n" >>>>> @@ -1694,7 +1694,7 @@ int osm_subn_write_conf_file(char *file_name, IN >>>>> osm_subn_opt_t *const p_opts) >>>>> p_opts->log_flags, >>>>> p_opts->force_log_flush ? "TRUE" : "FALSE", >>>>> p_opts->log_file, >>>>> - p_opts->log_max_size, >>>>> + p_opts->log_max_size/1024/1024, >>>>> p_opts->accum_log_file ? "TRUE" : "FALSE", >>>>> p_opts->dump_files_dir, >>>>> p_opts->enable_quirks ? "TRUE" : "FALSE", >>>> >>>> Should your patch for adding opensm.conf to scripts should be updated to >>>> v2 ? >>>> >>>> -- Hal >>>> >>> Can you please explain? >> >> Doesn't this change these lines (a comment and the value of >> log_max_size) in the opensm.conf file which you are proposing to be >> added into scripts ? Understood. It's a nit but I was referring to "[PATCH 1/2] add default configuration files" where in opensm.conf there is: +# Limit the size(MB) of the log file. If overrun, log is restarted +log_max_size 4096 -- Hal >> >> -- Hal >> > > The first patch converts the log_size from opensm.conf to MB. > The second one converts in the opposite direction when opensm dump > its configuration. 
>
>
>>> Thanks,
>>> Doron
>>>
>>>>> --
>>>>> 1.5.3.8
>>>>>
>>>
>
>

From jackm at dev.mellanox.co.il Thu Nov 6 07:12:50 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 6 Nov 2008 17:12:50 +0200
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <20081106012307.GP31163@sgi.com>
References: <20081106012307.GP31163@sgi.com>
Message-ID: <200811061712.50605.jackm@dev.mellanox.co.il>

On Thursday 06 November 2008 03:23, akepner at sgi.com wrote:
> I described an IPoIB-related panic we were seeing on large
> clusters. The signature was a backtrace like this:
>
>         skb_over_panic
>         :ib_ipoib:ipoib_ib_handle_rx_wc
>         :ib_ipoib:ipoib_poll
>         net_rx_action
>         .....
>
> The bug is difficult to reproduce, but we finally got a crashdump,
> and the problem appears to be that stale skb pointers on the tx_ring
> were left pointing to skbs that had been since reused, so that the
> skb's data region was now unexpectedly short, etc.
>
How does ipoib_ib_handle_rx_wc() involve the tx_ring? This is
receive processing.

- Jack

From akepner at sgi.com Thu Nov 6 08:04:28 2008
From: akepner at sgi.com (akepner at sgi.com)
Date: Thu, 6 Nov 2008 08:04:28 -0800
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <20081106084031.GA25354@mtls03>
References: <20081106012307.GP31163@sgi.com> <20081106084031.GA25354@mtls03>
Message-ID: <20081106160428.GR31163@sgi.com>

On Thu, Nov 06, 2008 at 10:40:32AM +0200, Eli Cohen wrote:
> On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote:
> ...
> looking at the patch I don't understand why it should fix the problem
> you're seeing. I suspect we may be hiding the problem.
>

I think that may be correct.

For the stale skb pointers to be reused by the ipoib driver, it
looks like we'd need to get 'unexpected' completions.

--
Arthur

From akepner at sgi.com Thu Nov 6 08:40:05 2008
From: akepner at sgi.com (akepner at sgi.com)
Date: Thu, 6 Nov 2008 08:40:05 -0800
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <200811061712.50605.jackm@dev.mellanox.co.il>
References: <20081106012307.GP31163@sgi.com> <200811061712.50605.jackm@dev.mellanox.co.il>
Message-ID: <20081106164005.GS31163@sgi.com>

On Thu, Nov 06, 2008 at 05:12:50PM +0200, Jack Morgenstein wrote:
> On Thursday 06 November 2008 03:23, akepner at sgi.com wrote:
> > I described an IPoIB-related panic we were seeing on large
> > clusters. The signature was a backtrace like this:
> >
> >         skb_over_panic
> >         :ib_ipoib:ipoib_ib_handle_rx_wc
> >         :ib_ipoib:ipoib_poll
> >         net_rx_action
> >         .....
> >
> > The bug is difficult to reproduce, but we finally got a crashdump,
> > and the problem appears to be that stale skb pointers on the tx_ring
> > were left pointing to skbs that had been since reused, so that the
> > skb's data region was now unexpectedly short, etc.
> >
> How does ipoib_ib_handle_rx_wc() involve the tx_ring? This is
> receive processing.
>

What I surmise may be happening is something like this:

- tx skb is freed, but a stale pointer remains on tx_ring
- the same skb is reallocated, and added to the rx_ring
- now we get an 'unexpected' tx completion, and use the stale
  skb pointer on the tx_ring to again free the skb (this step
  seems to invoke a f/w bug)
- another driver, say an ethernet driver, reallocates the skb,
  reducing the extent of the data region (leading to the
  skb_over_panic once it's processed by ipoib_ib_handle_rx_wc)

This bug leaves the tx and rx rings corrupted in many ways, including:

- different rx_ring members refer to the same skb
- different skbs on the rx_ring have identical data, head, end,
  tail ptrs
- skbs on the rx_ring have sizes inconsistent with what the ipoib
  driver allocates (which causes the skb_over_panic, of course)
- rx skbs have 'dev' pointers to ethernet devices
- dma mappings in rx_ring aren't consistent with what's in skb
- some skbs are simultaneously on the tx and rx rings

--
Arthur

From weiny2 at llnl.gov Thu Nov 6 09:11:13 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 6 Nov 2008 09:11:13 -0800
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <20081106160428.GR31163@sgi.com>
References: <20081106012307.GP31163@sgi.com> <20081106084031.GA25354@mtls03> <20081106160428.GR31163@sgi.com>
Message-ID: <20081106091113.66bcff92.weiny2@llnl.gov>

On Thu, 6 Nov 2008 08:04:28 -0800
akepner at sgi.com wrote:

> On Thu, Nov 06, 2008 at 10:40:32AM +0200, Eli Cohen wrote:
> > On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote:
> > ...
> > looking at the patch I don't understand why it should fix the problem
> > you're seeing. I suspect we may be hiding the problem.
> >
>
> I think that may be correct.
>
> For the stale skb pointers to be reused by the ipoib driver, it
> looks like we'd need to get 'unexpected' completions.
>

If this is the case we could use a debug patch which Al developed here which
simply flags the skb as "invalid" but leaves the pointer. Then we could use
that flag to determine when these "unexpected" completions are occurring.

I can get the patch from Al if you would like.

Ira

From chu11 at llnl.gov Thu Nov 6 09:23:56 2008
From: chu11 at llnl.gov (Al Chu)
Date: Thu, 06 Nov 2008 09:23:56 -0800
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <20081106160428.GR31163@sgi.com>
References: <20081106012307.GP31163@sgi.com> <20081106084031.GA25354@mtls03> <20081106160428.GR31163@sgi.com>
Message-ID: <1225992236.13371.19.camel@cardanus.llnl.gov>

On Thu, 2008-11-06 at 08:04 -0800, akepner at sgi.com wrote:
> On Thu, Nov 06, 2008 at 10:40:32AM +0200, Eli Cohen wrote:
> > On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote:
> > ...
> > looking at the patch I don't understand why it should fix the problem
> > you're seeing. I suspect we may be hiding the problem.
> >
>
> I think that may be correct.
>
> For the stale skb pointers to be reused by the ipoib driver, it
> looks like we'd need to get 'unexpected' completions.

I implemented the attached cheapo-debug-patch and installed it on one of
our clusters. We hit the error condition (the "Oh crap" error message)
several times right before the same crashes. So I think Arthur's patch
fixes something, although there may be a more core underlying issue yet
to be solved.

Al

P.S. I should note that when debugging this, I was looking at a
different stack trace than Arthur and Ira, but believed it to be the
same core issue.

--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: verify_skb_reset.patch Type: text/x-patch Size: 3196 bytes Desc: not available URL: From chu11 at llnl.gov Thu Nov 6 09:31:47 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 06 Nov 2008 09:31:47 -0800 Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free In-Reply-To: <20081106091113.66bcff92.weiny2@llnl.gov> References: <20081106012307.GP31163@sgi.com> <20081106084031.GA25354@mtls03> <20081106160428.GR31163@sgi.com> <20081106091113.66bcff92.weiny2@llnl.gov> Message-ID: <1225992707.13371.22.camel@cardanus.llnl.gov> On Thu, 2008-11-06 at 09:11 -0800, Ira Weiny wrote: > On Thu, 6 Nov 2008 08:04:28 -0800 > akepner at sgi.com wrote: > > > On Thu, Nov 06, 2008 at 10:40:32AM +0200, Eli Cohen wrote: > > > On Wed, Nov 05, 2008 at 05:23:07PM -0800, akepner at sgi.com wrote: > > > ... > > > looking a the patch I don't understand why it should fix the problem > > > you're seeing. I suspect we may be hiding the problem. > > > > > > > I think that may be correct. > > > > For the stale skb pointers to be reused by the ipoib driver, it > > looks like we'd need to get 'unexpected' completions. > > > > If this is the case we could use a debug patch which Al developed here which > simply flags the skb as "invalid" but leaves the pointer. Then we could use > that flag to determine when these "unexpected" completions are occuring. FYI, this is the patch I just posted. Al > I can get the patch from Al if you would like. > > Ira > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From friedman at ucla.edu Thu Nov 6 10:34:57 2008 From: friedman at ucla.edu (Scott A. 
Friedman) Date: Thu, 06 Nov 2008 10:34:57 -0800 Subject: [ofa-general] ib_mthca catastrophic error detected In-Reply-To: <200811061154.02260.jackm@dev.mellanox.co.il> References: <4906645D.6010101@ucla.edu> <4907054E.9080205@mellanox.co.il> <490763D0.5020002@ucla.edu> <200811061154.02260.jackm@dev.mellanox.co.il> Message-ID: <491338D1.8050205@ucla.edu> Hi We have been working with Matthew Finlay on this recently - you/we might pull all of this together. We are able to make any of our sdr cards have a catastrophic error - and are unable to do the same with our ddr cards. Matt has suggested that there is a firmware fix possibly? Anyway, to answer your questions: The hosts are Sun X2200M, but we have swapped a few around with some hosts we have from Aspen systems and the problem remains. I suppose the similarity is that they are all nForce based. The MPI used was the latest OpenMPI - I will find the version, but I do not think it matters whether we are using OpenMPI or MVAPICH. The job itself does not seem to matter either. The situation is after a node comes up it takes a very long time for the card to become ACTIVE. It seems to ocsillate between ACTIVE and INIT. We have waited several minutes sometimes but can never be sure of when it will settle down. The queue certainly doesn't know and a job submitted to such a node will die as the cards will have a catastrophic error. Scott > Console output from the following linux commands: > cat /etc/*rel* Not a good idea...maybe this #cat /etc/redhat-release CentOS release 5 (Final) > cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using grub) # grub.conf generated by anaconda # # Note that you do not have to rerun grub after making changes to this file # NOTICE: You have a /boot partition. This means that # all kernel and initrd paths are relative to /boot/, eg. 
# root (hd0,0) # kernel /vmlinuz-version ro root=/dev/hda3 # initrd /initrd-version.img #boot=/dev/hda default=0 timeout=5 splashimage=(hd0,0)/grub/splash.xpm.gz hiddenmenu title CentOS (2.6.18-92.1.6.el5) root (hd0,0) kernel /vmlinuz-2.6.18-92.1.6.el5 ro root=LABEL=/ rhgb quiet initrd /initrd-2.6.18-92.1.6.el5.img > uname -a Linux n141 2.6.18-92.1.6.el5 #1 SMP Wed Jun 25 13:45:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > cat /proc/cpuinfo > cat /proc/meminfo processor : 0 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4424.75 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 1 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4426.22 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 
2 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4421.37 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 3 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4421.65 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 4 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm 
cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.36 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 5 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.71 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 6 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.17 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 7 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 
core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.17 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] MemTotal: 8182568 kB MemFree: 4535892 kB Buffers: 318232 kB Cached: 1583772 kB SwapCached: 0 kB Active: 2714400 kB Inactive: 730260 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 8182568 kB LowFree: 4535892 kB SwapTotal: 8289532 kB SwapFree: 8289380 kB Dirty: 340 kB Writeback: 0 kB AnonPages: 1542636 kB Mapped: 14588 kB Slab: 139788 kB PageTables: 7208 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 12380816 kB Committed_AS: 1679420 kB VmallocTotal: 34359738367 kB VmallocUsed: 4600 kB VmallocChunk: 34359733707 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB Jack Morgenstein wrote: > On Tuesday 28 October 2008 21:11, Scott A. Friedman wrote: >> Hi >> >> This cluster has OFED 1.2.5.4 running on it. The ib_mthca kernel module >> reports the following on startup: >> >> ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008) >> >> The cards in all (22) of the nodes we have seen this error on are as >> follows: >> >> hca_id: mthca0 >> fw_ver: 1.2.0 >> vendor_id: 0x02c9 >> vendor_part_id: 25204 >> hw_ver: 0xA0 >> board_id: MT_03B0140001 >> phys_port_cnt: 1 >> >> It appears that when this happens the driver restarts (loads?) itself >> however the job running at the time of the error is, of course, killed. >> >> Scott > > Scott, > We are trying to reproduce this here. 
It would help if you could supply
> the following info:
>
> Host model for hosts which are experiencing the failure:
>
> Console output from the following linux commands:
> cat /etc/*rel*
> cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using grub)
> uname -a
> cat /proc/cpuinfo
> cat /proc/meminfo
>
> Also, what sort of job was running when the failure occurred:
> -- which MPI are you using?
> -- do you have a test example which we can run here to reproduce the problem?
>
> Thanks in advance for your help!
>
> Jack Morgenstein
> Senior Software Development Engineer
> Mellanox

From andy.grover at oracle.com Thu Nov 6 10:58:24 2008
From: andy.grover at oracle.com (Andy Grover)
Date: Thu, 06 Nov 2008 10:58:24 -0800
Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment
In-Reply-To:
References:
Message-ID: <49133E50.7090508@oracle.com>

Or Gerlitz wrote:
> BTW - while doing this fix, I noted that the port assumed by opensm for RDS is 18634
> (0x48CA) which is the ones used in the rds code deployed in ofed
> 1.3.x, where the rds code based deployed into ofed 1.4.y uses port
> 18635
>
> Andy, Rick, can you guys revert to 18634 to make things simpler wrt
> RDS/QoS configuration?

It appears this is a fix for multiple rds transports each trying to bind
to that port with INADDR_ANY, see commit f0af6566. I think the correct
fix is to use a single port but have transports listen on their specific
interfaces only.

I think this is too big a fix for 1.4.0 so I will simply disable TCP
transport there (leaving just IB transport, thus no problem) and move
the port back to 18634. For 1.4.1 we will have multiple transports again
and will need to fix this by not using INADDR_ANY, as described above.
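The difference between a wildcard bind and a per-address bind can be sketched in user space with ordinary TCP sockets. This is only an illustration of the socket semantics, not the RDS code (which lives in the kernel); `listen_on()` and `bound_port()` are hypothetical helpers, and the behaviour described is what Linux does when SO_REUSEADDR is not set.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a TCP listener bound to one IPv4 address. Pass "0.0.0.0" for the
 * INADDR_ANY case. Returns the socket fd, or -1 on failure.
 * Hypothetical helper for illustration; not taken from the RDS code.
 */
static int listen_on(const char *ip, uint16_t port)
{
	struct sockaddr_in sin;
	int fd;

	assert(ip != NULL);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(fd, 8) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return the local port a bound socket ended up on, or 0 on failure. */
static uint16_t bound_port(int fd)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
		return 0;
	return ntohs(sin.sin_port);
}
```

With one listener on a specific address, a second `listen_on("0.0.0.0", port)` on the same port is refused, because the wildcard overlaps every local address. A second listener on a different specific address and the same port does not overlap and is normally accepted, which is the property the "listen on their specific interfaces only" suggestion relies on.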

Regards -- Andy

From swise at opengridcomputing.com Thu Nov 6 11:06:45 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 06 Nov 2008 13:06:45 -0600
Subject: [ofa-general] Re: [ewg] OFED Nov 3 2008 meeting summary on OFED 1.4 status
In-Reply-To: <49102064.7080004@dev.mellanox.co.il>
References: <49102064.7080004@dev.mellanox.co.il>
Message-ID: <49134045.2080705@opengridcomputing.com>

Hey Vlad,

I opened a few critical bugs against cxgb3 for rhel4.x backport issues.
We're trying to resolve them asap. When is the cutoff for making rc4?

Thanks,

Steve.

Vladimir Sokolovsky wrote:
> Meeting minutes on the web:
> http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/
>
> Meeting Summary:
> ==============
> RC4 is delayed - will be released on Thursday Nov 6.
>
> Details:
> =======
> Bugs to be fixed in RC4:
>
> 1283 blocker P1 RHEL 5 yannick.cote at qlogic.com NEW
>      Intel MPI fails on Qlogc HCA
> 1326 blocker P1 RHEL 4 yannick.cote at qlogic.com NEW
>      ipath driver fails to build on IA64 in the 10/28/08 daily build
> 1335 major P3 Other monis at voltaire.com NEW
>      Bonding: packet lost during failover
> 1301 major P3 RHEL 4 olgas at voltaire.com NEW
>      Can not load rds module on RH4 up7
> 1323 blocker P1 All stefan.roscher at de.ibm.com REOPENED
>      IB/ehca: possibillity of kernel panic under certain circumstances
> 1242 critical P2 RHEL 4 yannick.cote at qlogic.com NEW
>      kernel panic while running mpi2007 against ofed1.4 -- ib_ipath:
>      ipath_sdma_verbs_send
> 1336 critical P1 RHEL 5 bugzilla at openib.org NEW
>      Can't to unloading the mlx4_ib module on ppc64
>
> Regards,
> Vladimir
>
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

From jon at opengridcomputing.com Thu Nov 6 12:23:22 2008
From: jon at opengridcomputing.com (Jon Mason)
Date: Thu, 6 Nov 2008 14:23:22 -0600
Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment
In-Reply-To: <49133E50.7090508@oracle.com>
References: <49133E50.7090508@oracle.com>
Message-ID: <20081106202322.GE15978@opengridcomputing.com>

On Thu, Nov 06, 2008 at 10:58:24AM -0800, Andy Grover wrote:
> Or Gerlitz wrote:
> > BTW - while doing this fix, I noted that the port assumed by opensm
> > for RDS is 18634
> > (0x48CA) which is the ones used in the rds code deployed in ofed
> > 1.3.x, where the rds code based deployed into ofed 1.4.y uses port
> > 18635
> >
> > Andy, Rick, can you guys revert to 18634 to make things simpler wrt
> > RDS/QoS configuration?
>
> It appears this is a fix for multiple rds transports each trying to bind
> to that port with INADDR_ANY, see commit f0af6566. I think the correct
> fix is to use a single port but have transports listen on their specific
> interfaces only.
>
> I think this is too big a fix for 1.4.0 so I will simply disable TCP
> transport there (leaving just IB transport, thus no problem) and move
> the port back to 18634. For 1.4.1 we will have multiple transports again
> and will need to fix this by not using INADDR_ANY, as described above.

There needs to be a separate port for all the interfaces. IIRC, each
RDS transport type is listening on a specific port for incoming
connections. With each one squatting, the other ones will receive
incoming connections. So for the existing iWARP setup in RDS, they must
be separate. If they are migrated to a specific physical port or IP
address/port tuple, then this is not an issue.

Also, there should be a standard port to listen on (and not squat on an
ephemeral port, as this can cause problems).

Thanks,
Jon

>
> Regards -- Andy

From or.gerlitz at gmail.com Thu Nov 6 13:25:46 2008
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 6 Nov 2008 23:25:46 +0200
Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment
In-Reply-To: <49133E50.7090508@oracle.com>
References: <49133E50.7090508@oracle.com>
Message-ID: <15ddcffd0811061325m2dcad0e9i789fa873d76aaa7f@mail.gmail.com>

On Thu, Nov 6, 2008 at 8:58 PM, Andy Grover wrote:
>> can you guys revert to 18634 to make things simpler wrt RDS/QoS configuration?

> It appears this is a fix for multiple rds transports each trying to bind to that port with
> INADDR_ANY, see commit f0af6566. I think the correct fix is to use a single port but have
> transports listen on their specific interfaces only.

Andy, commit f0af6566 came to handle the case where RDS iWARP and RDS
TCP listeners co-exist on the same node. In the general case, there
would be no special interface nor IP address for iWARP; the same IP is
used for both TCP connections served by the OS stack and iWARP
connections served by the NIC TOE stack. This creates the "TOE port
space problem": when the NIC gets a TCP connection request for port X
it has no clue whether it needs to be served by the TOE stack or the OS
stack, so an RDS iWARP connection request to port 18635 could be routed
to an iperf server that was spawned to listen on that port.

A possible solution that was suggested by the iWARP guys was to have
the TCP port space shared between TCP and RDMA listeners. Currently the
Linux kernel netdev maintainers are not willing to accept such a patch,
and the current suggestion was applied in ofed 1.4, see
cma_0100_unified_tcp_ports.patch under kernel_patches/fixes

> I think this is too big a fix for 1.4.0 so I will simply disable TCP transport there (leaving just
> IB transport, thus no problem) and move the port back to 18634. For 1.4.1 we will have
> multiple transports again and will need to fix this by not using INADDR_ANY, as described
> above.

Yes, let's have the IB transport use port 18634. As I explained above,
the INADDR_ANY usage is not the problem.

Or.

From andy.grover at oracle.com Thu Nov 6 13:41:01 2008
From: andy.grover at oracle.com (Andy Grover)
Date: Thu, 06 Nov 2008 13:41:01 -0800
Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment
In-Reply-To: <15ddcffd0811061325m2dcad0e9i789fa873d76aaa7f@mail.gmail.com>
References: <49133E50.7090508@oracle.com> <15ddcffd0811061325m2dcad0e9i789fa873d76aaa7f@mail.gmail.com>
Message-ID: <4913646D.6080308@oracle.com>

Hi Vlad, please pull the below trees. They remove the unused RDS TCP
transport in 1.3 and 1.4. I've verified they do not break the build:

OFED 1.3:

Andy Grover (1):
      RDS: Remove TCP transport

www.openfabrics.org:/pub/scm/~agrover/ofed_1_3/linux-2.6.git code-drop/20081106

OFED 1.4:

Andy Grover (2):
      RDS: Remove TCP transport
      RDS: Change listen port back to 18634

www.openfabrics.org:/pub/scm/~agrover/ofed_1_4/linux-2.6.git code-drop/20081106

Thanks -- Andy

From swise at opengridcomputing.com Thu Nov 6 15:06:42 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 06 Nov 2008 17:06:42 -0600
Subject: [ofa-general] [PATCH 2.6.28] RDMA/cxgb3: deadlock in iw_cxgb3 can cause hang when configuring interface.
Message-ID: <20081106230642.28808.66765.stgit@dell3.ogc.int>

From: Steve Wise

When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
function in cxgb3 to get the fw version and other info. Currently the
iwarp driver grabs the rtnl lock around this down call to serialize.
As of 2.6.27 or so, things changed such that the rtnl lock is held around
the call to the netdev driver open function. Also the cxgb3_client "add"
function doesn't get called if the device is down.

So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
the iw_cxgb3 add func gets called with the rtnl_lock held. If you
load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
gets called without the rtnl_lock held. The former causes the deadlock,
the latter does not.

In addition, there are iw_cxgb3 sysfs handlers that also can call
down into cxgb3 to gather the fw and hw versions. These can be called
concurrently on different processors and at any time. Thus we need to
push this serialization down in the cxgb3 driver get_drvinfo func.

The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.

Signed-off-by: Steve Wise
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    6 ------
 drivers/net/cxgb3/cxgb3_main.c              |    2 ++
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index ecff980..160ef48 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1102,9 +1102,7 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
 	char *cp, *next;
 	unsigned fw_maj, fw_min, fw_mic;

-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();

 	next = info.fw_version + 1;
 	cp = strsep(&next, ".");
@@ -1192,9 +1190,7 @@ static ssize_t show_fw_ver(struct device *dev, struct device_attribute *attr, ch
 	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;

 	PDBG("%s dev 0x%p\n", __func__, dev);
-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();
 	return sprintf(buf, "%s\n", info.fw_version);
 }

@@ -1207,9 +1203,7 @@ static ssize_t show_hca(struct device *dev, struct device_attribute *attr,
 	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;

 	PDBG("%s dev 0x%p\n", __func__, dev);
-	rtnl_lock();
 	lldev->ethtool_ops->get_drvinfo(lldev, &info);
-	rtnl_unlock();
 	return sprintf(buf, "%s\n", info.driver);
 }

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 1ace41a..5e663cc 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -1307,8 +1307,10 @@ static void get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
 	u32 fw_vers = 0;
 	u32 tp_vers = 0;

+	spin_lock(&adapter->stats_lock);
 	t3_get_fw_version(adapter, &fw_vers);
 	t3_get_tp_version(adapter, &tp_vers);
+	spin_unlock(&adapter->stats_lock);

 	strcpy(info->driver, DRV_NAME);
 	strcpy(info->version, DRV_VERSION);

From divy at chelsio.com Thu Nov 6 15:27:21 2008
From: divy at chelsio.com (Divy Le Ray)
Date: Thu, 06 Nov 2008 15:27:21 -0800
Subject: [ofa-general] Re: [PATCH 2.6.28] RDMA/cxgb3: deadlock in iw_cxgb3 can cause hang when configuring interface.
In-Reply-To: <20081106230642.28808.66765.stgit@dell3.ogc.int>
References: <20081106230642.28808.66765.stgit@dell3.ogc.int>
Message-ID: <49137D59.9070306@chelsio.com>

Steve Wise wrote:
> From: Steve Wise
>
> When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
> cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
> function in cxgb3 to get the fw version and other info. Currently the
> iwarp driver grabs the rtnl lock around this down call to serialize.
> As of 2.6.27 or so, things changed such that the rtnl lock is held around
> the call to the netdev driver open function. Also the cxgb3_client "add"
> function doesn't get called if the device is down.
>
> So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
> the iw_cxgb3 add func gets called with the rtnl_lock held. If you
> load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
> gets called without the rtnl_lock held. The former causes the deadlock,
> the latter does not.
>
> In addition, there are iw_cxgb3 sysfs handlers that also can call
> down into cxgb3 to gather the fw and hw versions. These can be called
> concurrently on different processors and at any time. Thus we need to
> push this serialization down in the cxgb3 driver get_drvinfo func.
>
> The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.
>
> Signed-off-by: Steve Wise
>

Acked-by: Divy Le Ray

> ---
>
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |    6 ------
>  drivers/net/cxgb3/cxgb3_main.c              |    2 ++
>  2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index ecff980..160ef48 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1102,9 +1102,7 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
>  	char *cp, *next;
>  	unsigned fw_maj, fw_min, fw_mic;
>
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>
>  	next = info.fw_version + 1;
>  	cp = strsep(&next, ".");
> @@ -1192,9 +1190,7 @@ static ssize_t show_fw_ver(struct device *dev, struct device_attribute *attr, ch
>  	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
>
>  	PDBG("%s dev 0x%p\n", __func__, dev);
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>  	return sprintf(buf, "%s\n", info.fw_version);
>  }
>
> @@ -1207,9 +1203,7 @@ static ssize_t show_hca(struct device *dev, struct device_attribute *attr,
>  	struct net_device *lldev = iwch_dev->rdev.t3cdev_p->lldev;
>
>  	PDBG("%s dev 0x%p\n", __func__, dev);
> -	rtnl_lock();
>  	lldev->ethtool_ops->get_drvinfo(lldev, &info);
> -	rtnl_unlock();
>  	return sprintf(buf, "%s\n", info.driver);
>  }
>
> diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
> index 1ace41a..5e663cc 100644
> --- a/drivers/net/cxgb3/cxgb3_main.c
> +++ b/drivers/net/cxgb3/cxgb3_main.c
> @@ -1307,8 +1307,10 @@ static void get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
>  	u32 fw_vers = 0;
>  	u32 tp_vers = 0;
>
> +	spin_lock(&adapter->stats_lock);
>  	t3_get_fw_version(adapter, &fw_vers);
>  	t3_get_tp_version(adapter, &tp_vers);
> +	spin_unlock(&adapter->stats_lock);
>
>  	strcpy(info->driver, DRV_NAME);
>  	strcpy(info->version, DRV_VERSION);
>

From panda at cse.ohio-state.edu Thu Nov 6 23:02:30 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri, 7 Nov 2008 02:02:30 -0500 (EST)
Subject: [ofa-general] Announcing the release of MVAPICH2 1.2
Message-ID:

The MVAPICH team is pleased to announce the availability of
MVAPICH2-1.2 with the following NEW features:

- Scalable and robust daemon-less job startup
    - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
      provide scalable job launching on multi-thousand core clusters
    - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
      (including Solaris)
- Support for Totalview debugger
- Checkpoint-restart with intra-node shared memory support
    - Allows best performance and scalability with fault-tolerance
      support
- Enhancement to software installation
    - Full autoconf-based configuration
    - Automatically detects system architecture and adapter types and
      optimizes MVAPICH2 for any particular installation
    - An application (mpiname) for querying the MVAPICH2 library
      version and configuration information
- Enhanced processor affinity using PLPA for multi-core architectures
    - Allows user-defined flexible processor affinity
- Enhanced scalability for RDMA-based direct one-sided communication
  with less communication resource
    - Available for OpenFabrics (IB and iWARP) interfaces
- Shared memory optimized algorithm for MPI_Bcast operation
- Optimized and tuned MPI_Alltoall
- Based on MPICH2 1.0.7

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://mvapich.cse.ohio-state.edu/overview/mvapich2/features.shtml

MVAPICH2 1.2 is being made available with OFED 1.4. It is also tested
with OFED 1.3. It continues to deliver excellent performance.
Sample performance numbers include:

OpenFabrics/Gen2 on EM64T quad-core with PCIe-Gen2 and ConnectX-QDR:

  Two-sided operations:
    - 1.25 microsec one-way latency (4 bytes)
    - 2573 MB/sec unidirectional bandwidth
    - 5037 MB/sec bidirectional bandwidth

  One-sided operations:
    - 2.73 microsec Put latency (4 bytes)
    - 2576 MB/sec unidirectional Put bandwidth
    - 4921 MB/sec bidirectional Put bandwidth

Performance numbers for several other platforms, system configurations
and operations can be viewed by visiting the `Performance' section of
the project's web page.

For downloading the MVAPICH2 1.2 package and accessing the anonymous
SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu/

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From o.w.saastad at usit.uio.no Fri Nov 7 00:01:23 2008
From: o.w.saastad at usit.uio.no (Ole Widar Saastad)
Date: Fri, 07 Nov 2008 09:01:23 +0100
Subject: [ofa-general] Problems running many MPI concurrent prosesses
Message-ID: <1226044883.11237.3.camel@pyren.uio.no>

I have experienced problems running many MPI processes concurrently.
Some of the MPI processes run fine (the first started) while the others
hang or make very slow progress.

I have dual socket quad core SUN 2200 nodes and Mellanox cards. See
below. I have tried the OFED 1.2.5 stack and the OFED 1.4rc3 stack.

Any suggestions about settings or increments of buffers, tokens etc.
are welcome.

An example :

Barrier benchmark :
Barrier size 9 iterations 32768 [8 procs - Resolution 0.95us]
9 nodes 12186.93 us

A barrier using 9 nodes should not take 12 milliseconds.
One barrier normally takes 11.20 microseconds using 9 nodes.

Some background information :
Stack: OFED 1.4rc3
Card : InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)

Best regards,
Ole W. Saastad

--
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no

From vlad at lists.openfabrics.org Fri Nov 7 03:25:21 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Fri, 7 Nov 2008 03:25:21 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081107-0200 daily build status
Message-ID: <20081107112521.5E4CBE60DEA@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From pradeeps at linux.vnet.ibm.com Fri Nov 7 08:47:03 2008
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Fri, 07 Nov 2008 08:47:03 -0800
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <20081106164005.GS31163@sgi.com>
References: <20081106012307.GP31163@sgi.com> <200811061712.50605.jackm@dev.mellanox.co.il> <20081106164005.GS31163@sgi.com>
Message-ID: <49147107.2090600@linux.vnet.ibm.com>

akepner at sgi.com wrote:
> On Thu, Nov 06, 2008 at 05:12:50PM +0200, Jack Morgenstein wrote:
>> On Thursday 06 November 2008 03:23, akepner at sgi.com wrote:
>>> I described an IPoIB-related panic we were seeing on large
>>> clusters. The signature was a backtrace like this:
>>>
>>>    skb_over_panic
>>>    :ib_ipoib:ipoib_ib_handle_rx_wc
>>>    :ib_ipoib:ipoib_poll
>>>    net_rx_action
>>>    .....
>>>
>>> The bug is difficult to reproduce, but we finally got a crashdump,
>>> and the problem appears to be that stale skb pointers on the tx_ring
>>> were left pointing to skbs that had been since reused, so that the
>>> skb's data region was now unexpectedly short, etc.
>>>
>> How does ipoib_ib_handle_rx_wc() involve the tx_ring? This is
>> receive processing.
>>
>
> What I surmise may be happening is something like this:
>
> - tx skb is freed, but a stale pointer remains on tx_ring
> - the same skb is reallocated, and added to the rx_ring
> - now we get an 'unexpected' tx completion, and use the stale
>   skb pointer on the tx_ring to again free the skb (this step
>   seems to invoke a f/w bug)
> - another driver, say an ethernet driver, reallocates the skb,
>   reducing the extent of the data region (leading to the
>   skb_over_panic once it's processed by ipoib_ib_handle_rx_wc)
>
>
> This bug leaves the tx and rx rings corrupted in many ways,
> including:
>
> - different rx_ring members refer to the same skb
> - different skbs on the rx_ring have identical data, head, end, tail ptrs
> - skbs on the rx_ring have sizes inconsistent with what the ipoib
>   driver allocates (which causes the skb_over_panic, of course)
> - rx skbs have 'dev' pointers to ethernet devices
> - dma mappings in rx_ring aren't consistent with what's in skb
> - some skbs are simultaneously on the tx and rx rings

If I am not mistaken we saw a problem that showed similar characteristics
more than two years ago on IBM platforms. It was the same issue of the
rx_ring reusing tx_ring skbs and so on, and it would show up only under
stress. This was with UD mode (before CM came into the picture) and it
turned out to be a driver issue. Could that be the same here?

Pradeep

From fenkes at de.ibm.com Fri Nov 7 08:42:51 2008
From: fenkes at de.ibm.com (Joachim Fenkes)
Date: Fri, 7 Nov 2008 17:42:51 +0100
Subject: [ofa-general] [PATCH] IB/ehca: Fix suppression of port activation events
In-Reply-To: <48499C11.7030504@gmail.com>
References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com>
Message-ID: <200811071742.51867.fenkes@de.ibm.com>

A previous fix introduced a regression where port activation events were
dropped unconditionally if port autodetection was not enabled. Fixed.
Signed-off-by: Joachim Fenkes --- Roland -- this patch is made against your for-linus branch. Please review and apply if you think it's okay. Hope it's not too late for the next kernel. Joachim drivers/infiniband/hw/ehca/ehca_irq.c | 45 +++++++++++++++++++------------- 1 files changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 9e43459..757035e 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -359,34 +359,43 @@ static void notify_port_conf_change(struct ehca_shca *shca, int port_num) *old_attr = new_attr; } +/* replay modify_qp for sqps -- return 0 if all is well, 1 if AQP1 destroyed */ +static int replay_modify_qp(struct ehca_sport *sport) +{ + int aqp1_destroyed; + unsigned long flags; + + spin_lock_irqsave(&sport->mod_sqp_lock, flags); + + aqp1_destroyed = !sport->ibqp_sqp[IB_QPT_GSI]; + + if (sport->ibqp_sqp[IB_QPT_SMI]) + ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_SMI]); + if (!aqp1_destroyed) + ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_GSI]); + + spin_unlock_irqrestore(&sport->mod_sqp_lock, flags); + + return aqp1_destroyed; +} + static void parse_ec(struct ehca_shca *shca, u64 eqe) { u8 ec = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe); u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe); u8 spec_event; struct ehca_sport *sport = &shca->sport[port - 1]; - unsigned long flags; switch (ec) { case 0x30: /* port availability change */ if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) { - /* only for autodetect mode important */ - if (ehca_nr_ports >= 0) - break; - - int suppress_event; - /* replay modify_qp for sqps */ - spin_lock_irqsave(&sport->mod_sqp_lock, flags); - suppress_event = !sport->ibqp_sqp[IB_QPT_GSI]; - if (sport->ibqp_sqp[IB_QPT_SMI]) - ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_SMI]); - if (!suppress_event) - ehca_recover_sqp(sport->ibqp_sqp[IB_QPT_GSI]); - spin_unlock_irqrestore(&sport->mod_sqp_lock, flags); - - /* AQP1 was 
destroyed, ignore this event */ - if (suppress_event) - break; + /* only replay modify_qp calls in autodetect mode; + * if AQP1 was destroyed, the port is already down + * again and we can drop the event. + */ + if (ehca_nr_ports < 0) + if (replay_modify_qp(sport)) + break; sport->port_state = IB_PORT_ACTIVE; dispatch_port_event(shca, port, IB_EVENT_PORT_ACTIVE, -- 1.5.5 From vladsk at gmail.com Fri Nov 7 12:21:13 2008 From: vladsk at gmail.com (Vladimir Sokolovsky) Date: Fri, 07 Nov 2008 22:21:13 +0200 Subject: ***SPAM*** Re: [rds-devel] [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment In-Reply-To: <4913646D.6080308@oracle.com> References: <49133E50.7090508@oracle.com> <15ddcffd0811061325m2dcad0e9i789fa873d76aaa7f@mail.gmail.com> <4913646D.6080308@oracle.com> Message-ID: <4914A339.2080303@gmail.com> Andy Grover wrote: > Hi Vlad, please pull the below trees. They remove the unused RDS TCP > transport in 1.3 and 1.4. I've verified they do not break the build: > > OFED 1.3: > > Andy Grover (1): > RDS: Remove TCP transport > > www.openfabrics.org:/pub/scm/~agrover/ofed_1_3/linux-2.6.git > code-drop/20081106 > > OFED 1.4: > > Andy Grover (2): > RDS: Remove TCP transport > RDS: Change listen port back to 18634 > > www.openfabrics.org:/pub/scm/~agrover/ofed_1_4/linux-2.6.git > code-drop/20081106 > > Thanks -- Andy > Done, Regards, Vladimir From arlin.r.davis at intel.com Fri Nov 7 15:34:04 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 7 Nov 2008 15:34:04 -0800 Subject: [ofa-general] [ANNOUNCE] compat-dapl-1.2.12 and dapl-2.0.15 Release Message-ID: New DAPL releases now available from OFA download page: http://www.openfabrics.org/downloads/dapl/ md5sum: 098c3efdf812f291449de0253c35d2b9 compat-dapl-1.2.12.tar.gz md5sum: 8bcf281049f7ff282202639d4bc523f8 dapl-2.0.15.tar.gz Summary of changes since last release: v1,v2 - allow override of /etc/dat.conf via syscondir option v1,v2 - fix dapltest transaction test to avoid cleanup 
before rdma complete
v1 - add ipath, ehca socket cm provider entries for v1.2, sync with v2.0

Vlad, please pick up new packages and install following for OFED 1.4 rc4:

compat-dapl-1.2.12-1
compat-dapl-devel-1.2.12-1
dapl-2.0.15-1
dapl-utils-2.0.15-1
dapl-devel-2.0.15-1
dapl-debuginfo-2.0.15-1

Thanks,

-arlin

From frederic.ciesielski at hp.com Sat Nov 8 00:13:58 2008
From: frederic.ciesielski at hp.com (Ciesielski, Frederic (EMEA HPC&OSLO CC))
Date: Sat, 8 Nov 2008 08:13:58 +0000
Subject: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?
Message-ID: <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737@GVW1087EXB.americas.hpqcorp.net>

Is there any chance that the new NFS-RDMA features coming with OFED 1.4 work with standard and current distributions, like RHEL5, SLES10 ?

Did anybody test this, or would pretend it is supposed to work ?

I mean without building a 2.6.27 or equivalent kernel on top of it, keeping almost full support from the vendors.

Enhanced kernel modules may not be sufficient to work around the limitations of old kernels...

From vlad at lists.openfabrics.org Sat Nov 8 03:18:26 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Sat, 8 Nov 2008 03:18:26 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081108-0200 daily build status
Message-ID: <20081108111826.F236AE60B1D@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From Jeffrey.C.Becker at nasa.gov Sat Nov 8 13:35:20 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Sat, 08 Nov 2008 13:35:20 -0800 Subject: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? In-Reply-To: <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737@GVW1087EXB.americas.hpqcorp.net> References: <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737@GVW1087EXB.americas.hpqcorp.net> Message-ID: <49160618.3050409@nasa.gov> Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: > Is there any chance that the new NFS-RDMA features coming with OFED > 1.4 work with standard and current distributions, like RHEL5, SLES10 ? Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be done for OFED 1.4.1. Thanks. -jeff > Did anybody test this, or would pretend it is supposed to work ? > > I mean without building a 2.6.27 or equivalent kernel on top of it, > keeping almost full support from the vendors. > > Enhanced kernel modules may not be sufficient to work around the > limitations of old kernels... 
From ogerlitz at voltaire.com Sun Nov 9 01:53:28 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 09 Nov 2008 11:53:28 +0200
Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free
In-Reply-To: <49147107.2090600@linux.vnet.ibm.com>
References: <20081106012307.GP31163@sgi.com> <200811061712.50605.jackm@dev.mellanox.co.il> <20081106164005.GS31163@sgi.com> <49147107.2090600@linux.vnet.ibm.com>
Message-ID: <4916B318.50503@voltaire.com>

Pradeep Satyanarayana wrote:
> If I am not mistaken we saw a problem that showed similar characteristics more than two years ago on IBM platforms. The same issue of rx_ring reusing tx_ring skbs and so on and would show up only under stress. This was with UD mode (before CM came into the picture) and it turned out to be a driver issue.

Can you send a pointer to the relevant thread / commit that solved this issue?

Or.
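
For readers following this thread, the idea named in the subject line (clear the ring's pointer when the buffer it refers to is freed) can be illustrated with a small user-space sketch. This is not the IPoIB driver code and not the posted patch: the names ring_entry, ring_post, ring_free_slot and RING_SIZE are hypothetical, and malloc/free merely stand in for skb allocation and release. It only shows, under those assumptions, why a cleared slot makes a repeated or unexpected completion harmless instead of freeing the same buffer twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4 /* hypothetical ring depth, for illustration only */

/* One ring slot; 'buf' stands in for the skb pointer kept per ring entry. */
struct ring_entry {
	void *buf;
};

/* Post a buffer into slot 'idx'; returns 0 on success, -1 on failure. */
static int ring_post(struct ring_entry *ring, size_t idx, size_t len)
{
	struct ring_entry *e = &ring[idx % RING_SIZE];

	if (e->buf)
		return -1;	/* slot still owns a buffer */
	e->buf = malloc(len);
	return e->buf ? 0 : -1;
}

/*
 * Release the buffer owned by slot 'idx' and clear the slot.
 * Returns 1 if a buffer was freed, 0 if the slot was already empty
 * (for example, on a duplicate or unexpected completion).
 */
static int ring_free_slot(struct ring_entry *ring, size_t idx)
{
	struct ring_entry *e = &ring[idx % RING_SIZE];

	if (!e->buf)
		return 0;	/* nothing to free: the completion is ignored */

	free(e->buf);
	e->buf = NULL;		/* no stale pointer is left on the ring */
	return 1;
}
```

With the slot cleared on free, a second call to ring_free_slot() for the same index returns 0 rather than releasing memory that may already belong to someone else, which is the failure mode described earlier in the thread.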
From vlad at lists.openfabrics.org Sun Nov 9 03:23:18 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 9 Nov 2008 03:23:18 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081109-0200 daily build status Message-ID: <20081109112318.9F6EFE60E7A@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Sun Nov 9 05:56:46 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 9 Nov 2008 15:56:46 +0200 Subject: [ofa-general] Re: [PATCH] limit log records number and size In-Reply-To: <4912FEA3.3090409@Voltaire.COM> References: <4912FEA3.3090409@Voltaire.COM> Message-ID: <20081109135646.GE29807@sashak.voltaire.com> Hi Doron, On 16:26 Thu 06 Nov , Doron Shoham wrote: > limit log records number and size > > Signed-off-by: Doron Shoham > --- > opensm/scripts/opensm.logrotate | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/opensm/scripts/opensm.logrotate b/opensm/scripts/opensm.logrotate > index e16e227..e0f4125 100644 > --- a/opensm/scripts/opensm.logrotate > +++ b/opensm/scripts/opensm.logrotate > @@ -4,4 +4,6 @@ > copytruncate > weekly > compress > + rotate 10 > + size 100M Why it should be limited this (and not another) way? Is not it better to follow the default site policy? Sasha From sashak at voltaire.com Sun Nov 9 09:25:18 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 9 Nov 2008 19:25:18 +0200 Subject: [ofa-general] Re: [opensm patch] support dump_conf command in opensm console In-Reply-To: <1225759191.7307.9.camel@cardanus.llnl.gov> References: <1225759191.7307.9.camel@cardanus.llnl.gov> Message-ID: <20081109172518.GG30588@sashak.voltaire.com> Hi Al, On 16:39 Mon 03 Nov , Al Chu wrote: > Hey Sasha, > > When config files are rescanned and loaded, there's no way to know if > the right configuration was actually reloaded or not. A console command > to dump the current config is a useful way to verify the loading of new > configs or not. > > This patch assumes the fixes from my "fix qos config parsing bugs" is > accepted. 
Didn't pass over it, sorry about delay. > > Al > > -- > Albert Chu > chu11 at llnl.gov > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From 249607e47ec7ef1b92f9578cece90460418d12b8 Mon Sep 17 00:00:00 2001 > From: Albert Chu > Date: Mon, 3 Nov 2008 16:22:29 -0800 > Subject: [PATCH] support dump_conf console command > > > Signed-off-by: Albert Chu > --- > opensm/opensm/osm_console.c | 158 +++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 158 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c > index d9bbbc2..8422655 100644 > --- a/opensm/opensm/osm_console.c > +++ b/opensm/opensm/osm_console.c > @@ -53,6 +53,10 @@ > #include > #include > > +#define NULL_STR "(null)" > + > +#define BOOLEAN_STR(__b) ((__b) ? "TRUE" : "FALSE") > + > struct command { > char *name; > void (*help_function) (FILE * out, int detail); > @@ -189,6 +193,14 @@ static void help_lidbalance(FILE * out, int detail) > } > } > > +static void help_dump_conf(FILE *out, int detail) > +{ > + fprintf(out, "dump_conf\n"); > + if (detail) { > + fprintf(out, "dump current opensm configuration\n"); > + } > +} > + > #ifdef ENABLE_OSM_PERF_MGR > static void help_perfmgr(FILE * out, int detail) > { > @@ -1136,6 +1148,151 @@ static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > } > #endif /* ENABLE_OSM_PERF_MGR */ > > +static void dump_qos_options(osm_qos_options_t * opt, > + osm_qos_options_t * dflt, > + char *prefix, > + FILE * out) > +{ > + fprintf(out, "%s_max_vls : %u\n", > + prefix, opt->max_vls ? opt->max_vls : dflt->max_vls); > + fprintf(out, "%s_high_limit : %u\n", > + prefix, opt->high_limit >= 0 ? (unsigned)opt->high_limit : (unsigned)dflt->high_limit); > + fprintf(out, "%s_vlarb_high : %s\n", > + prefix, opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high); > + fprintf(out, "%s_vlarb_low : %s\n", > + prefix, opt->vlarb_low ? 
opt->vlarb_low : dflt->vlarb_low); > + fprintf(out, "%s_sl2vl : %s\n", > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > +} > + > +static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > +{ Why to not use osm_subn_write_conf_file() function (wrapped by dump_conf_parse())? I think we need to have config dumping code consolidated. Sasha > + osm_subn_opt_t * opt = &p_osm->subn.opt; > + > + fprintf(out, "config_file : %s\n", > + opt->config_file ? opt->config_file : NULL_STR); > + fprintf(out, "guid : 0x%016" PRIx64 "\n", opt->guid); > + fprintf(out, "m_key : 0x%016" PRIx64 "\n", opt->m_key); > + fprintf(out, "sm_key : 0x%016" PRIx64 "\n", opt->sm_key); > + fprintf(out, "sa_key : 0x%016" PRIx64 "\n", opt->sa_key); > + fprintf(out, "subnet_prefix : 0x%016" PRIx64 "\n", opt->subnet_prefix); > + fprintf(out, "m_key_lease_period : %u\n", opt->m_key_lease_period); > + fprintf(out, "sweep_interval : %u\n", opt->sweep_interval); > + fprintf(out, "max_wire_smps : %u\n", opt->max_wire_smps); > + fprintf(out, "transaction_timeout : %u\n", opt->transaction_timeout); > + fprintf(out, "sm_priority : %u\n", opt->sm_priority); > + fprintf(out, "lmc : %u\n", opt->lmc); > + fprintf(out, "lmc_esp0 : %s\n", > + BOOLEAN_STR(opt->lmc_esp0)); > + fprintf(out, "max_op_vls : %u\n", opt->max_op_vls); > + fprintf(out, "force_link_speed : %u\n", opt->force_link_speed); > + fprintf(out, "reassign_lids : %s\n", > + BOOLEAN_STR(opt->reassign_lids)); > + fprintf(out, "ignore_other_sm : %s\n", > + BOOLEAN_STR(opt->ignore_other_sm)); > + fprintf(out, "single_thread : %s\n", > + BOOLEAN_STR(opt->single_thread)); > + fprintf(out, "disable_multicast : %s\n", > + BOOLEAN_STR(opt->disable_multicast)); > + fprintf(out, "force_log_flush : %s\n", > + BOOLEAN_STR(opt->force_log_flush)); > + fprintf(out, "subnet_timeout : %u\n", opt->subnet_timeout); > + fprintf(out, "packet_life_time : %u\n", opt->packet_life_time); > + fprintf(out, "vl_stall_count : %u\n", opt->vl_stall_count); > + 
fprintf(out, "leaf_vl_stall_count : %u\n", opt->leaf_vl_stall_count); > + fprintf(out, "head_of_queue_lifetime : %u\n", opt->head_of_queue_lifetime); > + fprintf(out, "leaf_head_of_queue_lifetime : %u\n", opt->leaf_head_of_queue_lifetime); > + fprintf(out, "local_phy_errors_threshold : %u\n", opt->local_phy_errors_threshold); > + fprintf(out, "overrun_errors_threshold : %u\n", opt->overrun_errors_threshold); > + fprintf(out, "sminfo_polling_timeout : %u\n", opt->sminfo_polling_timeout); > + fprintf(out, "polling_retry_number : %u\n", opt->polling_retry_number); > + fprintf(out, "max_msg_fifo_timeout : %u\n", opt->max_msg_fifo_timeout); > + fprintf(out, "force_heavy_sweep : %s\n", > + BOOLEAN_STR(opt->force_heavy_sweep)); > + fprintf(out, "log_flags : 0x%02x\n", opt->log_flags); > + fprintf(out, "dump_files_dir : %s\n", > + opt->dump_files_dir ? opt->dump_files_dir : NULL_STR); > + fprintf(out, "log_file : %s\n", > + opt->log_file ? opt->log_file : NULL_STR); > + fprintf(out, "log_max_size : %lu\n", opt->log_max_size); > + fprintf(out, "partition_config_file : %s\n", > + opt->partition_config_file ? opt->partition_config_file : NULL_STR); > + fprintf(out, "no_partition_enforcement : %s\n", > + BOOLEAN_STR(opt->no_partition_enforcement)); > + fprintf(out, "qos : %s\n", > + BOOLEAN_STR(opt->qos)); > + fprintf(out, "qos_policy_file : %s\n", > + opt->qos_policy_file ? opt->qos_policy_file : NULL_STR); > + fprintf(out, "accum_log_file: %s\n", > + BOOLEAN_STR(opt->accum_log_file)); > + fprintf(out, "console : %s\n", > + opt->console ? opt->console : NULL_STR); > + fprintf(out, "console_port : %u\n", opt->console_port); > + fprintf(out, "port_prof_ignore_file : %s\n", > + opt->port_prof_ignore_file ? 
opt->port_prof_ignore_file : NULL_STR); > + fprintf(out, "port_profile_switch_nodes : %s\n", > + BOOLEAN_STR(opt->port_profile_switch_nodes)); > + fprintf(out, "sweep_on_trap : %s\n", > + BOOLEAN_STR(opt->sweep_on_trap)); > + fprintf(out, "routing_engine_names : %s\n", > + opt->routing_engine_names ? opt->routing_engine_names : NULL_STR); > + fprintf(out, "use_ucast_cache : %s\n", > + BOOLEAN_STR(opt->use_ucast_cache)); > + fprintf(out, "connect_roots : %s\n", > + BOOLEAN_STR(opt->connect_roots)); > + fprintf(out, "lid_matrix_dump_file : %s\n", > + opt->lid_matrix_dump_file ? opt->lid_matrix_dump_file : NULL_STR); > + fprintf(out, "lfts_file : %s\n", > + opt->lfts_file ? opt->lfts_file : NULL_STR); > + fprintf(out, "root_guid_file : %s\n", > + opt->root_guid_file ? opt->root_guid_file : NULL_STR); > + fprintf(out, "cn_guid_file : %s\n", > + opt->cn_guid_file ? opt->cn_guid_file : NULL_STR); > + fprintf(out, "ids_guid_file : %s\n", > + opt->ids_guid_file ? opt->ids_guid_file : NULL_STR); > + fprintf(out, "guid_routing_order_file : %s\n", > + opt->guid_routing_order_file ? opt->guid_routing_order_file : NULL_STR); > + fprintf(out, "sa_db_file : %s\n", > + opt->sa_db_file ? 
opt->sa_db_file : NULL_STR); > + fprintf(out, "exit_on_fatal : %s\n", > + BOOLEAN_STR(opt->exit_on_fatal)); > + fprintf(out, "honor_guid2lid_file : %s\n", > + BOOLEAN_STR(opt->honor_guid2lid_file)); > + fprintf(out, "daemon : %s\n", > + BOOLEAN_STR(opt->daemon)); > + fprintf(out, "sm_inactive : %s\n", > + BOOLEAN_STR(opt->sm_inactive)); > + fprintf(out, "babbling_port_policy : %s\n", > + BOOLEAN_STR(opt->babbling_port_policy)); > + dump_qos_options(&opt->qos_options, &opt->qos_options, "qos", out); > + dump_qos_options(&opt->qos_ca_options, &opt->qos_options, "qos_ca", out); > + dump_qos_options(&opt->qos_sw0_options, &opt->qos_options, "qos_sw0", out); > + dump_qos_options(&opt->qos_swe_options, &opt->qos_options, "qos_swe", out); > + dump_qos_options(&opt->qos_rtr_options, &opt->qos_options, "qos_rtr", out); > + fprintf(out, "enable_quirks : %s\n", > + BOOLEAN_STR(opt->enable_quirks)); > + fprintf(out, "no_clients_rereg : %s\n", > + BOOLEAN_STR(opt->no_clients_rereg)); > +#ifdef ENABLE_OSM_PERF_MGR > + fprintf(out, "perfmgr : %s\n", > + BOOLEAN_STR(opt->perfmgr)); > + fprintf(out, "perfmgr_redir : %s\n", > + BOOLEAN_STR(opt->perfmgr_redir)); > + fprintf(out, "perfmgr_sweep_time_s : %u\n", opt->perfmgr_sweep_time_s); > + fprintf(out, "perfmgr_max_outstanding_queries : %u\n", opt->perfmgr_max_outstanding_queries); > + fprintf(out, "event_db_dump_file : %s\n", > + opt->event_db_dump_file ? opt->event_db_dump_file : NULL_STR); > +#endif > + fprintf(out, "event_plugin_name : %s\n", > + opt->event_plugin_name ? opt->event_plugin_name : NULL_STR); > + fprintf(out, "node_name_map_name : %s\n", > + opt->node_name_map_name ? opt->node_name_map_name : NULL_STR); > + fprintf(out, "prefix_routes_file : %s\n", > + opt->prefix_routes_file ? 
opt->prefix_routes_file : NULL_STR); > + fprintf(out, "consolidate_ipv6_snm_req : %s\n", > + BOOLEAN_STR(opt->consolidate_ipv6_snm_req)); > +} > + > static void quit_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > { > osm_console_exit(&p_osm->console, &p_osm->log); > @@ -1166,6 +1323,7 @@ static const struct command console_cmds[] = { > {"portstatus", &help_portstatus, &portstatus_parse}, > {"switchbalance", &help_switchbalance, &switchbalance_parse}, > {"lidbalance", &help_lidbalance, &lidbalance_parse}, > + {"dump_conf", &help_dump_conf, &dump_conf_parse}, > {"version", &help_version, &version_parse}, > #ifdef ENABLE_OSM_PERF_MGR > {"perfmgr", &help_perfmgr, &perfmgr_parse}, > -- > 1.5.4.5 > From sashak at voltaire.com Sun Nov 9 09:47:33 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 9 Nov 2008 19:47:33 +0200 Subject: [ofa-general] Re: [PATCH] Add check for previous versions of plugins. In-Reply-To: <20081104095812.2ff5920c.weiny2@llnl.gov> References: <20081104095812.2ff5920c.weiny2@llnl.gov> Message-ID: <20081109174733.GA30265@sashak.voltaire.com> Hi Ira, On 09:58 Tue 04 Nov , Ira Weiny wrote: > From 0db0d6667ed8baede1093a95127e2ce9c81959bd Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Mon, 3 Nov 2008 15:50:15 -0800 > Subject: [PATCH] Add check for previous versions of plugins. > > If old interface plugins are available to OpenSM they will cause a crash. > Check for this old version and error out gracefully. 
> > Signed-off-by: Ira Weiny > --- > opensm/include/opensm/osm_event_plugin.h | 1 + > opensm/opensm/osm_event_plugin.c | 10 ++++++++++ > 2 files changed, 11 insertions(+), 0 deletions(-) > > diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h > index b2deeba..0b80b63 100644 > --- a/opensm/include/opensm/osm_event_plugin.h > +++ b/opensm/include/opensm/osm_event_plugin.h > @@ -150,6 +150,7 @@ typedef struct osm_epi_trap_event { > #define OSM_EVENT_PLUGIN_IMPL_NAME "osm_event_plugin" > #define OSM_EVENT_PLUGIN_INTERFACE_VER 2 > typedef struct osm_event_plugin { > + int interface_version; > const char *osm_version; > void *(*create) (struct osm_opensm *osm); > void (*delete) (void *plugin_data); The problem IMHO that this changes the current interface and will require to change all plugins (not just rebuild - actually rebuild will hide any interface changing issues and will not fail). What about the check like this: diff --git a/opensm/opensm/osm_event_plugin.c b/opensm/opensm/osm_event_plugin.c index c6999f5..f332a24 100644 --- a/opensm/opensm/osm_event_plugin.c +++ b/opensm/opensm/osm_event_plugin.c @@ -66,6 +66,7 @@ osm_epi_plugin_t *osm_epi_construct(osm_opensm_t *osm, char *plugin_name) { char lib_name[OSM_PATH_MAX]; + struct old_if { unsigned ver; } *old_impl; osm_epi_plugin_t *rc = NULL; if (!plugin_name || !*plugin_name) @@ -96,6 +97,17 @@ osm_epi_plugin_t *osm_epi_construct(osm_opensm_t *osm, char *plugin_name) goto Exit; } + /* be sure that not old interface plugin is used */ + old_impl = (struct old_if *) rc->impl; + if (old_impl->ver < OSM_EVENT_PLUGIN_INTERFACE_VER) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin" + "\'%s\': it has the wrong interface version (%u); " + "OpenSM expected %u. 
Please rebuild.\n", + plugin_name, old_impl->ver, + OSM_EVENT_PLUGIN_INTERFACE_VER); + goto Exit; + } + /* Check the version to make sure this module will work with us */ if (strcmp(rc->impl->osm_version, osm->osm_version)) { OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin" Sasha From sashak at voltaire.com Sun Nov 9 10:13:16 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 9 Nov 2008 20:13:16 +0200 Subject: [ofa-general] Re: [PATCH 1/2] add default configuration files In-Reply-To: <4912B7CA.9080508@Voltaire.COM> References: <4912B719.3040907@Voltaire.COM> <4912B7CA.9080508@Voltaire.COM> Message-ID: <20081109181316.GA30682@sashak.voltaire.com> Hi Doron, On 11:24 Thu 06 Nov , Doron Shoham wrote: > add default configuration files: > opensm.conf > partitions.conf > qos-policy.conf > root-nodes.conf > > Signed-off-by: Doron Shoham > --- > opensm/scripts/opensm.conf | 331 ++++++++++++++++++++++++++++++++++++++++ Normally this file is autogenerated. And I don't see any good reason to put generated files under source control. > opensm/scripts/partitions.conf | 100 ++++++++++++ Existence of partition file changes default behavior of PM in OpenSM, so you will need to put some reasonable configuration there. OTOH you already have it in OpenSM (when using without file), so why to bother? > opensm/scripts/qos-policy.conf | 2 + > opensm/scripts/root-nodes.conf | 3 + Those are empty. 
> 4 files changed, 436 insertions(+), 0 deletions(-) > create mode 100644 opensm/scripts/opensm.conf > create mode 100644 opensm/scripts/partitions.conf > create mode 100644 opensm/scripts/qos-policy.conf > create mode 100644 opensm/scripts/root-nodes.conf > > diff --git a/opensm/scripts/opensm.conf b/opensm/scripts/opensm.conf > new file mode 100644 > index 0000000..89e4145 > --- /dev/null > +++ b/opensm/scripts/opensm.conf > @@ -0,0 +1,331 @@ > +# > +# DEVICE ATTRIBUTES OPTIONS > +# > +# The port GUID on which the OpenSM is running > +guid 0x0000000000000000 > + > +# M_Key value sent to all ports qualifying all Set(PortInfo) > +m_key 0x0000000000000000 > + > +# The lease period used for the M_Key on this subnet in [sec] > +m_key_lease_period 0 > + > +# SM_Key value of the SM used for SM authentication > +sm_key 0x0000000000000001 > + > +# SM_Key value to qualify rcv SA queries as 'trusted' > +sa_key 0x0000000000000001 > + > +# Note that for both values above (sm_key and sa_key) > +# OpenSM version 3.2.1 and below used the default value '1' > +# in a host byte order, it is fixed now but you may need to > +# change the values to interoperate with old OpenSM running > +# on a little endian machine. > + > +# Subnet prefix used on this subnet > +subnet_prefix 0xfe80000000000000 > + > +# The LMC value used on this subnet > +lmc 0 > + > +# lmc_esp0 determines whether LMC value used on subnet is used for > +# enhanced switch port 0. If TRUE, LMC value for subnet is used for > +# ESP0. Otherwise, LMC value for ESP0s is 0. > +lmc_esp0 FALSE > + > +# The code of maximal time a packet can live in a switch > +# The actual time is 4.096usec * 2^ > +# The value 0x14 disables this mechanism > +packet_life_time 0x12 > + > +# The number of sequential packets dropped that cause the port > +# to enter the VLStalled state. The result of setting this value to > +# zero is undefined. 
> +vl_stall_count 0x07 > + > +# The number of sequential packets dropped that cause the port > +# to enter the VLStalled state. This value is for switch ports > +# driving a CA or router port. The result of setting this value > +# to zero is undefined. > +leaf_vl_stall_count 0x07 > + > +# The code of maximal time a packet can wait at the head of > +# transmission queue. > +# The actual time is 4.096usec * 2^ > +# The value 0x14 disables this mechanism > +head_of_queue_lifetime 0x12 > + > +# The maximal time a packet can wait at the head of queue on > +# switch port connected to a CA or router port > +leaf_head_of_queue_lifetime 0x10 > + > +# Limit the maximal operational VLs > +max_op_vls 5 > + > +# Force PortInfo:LinkSpeedEnabled on switch ports > +# If 0, don't modify PortInfo:LinkSpeedEnabled on switch port > +# Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port > +# Values are (IB Spec 1.2.1, 14.2.5.6 Table 146 "PortInfo") > +# 1: 2.5 Gbps > +# 3: 2.5 or 5.0 Gbps > +# 5: 2.5 or 10.0 Gbps > +# 7: 2.5 or 5.0 or 10.0 Gbps > +# 2,4,6,8-14 Reserved > +# Default 15: set to PortInfo:LinkSpeedSupported > +force_link_speed 15 > + > +# The subnet_timeout code that will be set for all the ports > +# The actual timeout is 4.096usec * 2^ > +subnet_timeout 18 > + > +# Threshold of local phy errors for sending Trap 129 > +local_phy_errors_threshold 0x08 > + > +# Threshold of credit overrun errors for sending Trap 130 > +overrun_errors_threshold 0x08 > + > +# > +# PARTITIONING OPTIONS > +# > +# Partition configuration file to be used > +partition_config_file /etc/opensm/partitions.conf > + > +# Disable partition enforcement by switches > +no_partition_enforcement FALSE > + > +# > +# SWEEP OPTIONS > +# > +# The number of seconds between subnet sweeps (0 disables it) > +sweep_interval 10 > + > +# If TRUE cause all lids to be reassigned > +reassign_lids FALSE > + > +# If TRUE forces every sweep to be a heavy sweep > +force_heavy_sweep FALSE > + > +# If TRUE every 
trap will cause a heavy sweep. > +# NOTE: successive identical traps (>10) are suppressed > +sweep_on_trap TRUE > + > +# > +# ROUTING OPTIONS > +# > +# If TRUE count switches as link subscriptions > +port_profile_switch_nodes FALSE > + > +# Name of file with port guids to be ignored by port profiling > +port_prof_ignore_file (null) > + > +# Routing engine > +# Multiple routing engines can be specified separated by > +# commas so that specific ordering of routing algorithms will > +# be tried if earlier routing engines fail. > +# Supported engines: minhop, updn, file, ftree, lash, dor > +routing_engine minhop > + > +# Connect roots (use FALSE if unsure) > +connect_roots FALSE > + > +# Use unicast routing cache (use FALSE if unsure) > +use_ucast_cache FALSE > + > +# Lid matrix dump file name > +lid_matrix_dump_file (null) > + > +# LFTs file name > +lfts_file (null) > + > +# The file holding the root node guids (for fat-tree or Up/Down) > +# One guid in each line > +root_guid_file (null) > + > +# The file holding the fat-tree compute node guids > +# One guid in each line > +cn_guid_file (null) > + > +# The file holding the node ids which will be used by Up/Down algorithm instead > +# of GUIDs (one guid and id in each line) > +ids_guid_file (null) > + > +# The file holding guid routing order guids (for MinHop and Up/Down) > +guid_routing_order_file (null) > + > +# SA database file name > +sa_db_file (null) > + > +# > +# HANDOVER - MULTIPLE SMs OPTIONS > +# > +# SM priority used for deciding who is the master > +# Range goes from 0 (lowest priority) to 15 (highest). > +sm_priority 14 SM priority value 14 doesn't look as a good idea for a default value (we are not starting "priority wars" with other SMs :)). 
Sasha > + > +# If TRUE other SMs on the subnet should be ignored > +ignore_other_sm FALSE > + > +# Timeout in [msec] between two polls of active master SM > +sminfo_polling_timeout 10000 > + > +# Number of failing polls of remote SM that declares it dead > +polling_retry_number 4 > + > +# If TRUE honor the guid2lid file when coming out of standby > +# state, if such file exists and is valid > +honor_guid2lid_file FALSE > + > +# > +# TIMING AND THREADING OPTIONS > +# > +# Maximum number of SMPs sent in parallel > +max_wire_smps 4 > + > +# The maximum time in [msec] allowed for a transaction to complete > +transaction_timeout 200 > + > +# Maximal time in [msec] a message can stay in the incoming message queue. > +# If there is more than one message in the queue and the last message > +# stayed in the queue more than this value, any SA request will be > +# immediately returned with a BUSY status. > +max_msg_fifo_timeout 10000 > + > +# Use a single thread for handling SA queries > +single_thread FALSE > + > +# > +# MISC OPTIONS > +# > +# Daemon mode > +daemon FALSE > + > +# SM Inactive > +sm_inactive FALSE > + > +# Babbling Port Policy > +babbling_port_policy FALSE > + > +# > +# Performance Manager Options > +# > +# perfmgr enable > +perfmgr FALSE > + > +# perfmgr redirection enable > +perfmgr_redir TRUE > + > +# sweep time in seconds > +perfmgr_sweep_time_s 180 > + > +# Max outstanding queries > +perfmgr_max_outstanding_queries 500 > + > +# > +# Event DB Options > +# > +# Dump file to dump the events to > +event_db_dump_file (null) > + > +# > +# Event Plugin Options > +# > +event_plugin_name (null) > + > +# > +# Node name map for mapping node's to more descriptive node descriptions > +# (man ibnetdiscover for more information) > +# > +node_name_map_name (null) > + > +# > +# DEBUG FEATURES > +# > +# The log flags used > +log_flags 0x03 > + > +# Force flush of the log file after each log message > +force_log_flush FALSE > + > +# Log file to be used > +log_file 
/var/log/opensm.log > + > +# Limit the size(MB) of the log file. If overrun, log is restarted > +log_max_size 4096 > + > +# If TRUE will accumulate the log over multiple OpenSM sessions > +accum_log_file TRUE > + > +# The directory to hold the file OpenSM dumps > +dump_files_dir /var/log/ > + > +# If TRUE enables new high risk options and hardware specific quirks > +enable_quirks FALSE > + > +# If TRUE disables client reregistration > +no_clients_rereg FALSE > + > +# If TRUE OpenSM should disable multicast support and > +# no multicast routing is performed if TRUE > +disable_multicast FALSE > + > +# If TRUE opensm will exit on fatal initialization issues > +exit_on_fatal TRUE > + > +# console [off|local] > +console off > + > +# Telnet port for console (default 10000) > +console_port 10000 > + > +# > +# QoS OPTIONS > +# > +# Enable QoS setup > +qos FALSE > + > +# QoS policy file to be used > +qos_policy_file /etc/opensm/qos-policy.conf > + > +# QoS default options > +qos_max_vls 15 > +qos_high_limit 0 > +qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 > +qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 > +qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > + > +# QoS CA options > +qos_ca_max_vls 15 > +qos_ca_high_limit 0 > +qos_ca_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 > +qos_ca_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 > +qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > + > +# QoS Switch Port 0 options > +qos_sw0_max_vls 15 > +qos_sw0_high_limit 0 > +qos_sw0_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 > +qos_sw0_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 > +qos_sw0_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > + > +# QoS Switch external ports options > +qos_swe_max_vls 15 > +qos_swe_high_limit 0 > +qos_swe_vlarb_high 
0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 > +qos_swe_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 > +qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > + > +# QoS Router ports options > +qos_rtr_max_vls 15 > +qos_rtr_high_limit 0 > +qos_rtr_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 > +qos_rtr_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 > +qos_rtr_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > + > +# Prefix routes file name > +prefix_routes_file /etc/opensm/prefix-routes.conf > + > +# > +# IPv6 Solicited Node Multicast (SNM) Options > +# > +consolidate_ipv6_snm_req FALSE > + > diff --git a/opensm/scripts/partitions.conf b/opensm/scripts/partitions.conf > new file mode 100644 > index 0000000..868a26a > --- /dev/null > +++ b/opensm/scripts/partitions.conf > @@ -0,0 +1,100 @@ > +# Default partition configuration file for OpenSM > +# > +# The default name of OpenSM partitions configuration file is /etc/opensm/partitions.conf. The default may be changed by using --Pconfig (-P) > +# option with OpenSM. > +# > +# The default partition will be created by OpenSM unconditionally even when partition configuration file does not exist or cannot be accessed. > +# > +# The default partition has P_Key value 0x7fff. OpenSM's port will have full membership in default partition. All other end ports will have > +# partial membership. > +# > +# File Format > +# > +# Comments: > +# > +# Line content followed after '#' character is comment and ignored by parser. > +# > +# General file format: > +# > +# : ; > +# > +# Partition Definition: > +# > +# [PartitionName][=PKey][,flag[=value]][,defmember=full|limited] > +# > +# PartitionName - string, will be used with logging. When omitted > +# empty string will be used. > +# PKey - P_Key value for this partition. Only low 15 bits will > +# be used. When omitted will be autogenerated. 
> +# flag - used to indicate IPoIB capability of this partition. > +# defmember=full|limited - specifies default membership for port guid > +# list. Default is limited. > +# > +# Currently recognized flags are: > +# > +# ipoib - indicates that this partition may be used for IPoIB, as > +# result IPoIB capable MC group will be created. > +# rate= - specifies rate for this IPoIB MC group > +# (default is 3 (10GBps)) > +# mtu= - specifies MTU for this IPoIB MC group > +# (default is 4 (2048)) > +# sl= - specifies SL for this IPoIB MC group > +# (default is 0) > +# scope= - specifies scope for this IPoIB MC group > +# (default is 2 (link local)). Multiple scope settings > +# are permitted for a partition. > +# > +# Note that values for rate, mtu, and scope should be specified as defined in the IBTA specification (for example, mtu=4 for 2048). > +# > +# PortGUIDs list: > +# > +# PortGUID - GUID of partition member EndPort. Hexadecimal > +# numbers should start from 0x, decimal numbers > +# are accepted too. > +# full or limited - indicates full or limited membership for this > +# port. When omitted (or unrecognized) limited > +# membership is assumed. > +# > +# There are two useful keywords for PortGUID definition: > +# > +# - 'ALL' means all end ports in this subnet. > +# - 'SELF' means subnet manager's port. > +# > +# Empty list means no ports in this partition. > +# > +# Notes: > +# > +# White space is permitted between delimiters ('=', ',',':',';'). > +# > +# The line can be wrapped after ':' followed after Partition Definition and between. > +# > +# PartitionName does not need to be unique, PKey does need to be unique. If PKey is repeated then those partition configurations will be merged > +# and first PartitionName will be used (see also next note). > +# > +# It is possible to split partition configuration in more than one definition, but then PKey should be explicitly specified (otherwise different > +# PKey values will be generated for those definitions). 
> +# > +# Examples: > +# > +# Default=0x7fff : ALL, SELF=full ; > +# > +# NewPartition , ipoib : 0x123456=full, 0x3456789034=limi, 0x2134af2306 ; > +# > +# YetAnotherOne = 0x300 : SELF=full ; > +# YetAnotherOne = 0x300 : ALL=limited ; > +# > +# ShareIO = 0x80 , defmember=full : 0x123451, 0x123452; > +# # 0x123453, 0x123454 will be limited > +# ShareIO = 0x80 : 0x123453, 0x123454, 0x123455=full; > +# # 0x123456, 0x123457 will be limited > +# ShareIO = 0x80 : defmember=limited : 0x123456, 0x123457, 0x123458=full; > +# ShareIO = 0x80 , defmember=full : 0x123459, 0x12345a; > +# ShareIO = 0x80 , defmember=full : 0x12345b, 0x12345c=limited, 0x12345d; > +# > +# > +# Note: > +# > +# The following rule is equivalent to how OpenSM used to run prior to the partition manager: > +# > + Default=0x7fff,ipoib:ALL=full; > +# > diff --git a/opensm/scripts/qos-policy.conf b/opensm/scripts/qos-policy.conf > new file mode 100644 > index 0000000..42a88c0 > --- /dev/null > +++ b/opensm/scripts/qos-policy.conf > @@ -0,0 +1,2 @@ > +# Default Quality of Service policy configuration file > +# For further details see /usr/share/doc/opensm-/QoS_management_in_OpenSM.txt > diff --git a/opensm/scripts/root-nodes.conf b/opensm/scripts/root-nodes.conf > new file mode 100644 > index 0000000..d84d732 > --- /dev/null > +++ b/opensm/scripts/root-nodes.conf > @@ -0,0 +1,3 @@ > +# Default root node GUIDs configuration file for OpenSM > +# List of GUIDs in hex, one per line > +# 0x8f10002322134567 > -- > 1.5.3.8 > > From sashak at voltaire.com Sun Nov 9 10:30:35 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 9 Nov 2008 20:30:35 +0200 Subject: [ofa-general] Re: [PATCH 2/2] install QoS_management_in_OpenSM.txt In-Reply-To: <4912BDAB.5040704@Voltaire.COM> References: <4912BCFC.8030407@Voltaire.COM> <4912BDAB.5040704@Voltaire.COM> Message-ID: <20081109183035.GB30682@sashak.voltaire.com> On 11:49 Thu 06 Nov , Doron Shoham wrote: > install QoS_management_in_OpenSM.txt via the rpm > > 
Signed-off-by: Doron Shoham Applied. Thanks. Sasha From vlad at lists.openfabrics.org Mon Nov 10 03:16:57 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 10 Nov 2008 03:16:57 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081110-0200 daily build status Message-ID: <20081110111657.83D12E60C87@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on 
ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Mon Nov 10 06:19:17 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 10 Nov 2008 16:19:17 +0200 Subject: [ofa-general] [PATCH] opensm/osm_pkey.c: cosmetics in some log message Message-ID: <491842E5.6040203@dev.mellanox.co.il> Hi Sasha, Just some cosmetics in a log message. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_pkey.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_pkey.c b/opensm/opensm/osm_pkey.c index 3adc8d7..e09faa8 100644 --- a/opensm/opensm/osm_pkey.c +++ b/opensm/opensm/osm_pkey.c @@ -475,7 +475,7 @@ osm_physp_has_pkey(IN osm_log_t * p_log, OSM_LOG_ENTER(p_log); OSM_LOG(p_log, OSM_LOG_DEBUG, - "Search for PKey: 0x%4x\n", cl_ntoh16(pkey)); + "Search for PKey: 0x%04x\n", cl_ntoh16(pkey)); /* if the pkey given is an invalid pkey - return TRUE. */ if (ib_pkey_is_invalid(pkey)) { -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Nov 10 06:25:09 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 10 Nov 2008 16:25:09 +0200 Subject: [ofa-general] [PATCH] opensm/ib_types.h: rename IB_MC_REC_STATE_SEND_ONLY_MEMBER Message-ID: <49184445.10007@dev.mellanox.co.il> Sasha, The multicast Send Only bit is defined in the spec as "SendOnlyNonMember", to denote that the port is not considered a member for purposes of group creation/deletion. Renaming IB_MC_REC_STATE_SEND_ONLY_MEMBER to IB_MC_REC_STATE_SEND_ONLY_NON_MEMBER. 
Signed-off-by: Yevgeny Kliteynik --- opensm/include/iba/ib_types.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/include/iba/ib_types.h b/opensm/include/iba/ib_types.h index 6412ea9..0f9d110 100644 --- a/opensm/include/iba/ib_types.h +++ b/opensm/include/iba/ib_types.h @@ -7085,7 +7085,7 @@ ib_member_set_join_state(IN OUT ib_member_rec_t * p_mc_rec, */ #define IB_MC_REC_STATE_FULL_MEMBER 0x01 #define IB_MC_REC_STATE_NON_MEMBER 0x02 -#define IB_MC_REC_STATE_SEND_ONLY_MEMBER 0x04 +#define IB_MC_REC_STATE_SEND_ONLY_NON_MEMBER 0x04 /* * Generic MAD notice types -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Nov 10 06:36:54 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 10 Nov 2008 16:36:54 +0200 Subject: [ofa-general] [PATCH] opensm/osm_multicast.c: bug with joining/leaving mcast group Message-ID: <49184706.9070103@dev.mellanox.co.il> Hi Sasha, I think there's a bug in the osm_mgrp_add/remove_port functions. If some mcast group member has JoinState 0x1 (full member), and a new join from the same port is then received with JoinState 0x2 (non-member), OpenSM will reduce the number of full members of this group, which might eventually cause group deletion. A similar problem (only in the logically opposite direction) happens when a port tries to partially leave an mcast group. This patch should fix it. 
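To make the counting rule described above concrete, here is a minimal sketch in C of what the fix appears to aim for. It is not the OpenSM code: struct mgrp_sketch, sketch_on_join() and sketch_on_leave() are hypothetical names made up for illustration, and the return value only stands in for the point where the real code would send notice 66 or 67. The assumption is that the full-member count should change only when a port's full-member bit actually flips in the direction of the operation (off to on for a join, on to off for a leave).

```c
#include <stdint.h>

/* JoinState bits; values match the IB_MC_REC_STATE_* defines quoted above. */
#define SKETCH_JOIN_STATE_FULL 0x01
#define SKETCH_JOIN_STATE_NON  0x02

/* Hypothetical stand-in for the per-group bookkeeping. */
struct mgrp_sketch {
	unsigned full_members;
	int to_be_deleted;
};

/*
 * Join: count a new full member only if the request asks for full
 * membership and the port was not already a full member.
 * Returns 1 when the group gains its first full member (where the
 * real code would send notice 66), 0 otherwise.
 */
static int sketch_on_join(struct mgrp_sketch *g, uint8_t prev_state,
			  uint8_t requested_state)
{
	if ((requested_state & SKETCH_JOIN_STATE_FULL) &&
	    !(prev_state & SKETCH_JOIN_STATE_FULL) &&
	    ++g->full_members == 1) {
		g->to_be_deleted = 0;
		return 1;
	}
	return 0;
}

/*
 * Leave: drop a full member only if the port was a full member before
 * and is not one afterwards.
 * Returns 1 when the group loses its last full member (where the
 * real code would send notice 67), 0 otherwise.
 */
static int sketch_on_leave(struct mgrp_sketch *g, uint8_t state_before,
			   uint8_t state_after, int well_known)
{
	if ((state_before & SKETCH_JOIN_STATE_FULL) &&
	    !(state_after & SKETCH_JOIN_STATE_FULL) &&
	    --g->full_members == 0) {
		if (!well_known)
			g->to_be_deleted = 1;
		return 1;
	}
	return 0;
}
```

With this rule, a port that is already a full member and sends a second join with JoinState 0x2 leaves the count untouched, which is the case the report above describes as broken.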
Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_multicast.c | 33 +++++++++++---------------------- 1 files changed, 11 insertions(+), 22 deletions(-) diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c index d62d585..350fd22 100644 --- a/opensm/opensm/osm_multicast.c +++ b/opensm/opensm/osm_multicast.c @@ -172,17 +172,11 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, p_mgrp->last_change_id++; } - if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { - if (join_state & IB_JOIN_STATE_FULL) { - if (++p_mgrp->full_members == 1) { - mgrp_send_notice(subn, log, p_mgrp, 66); - p_mgrp->to_be_deleted = 0; - } - } else if (--p_mgrp->full_members == 0) { - mgrp_send_notice(subn, log, p_mgrp, 67); - if (!p_mgrp->well_known) - p_mgrp->to_be_deleted = 1; - } + if ((join_state & IB_JOIN_STATE_FULL) && + !(prev_join_state & IB_JOIN_STATE_FULL) && + (++p_mgrp->full_members == 1)) { + mgrp_send_notice(subn, log, p_mgrp, 66); + p_mgrp->to_be_deleted = 0; } return (p_mcm_port); @@ -224,17 +218,12 @@ int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, /* no more full members so the group will be deleted after re-route but only if it is not a well known group */ - if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { - if (port_join_state & IB_JOIN_STATE_FULL) { - if (--mgrp->full_members == 0) { - mgrp_send_notice(subn, log, mgrp, 67); - if (!mgrp->well_known) - mgrp->to_be_deleted = 1; - } - } else if (++mgrp->full_members == 1) { - mgrp_send_notice(subn, log, mgrp, 66); - mgrp->to_be_deleted = 0; - } + if ((port_join_state & IB_JOIN_STATE_FULL) && + !(new_join_state & IB_JOIN_STATE_FULL) && + (--mgrp->full_members == 0)) { + mgrp_send_notice(subn, log, mgrp, 67); + if (!mgrp->well_known) + mgrp->to_be_deleted = 1; } return ret; -- 1.5.1.4 From tziporet at mellanox.co.il Mon Nov 10 06:57:59 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 10 Nov 2008 16:57:59 +0200 Subject: 
[ofa-general] Agenda for OFED meeting today - Nov 10 Message-ID: <5D49E7A8952DC44FB38C38FA0D758EADE93B3E@mtlexch01.mtl.com> This is the agenda for OFED meeting today on OFED release status: 1. Decide on RC4 release - I suggest to do it tomorrow 2. Decide on GA release: my suggestion - RC5 in a week (Monday 17, Nov) GA - Nov 24 (we cannot delay more in that week since it will be on Thanks Giving holiday) We can try on Friday Nov 21 3. Release notes - all owners must update the release notes 4. Bugs review: 1323 blo stefan.roscher at de.ibm.com REOP IB/ehca: possibillity of kernel panic under certain circu... 1370 blo vlad at mellanox.co.il NEW Ping over IPoIB I/F fails after ifconfig down and up 1364 cri swise at opengridcomputing.com NEW system hang on rmmod cxgb3 in rhel4.7 1365 cri swise at opengridcomputing.com NEW Panic on loading iw_cxgb3 in RHEL 4.6 1366 cri swise at opengridcomputing.com NEW Panic during boot-up after an OFED install in RHEL 4.5 1242 cri yannick.cote at qlogic.com NEW kernel panic while running mpi2007 against ofed1.4 -- ib_... 1289 maj amirv at mellanox.co.il NEW Ib and ipoib doesnt respond while running multiple tests ... 1349 maj amirv at mellanox.co.il NEW Kernel panic on sdp 1336 maj vlad at mellanox.co.il NEW Can't to unloading the mlx4_ib module on ppc64 1358 maj vlad at mellanox.co.il ASSI fmr_test causes eth0 transmit timeout - should be fixed 1359 maj vlad at mellanox.co.il NEW Kernel panic while running Ltp - ongoing Tziporet & Vlad From frederic.ciesielski at hp.com Mon Nov 10 08:27:50 2008 From: frederic.ciesielski at hp.com (Ciesielski, Frederic (EMEA HPC&OSLO CC)) Date: Mon, 10 Nov 2008 16:27:50 +0000 Subject: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? 
In-Reply-To: <49160618.3050409@nasa.gov> References: <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737@GVW1087EXB.americas.hpqcorp.net> <49160618.3050409@nasa.gov> Message-ID: <7391130E01ED404FBD7A3C86731EEB7D20ECAB457D@GVW1087EXB.americas.hpqcorp.net> That's great, thanks. I ran some tests with the 2.6.27 kernel as server and client, and basically it works fine. I could not find yet any situation where NFS-RDMA would outperform NFS/IPoIB, at least when you compare apples to apples (same clients, same server, same protocol, and not just write to/read from the caches), and it even seems to have severe performance issues for reading with files larger than the memory size of the client and the server. Hopefully this will improve when more users will be able to give valuable feedback... Fred. -----Original Message----- From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov] Sent: Saturday, 08 November, 2008 22:35 To: Ciesielski, Frederic (EMEA HPC&OSLO CC) Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: > Is there any chance that the new NFS-RDMA features coming with OFED > 1.4 work with standard and current distributions, like RHEL5, SLES10 ? Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be done for OFED 1.4.1. Thanks. -jeff > Did anybody test this, or would pretend it is supposed to work ? > > I mean without building a 2.6.27 or equivalent kernel on top of it, > keeping almost full support from the vendors. > > Enhanced kernel modules may not be sufficient to work around the > limitations of old kernels... 
> > > > ---------------------------------------------------------------------- > -- > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From tom at opengridcomputing.com Mon Nov 10 09:07:14 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 10 Nov 2008 11:07:14 -0600 Subject: [Fwd: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ?] In-Reply-To: <491867E1.4000101@nasa.gov> References: <491867E1.4000101@nasa.gov> Message-ID: <49186A42.8040303@opengridcomputing.com> Jeff: Unfortunately, the NFSRDMA transport cannot make your disks go faster. If the storage subsystem is incapable of keeping up with IPoIB, then it won't be able to keep up with NFSRDMA either. To compare NFSRDMA and IPoIB performance absent a very fast storage subsystem you'll need to keep the file sizes small enough such that they fit within the server cache. Tom Jeff Becker wrote: > Hi. Just passing this on in case you missed it. Do you have any advice > on what knobs to tweak to get better performance (than NFS/IPoIB)? Thanks. > > -jeff > > -------- Original Message -------- > Subject: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard > distributions ? > Date: Mon, 10 Nov 2008 16:27:50 +0000 > From: Ciesielski, Frederic (EMEA HPC&OSLO CC) > To: Jeff Becker > CC: general at lists.openfabrics.org > References: > <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737 at GVW1087EXB.americas.hpqcorp.net> > <49160618.3050409 at nasa.gov> > > > > That's great, thanks. > > I ran some tests with the 2.6.27 kernel as server and client, and basically it works fine. 
> > I could not find yet any situation where NFS-RDMA would outperform NFS/IPoIB, at least when you compare apples to apples (same clients, same server, same protocol, and not just write to/read from the caches), and it even seems to have severe performance issues for reading with files larger than the memory size of the client and the server. > Hopefully this will improve when more users will be able to give valuable feedback... > > Fred. > > -----Original Message----- > From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov] > Sent: Saturday, 08 November, 2008 22:35 > To: Ciesielski, Frederic (EMEA HPC&OSLO CC) > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? > > Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >> Is there any chance that the new NFS-RDMA features coming with OFED >> 1.4 work with standard and current distributions, like RHEL5, SLES10 ? > Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be done for OFED 1.4.1. Thanks. > > -jeff > >> Did anybody test this, or would pretend it is supposed to work ? >> >> I mean without building a 2.6.27 or equivalent kernel on top of it, >> keeping almost full support from the vendors. >> >> Enhanced kernel modules may not be sufficient to work around the >> limitations of old kernels... 
From swise at opengridcomputing.com Mon Nov 10 09:42:53 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 10 Nov 2008 11:42:53 -0600 Subject: [ofa-general] Re: [ewg] Agenda for OFED meeting today - Nov 10 In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EADE93B3E@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EADE93B3E@mtlexch01.mtl.com> Message-ID: <4918729D.7090906@opengridcomputing.com> > 1364 cri swise at opengridcomputing.com NEW system > hang on rmmod cxgb3 in rhel4.7 > 1365 cri swise at opengridcomputing.com NEW Panic on > loading iw_cxgb3 in RHEL 4.6 > 1366 cri swise at opengridcomputing.com NEW Panic > during boot-up after an OFED install in RHEL 4.5 > Sorry I missed the call (yet again). 1364 is under investigation, should have a fix today. 1365 closed. Didn't see the problem in latest daily build 1366 will need a fix and hopefully I'll have something today/tomorrow. This isn't related to just RH4.5, but rather to new chelsio boards that aren't supported in ofed-1.4. These can all wait for -rc5 if you don't want to hold up rc4. Thanx, Steve. 
From chu11 at llnl.gov Mon Nov 10 09:42:42 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 10 Nov 2008 09:42:42 -0800 Subject: [ofa-general] Re: [opensm patch] support dump_conf command in opensm console In-Reply-To: <20081109172518.GG30588@sashak.voltaire.com> References: <1225759191.7307.9.camel@cardanus.llnl.gov> <20081109172518.GG30588@sashak.voltaire.com>
Message-ID: <1226338962.13603.21.camel@cardanus.llnl.gov> Hey Sasha, On Sun, 2008-11-09 at 19:25 +0200, Sasha Khapyorsky wrote: > Hi Al, > > On 16:39 Mon 03 Nov , Al Chu wrote: > > Hey Sasha, > > > > When config files are rescanned and loaded, there's no way to know if > > the right configuration was actually reloaded or not. A console command > > to dump the current config is a useful way to verify the loading of new > > configs or not. > > > > This patch assumes the fixes from my "fix qos config parsing bugs" is > > accepted. > > Didn't pass over it, sorry about delay. > > > > > Al > > > > -- > > Albert Chu > > chu11 at llnl.gov > > Computer Scientist > > High Performance Systems Division > > Lawrence Livermore National Laboratory > > > From 249607e47ec7ef1b92f9578cece90460418d12b8 Mon Sep 17 00:00:00 2001 > > From: Albert Chu > > Date: Mon, 3 Nov 2008 16:22:29 -0800 > > Subject: [PATCH] support dump_conf console command > > > > > > Signed-off-by: Albert Chu > > --- > > opensm/opensm/osm_console.c | 158 +++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 158 insertions(+), 0 deletions(-) > > > > diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c > > index d9bbbc2..8422655 100644 > > --- a/opensm/opensm/osm_console.c > > +++ b/opensm/opensm/osm_console.c > > @@ -53,6 +53,10 @@ > > #include > > #include > > > > +#define NULL_STR "(null)" > > + > > +#define BOOLEAN_STR(__b) ((__b) ? 
"TRUE" : "FALSE") > > + > > struct command { > > char *name; > > void (*help_function) (FILE * out, int detail); > > @@ -189,6 +193,14 @@ static void help_lidbalance(FILE * out, int detail) > > } > > } > > > > +static void help_dump_conf(FILE *out, int detail) > > +{ > > + fprintf(out, "dump_conf\n"); > > + if (detail) { > > + fprintf(out, "dump current opensm configuration\n"); > > + } > > +} > > + > > #ifdef ENABLE_OSM_PERF_MGR > > static void help_perfmgr(FILE * out, int detail) > > { > > @@ -1136,6 +1148,151 @@ static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > } > > #endif /* ENABLE_OSM_PERF_MGR */ > > > > +static void dump_qos_options(osm_qos_options_t * opt, > > + osm_qos_options_t * dflt, > > + char *prefix, > > + FILE * out) > > +{ > > + fprintf(out, "%s_max_vls : %u\n", > > + prefix, opt->max_vls ? opt->max_vls : dflt->max_vls); > > + fprintf(out, "%s_high_limit : %u\n", > > + prefix, opt->high_limit >= 0 ? (unsigned)opt->high_limit : (unsigned)dflt->high_limit); > > + fprintf(out, "%s_vlarb_high : %s\n", > > + prefix, opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high); > > + fprintf(out, "%s_vlarb_low : %s\n", > > + prefix, opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low); > > + fprintf(out, "%s_sl2vl : %s\n", > > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > > +} > > + > > +static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > +{ > > Why to not use osm_subn_write_conf_file() function (wrapped by > dump_conf_parse())? I think we need to have config dumping code > consolidated. I had thought of that, but I didn't want all of the instructions and all the extra lines of output. But I guess it's not that big of a deal in the end. I'll send a new patch. Al > Sasha > > > + osm_subn_opt_t * opt = &p_osm->subn.opt; > > + > > + fprintf(out, "config_file : %s\n", > > + opt->config_file ? 
opt->config_file : NULL_STR); > > + fprintf(out, "guid : 0x%016" PRIx64 "\n", opt->guid); > > + fprintf(out, "m_key : 0x%016" PRIx64 "\n", opt->m_key); > > + fprintf(out, "sm_key : 0x%016" PRIx64 "\n", opt->sm_key); > > + fprintf(out, "sa_key : 0x%016" PRIx64 "\n", opt->sa_key); > > + fprintf(out, "subnet_prefix : 0x%016" PRIx64 "\n", opt->subnet_prefix); > > + fprintf(out, "m_key_lease_period : %u\n", opt->m_key_lease_period); > > + fprintf(out, "sweep_interval : %u\n", opt->sweep_interval); > > + fprintf(out, "max_wire_smps : %u\n", opt->max_wire_smps); > > + fprintf(out, "transaction_timeout : %u\n", opt->transaction_timeout); > > + fprintf(out, "sm_priority : %u\n", opt->sm_priority); > > + fprintf(out, "lmc : %u\n", opt->lmc); > > + fprintf(out, "lmc_esp0 : %s\n", > > + BOOLEAN_STR(opt->lmc_esp0)); > > + fprintf(out, "max_op_vls : %u\n", opt->max_op_vls); > > + fprintf(out, "force_link_speed : %u\n", opt->force_link_speed); > > + fprintf(out, "reassign_lids : %s\n", > > + BOOLEAN_STR(opt->reassign_lids)); > > + fprintf(out, "ignore_other_sm : %s\n", > > + BOOLEAN_STR(opt->ignore_other_sm)); > > + fprintf(out, "single_thread : %s\n", > > + BOOLEAN_STR(opt->single_thread)); > > + fprintf(out, "disable_multicast : %s\n", > > + BOOLEAN_STR(opt->disable_multicast)); > > + fprintf(out, "force_log_flush : %s\n", > > + BOOLEAN_STR(opt->force_log_flush)); > > + fprintf(out, "subnet_timeout : %u\n", opt->subnet_timeout); > > + fprintf(out, "packet_life_time : %u\n", opt->packet_life_time); > > + fprintf(out, "vl_stall_count : %u\n", opt->vl_stall_count); > > + fprintf(out, "leaf_vl_stall_count : %u\n", opt->leaf_vl_stall_count); > > + fprintf(out, "head_of_queue_lifetime : %u\n", opt->head_of_queue_lifetime); > > + fprintf(out, "leaf_head_of_queue_lifetime : %u\n", opt->leaf_head_of_queue_lifetime); > > + fprintf(out, "local_phy_errors_threshold : %u\n", opt->local_phy_errors_threshold); > > + fprintf(out, "overrun_errors_threshold : %u\n", 
opt->overrun_errors_threshold); > > + fprintf(out, "sminfo_polling_timeout : %u\n", opt->sminfo_polling_timeout); > > + fprintf(out, "polling_retry_number : %u\n", opt->polling_retry_number); > > + fprintf(out, "max_msg_fifo_timeout : %u\n", opt->max_msg_fifo_timeout); > > + fprintf(out, "force_heavy_sweep : %s\n", > > + BOOLEAN_STR(opt->force_heavy_sweep)); > > + fprintf(out, "log_flags : 0x%02x\n", opt->log_flags); > > + fprintf(out, "dump_files_dir : %s\n", > > + opt->dump_files_dir ? opt->dump_files_dir : NULL_STR); > > + fprintf(out, "log_file : %s\n", > > + opt->log_file ? opt->log_file : NULL_STR); > > + fprintf(out, "log_max_size : %lu\n", opt->log_max_size); > > + fprintf(out, "partition_config_file : %s\n", > > + opt->partition_config_file ? opt->partition_config_file : NULL_STR); > > + fprintf(out, "no_partition_enforcement : %s\n", > > + BOOLEAN_STR(opt->no_partition_enforcement)); > > + fprintf(out, "qos : %s\n", > > + BOOLEAN_STR(opt->qos)); > > + fprintf(out, "qos_policy_file : %s\n", > > + opt->qos_policy_file ? opt->qos_policy_file : NULL_STR); > > + fprintf(out, "accum_log_file: %s\n", > > + BOOLEAN_STR(opt->accum_log_file)); > > + fprintf(out, "console : %s\n", > > + opt->console ? opt->console : NULL_STR); > > + fprintf(out, "console_port : %u\n", opt->console_port); > > + fprintf(out, "port_prof_ignore_file : %s\n", > > + opt->port_prof_ignore_file ? opt->port_prof_ignore_file : NULL_STR); > > + fprintf(out, "port_profile_switch_nodes : %s\n", > > + BOOLEAN_STR(opt->port_profile_switch_nodes)); > > + fprintf(out, "sweep_on_trap : %s\n", > > + BOOLEAN_STR(opt->sweep_on_trap)); > > + fprintf(out, "routing_engine_names : %s\n", > > + opt->routing_engine_names ? 
opt->routing_engine_names : NULL_STR); > > + fprintf(out, "use_ucast_cache : %s\n", > > + BOOLEAN_STR(opt->use_ucast_cache)); > > + fprintf(out, "connect_roots : %s\n", > > + BOOLEAN_STR(opt->connect_roots)); > > + fprintf(out, "lid_matrix_dump_file : %s\n", > > + opt->lid_matrix_dump_file ? opt->lid_matrix_dump_file : NULL_STR); > > + fprintf(out, "lfts_file : %s\n", > > + opt->lfts_file ? opt->lfts_file : NULL_STR); > > + fprintf(out, "root_guid_file : %s\n", > > + opt->root_guid_file ? opt->root_guid_file : NULL_STR); > > + fprintf(out, "cn_guid_file : %s\n", > > + opt->cn_guid_file ? opt->cn_guid_file : NULL_STR); > > + fprintf(out, "ids_guid_file : %s\n", > > + opt->ids_guid_file ? opt->ids_guid_file : NULL_STR); > > + fprintf(out, "guid_routing_order_file : %s\n", > > + opt->guid_routing_order_file ? opt->guid_routing_order_file : NULL_STR); > > + fprintf(out, "sa_db_file : %s\n", > > + opt->sa_db_file ? opt->sa_db_file : NULL_STR); > > + fprintf(out, "exit_on_fatal : %s\n", > > + BOOLEAN_STR(opt->exit_on_fatal)); > > + fprintf(out, "honor_guid2lid_file : %s\n", > > + BOOLEAN_STR(opt->honor_guid2lid_file)); > > + fprintf(out, "daemon : %s\n", > > + BOOLEAN_STR(opt->daemon)); > > + fprintf(out, "sm_inactive : %s\n", > > + BOOLEAN_STR(opt->sm_inactive)); > > + fprintf(out, "babbling_port_policy : %s\n", > > + BOOLEAN_STR(opt->babbling_port_policy)); > > + dump_qos_options(&opt->qos_options, &opt->qos_options, "qos", out); > > + dump_qos_options(&opt->qos_ca_options, &opt->qos_options, "qos_ca", out); > > + dump_qos_options(&opt->qos_sw0_options, &opt->qos_options, "qos_sw0", out); > > + dump_qos_options(&opt->qos_swe_options, &opt->qos_options, "qos_swe", out); > > + dump_qos_options(&opt->qos_rtr_options, &opt->qos_options, "qos_rtr", out); > > + fprintf(out, "enable_quirks : %s\n", > > + BOOLEAN_STR(opt->enable_quirks)); > > + fprintf(out, "no_clients_rereg : %s\n", > > + BOOLEAN_STR(opt->no_clients_rereg)); > > +#ifdef ENABLE_OSM_PERF_MGR > > + fprintf(out, 
"perfmgr : %s\n", > > + BOOLEAN_STR(opt->perfmgr)); > > + fprintf(out, "perfmgr_redir : %s\n", > > + BOOLEAN_STR(opt->perfmgr_redir)); > > + fprintf(out, "perfmgr_sweep_time_s : %u\n", opt->perfmgr_sweep_time_s); > > + fprintf(out, "perfmgr_max_outstanding_queries : %u\n", opt->perfmgr_max_outstanding_queries); > > + fprintf(out, "event_db_dump_file : %s\n", > > + opt->event_db_dump_file ? opt->event_db_dump_file : NULL_STR); > > +#endif > > + fprintf(out, "event_plugin_name : %s\n", > > + opt->event_plugin_name ? opt->event_plugin_name : NULL_STR); > > + fprintf(out, "node_name_map_name : %s\n", > > + opt->node_name_map_name ? opt->node_name_map_name : NULL_STR); > > + fprintf(out, "prefix_routes_file : %s\n", > > + opt->prefix_routes_file ? opt->prefix_routes_file : NULL_STR); > > + fprintf(out, "consolidate_ipv6_snm_req : %s\n", > > + BOOLEAN_STR(opt->consolidate_ipv6_snm_req)); > > +} > > + > > static void quit_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > { > > osm_console_exit(&p_osm->console, &p_osm->log); > > @@ -1166,6 +1323,7 @@ static const struct command console_cmds[] = { > > {"portstatus", &help_portstatus, &portstatus_parse}, > > {"switchbalance", &help_switchbalance, &switchbalance_parse}, > > {"lidbalance", &help_lidbalance, &lidbalance_parse}, > > + {"dump_conf", &help_dump_conf, &dump_conf_parse}, > > {"version", &help_version, &version_parse}, > > #ifdef ENABLE_OSM_PERF_MGR > > {"perfmgr", &help_perfmgr, &perfmgr_parse}, > > -- > > 1.5.4.5 > > > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From meier3 at llnl.gov Mon Nov 10 10:26:17 2008 From: meier3 at llnl.gov (Timothy A. 
Meier) Date: Mon, 10 Nov 2008 10:26:17 -0800 Subject: [ofa-general] [PATCH] opensm: osm_opensm.c added a method to remove plugins Message-ID: <49187CC9.6010600@llnl.gov> Sasha, During development, I am constantly bringing the SM up and down, so this helps make sure things shut down gracefully. Should have no impact, if people are not using plugins... yet. From e0434e676d0b3dd63a323218d207f029da9e27a4 Mon Sep 17 00:00:00 2001 From: Tim Meier Date: Mon, 10 Nov 2008 09:48:55 -0800 Subject: [PATCH] opensm: osm_opensm.c added a method to remove plugins Upon shutdown, iterates through the plugins and releases resources and removes them via their destroy() method. Signed-off-by: Tim Meier --- opensm/opensm/osm_opensm.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c index 7deea6d..7286782 100644 --- a/opensm/opensm/osm_opensm.c +++ b/opensm/opensm/osm_opensm.c @@ -238,6 +238,19 @@ static void destroy_routing_engines(osm_opensm_t *osm) } } +/********************************************************************** + **********************************************************************/ +static void destroy_plugins(osm_opensm_t *osm) +{ + osm_epi_plugin_t *p; + // remove from the list, and destroy it + while (!cl_is_qlist_empty(&osm->plugin_list)){ + p = (osm_epi_plugin_t *)cl_qlist_remove_head(&osm->plugin_list); + // plugin is responsible for freeing its own resources + osm_epi_destroy(p); + } +} + void osm_opensm_destroy(IN osm_opensm_t * const p_osm) { /* in case of shutdown through exit proc - no ^C */ @@ -275,6 +288,7 @@ void osm_opensm_destroy(IN osm_opensm_t * const p_osm) osm_sa_db_file_dump(p_osm); /* do the destruction in reverse order as init */ + destroy_plugins(p_osm); destroy_routing_engines(p_osm); osm_sa_destroy(&p_osm->sa); osm_sm_destroy(&p_osm->sm); -- 1.5.4.5 -- Timothy A.
Meier Computer Scientist ICCD/High Performance Computing 925.422.3341 meier3 at llnl.gov From sashak at voltaire.com Mon Nov 10 11:11:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:11:08 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_multicast.c: bug with joining/leaving mcast group In-Reply-To: <49184706.9070103@dev.mellanox.co.il> References: <49184706.9070103@dev.mellanox.co.il> Message-ID: <20081110191108.GD313@sashak.voltaire.com> Hi Yevgeny, On 16:36 Mon 10 Nov , Yevgeny Kliteynik wrote: > > I think there's a bug in the osm_mgrp_add/remove_port functions. > If some mcast group member has JoinState 0x1 (full member), > and then new join from the same port received with JoinState > 0x2 (non member), OpenSM will reduce number of full members > of this group, which eventually might cause group deletion. Right, isn't this how things should work? When a full member updates its state to non-member, the number of full members is reduced, and when the last full member leaves, the MC group is deleted (o15-0.2-1.9). Sasha > Similar problem (only in logically opposite direction) happens > when port tries to partially leave mcast group. > > This patch should fix it.
> > Signed-off-by: Yevgeny Kliteynik > --- > opensm/opensm/osm_multicast.c | 33 +++++++++++---------------------- > 1 files changed, 11 insertions(+), 22 deletions(-) > > diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c > index d62d585..350fd22 100644 > --- a/opensm/opensm/osm_multicast.c > +++ b/opensm/opensm/osm_multicast.c > @@ -172,17 +172,11 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, > p_mgrp->last_change_id++; > } > > - if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { > - if (join_state & IB_JOIN_STATE_FULL) { > - if (++p_mgrp->full_members == 1) { > - mgrp_send_notice(subn, log, p_mgrp, 66); > - p_mgrp->to_be_deleted = 0; > - } > - } else if (--p_mgrp->full_members == 0) { > - mgrp_send_notice(subn, log, p_mgrp, 67); > - if (!p_mgrp->well_known) > - p_mgrp->to_be_deleted = 1; > - } > + if ((join_state & IB_JOIN_STATE_FULL) && > + !(prev_join_state & IB_JOIN_STATE_FULL) && > + (++p_mgrp->full_members == 1)) { > + mgrp_send_notice(subn, log, p_mgrp, 66); > + p_mgrp->to_be_deleted = 0; > } > > return (p_mcm_port); > @@ -224,17 +218,12 @@ int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, > > /* no more full members so the group will be deleted after re-route > but only if it is not a well known group */ > - if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { > - if (port_join_state & IB_JOIN_STATE_FULL) { > - if (--mgrp->full_members == 0) { > - mgrp_send_notice(subn, log, mgrp, 67); > - if (!mgrp->well_known) > - mgrp->to_be_deleted = 1; > - } > - } else if (++mgrp->full_members == 1) { > - mgrp_send_notice(subn, log, mgrp, 66); > - mgrp->to_be_deleted = 0; > - } > + if ((port_join_state & IB_JOIN_STATE_FULL) && > + !(new_join_state & IB_JOIN_STATE_FULL) && > + (--mgrp->full_members == 0)) { > + mgrp_send_notice(subn, log, mgrp, 67); > + if (!mgrp->well_known) > + mgrp->to_be_deleted = 1; > } > > return ret; > -- > 1.5.1.4 > From sashak at voltaire.com Mon 
Nov 10 11:11:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:11:40 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_pkey.c: cosmetics in some log message In-Reply-To: <491842E5.6040203@dev.mellanox.co.il> References: <491842E5.6040203@dev.mellanox.co.il> Message-ID: <20081110191140.GE313@sashak.voltaire.com> On 16:19 Mon 10 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Just some cosmetics in a log message. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 10 11:12:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:12:01 +0200 Subject: [ofa-general] Re: [PATCH] opensm/ib_types.h: rename IB_MC_REC_STATE_SEND_ONLY_MEMBER In-Reply-To: <49184445.10007@dev.mellanox.co.il> References: <49184445.10007@dev.mellanox.co.il> Message-ID: <20081110191201.GF313@sashak.voltaire.com> On 16:25 Mon 10 Nov , Yevgeny Kliteynik wrote: > Sasha, > > The multicast Send Only bit is defined in spec as "SendOnlyNonMember", > to denote that the port is not considered a member for purposes of group > creation/deletion. > > Renaming IB_MC_REC_STATE_SEND_ONLY_MEMBER to IB_MC_REC_STATE_SEND_ONLY_NON_MEMBER. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Sasha From kliteyn at dev.mellanox.co.il Mon Nov 10 11:18:19 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 10 Nov 2008 21:18:19 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_multicast.c: bug with joining/leaving mcast group In-Reply-To: <20081110191108.GD313@sashak.voltaire.com> References: <49184706.9070103@dev.mellanox.co.il> <20081110191108.GD313@sashak.voltaire.com> Message-ID: <491888FB.5020107@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 16:36 Mon 10 Nov , Yevgeny Kliteynik wrote: >> I think there's a bug in the osm_mgrp_add/remove_port functions.
>> If some mcast group member has JoinState 0x1 (full member), >> and then new join from the same port received with JoinState >> 0x2 (non member), OpenSM will reduce number of full members >> of this group, which eventually might cause group deletion. > > Right, isn't this how things should work? When full member updates it > state to non member the number of full members are reduced, and then > last full member leaves the MC group is deleted (o15-0.2-1.9). I thought so too, but turns out that it's wrong: o15-0.1.11: If SA supports UD multicast, then if an endport joins a multicast group as specified in o15-0.1.10:, SA shall replace the endport’s current MCMemberRecord:JoinState component with the logical OR of the MCMemberRecord:JoinState component with the endport’s current MCMemberRecord:JoinState component if the endport had joined this multicast group before. So the full member doesn't update its state to non-member, but rather adds additional bit to the JoinState (the non-member). -- Yevgeny > Sasha > >> Similar problem (only in logically opposite direction) happens >> when port tries to partially leave mcast group. >> >> This patch should fix it. 
>> >> Signed-off-by: Yevgeny Kliteynik >> --- >> opensm/opensm/osm_multicast.c | 33 +++++++++++---------------------- >> 1 files changed, 11 insertions(+), 22 deletions(-) >> >> diff --git a/opensm/opensm/osm_multicast.c b/opensm/opensm/osm_multicast.c >> index d62d585..350fd22 100644 >> --- a/opensm/opensm/osm_multicast.c >> +++ b/opensm/opensm/osm_multicast.c >> @@ -172,17 +172,11 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t *subn, osm_log_t *log, >> p_mgrp->last_change_id++; >> } >> >> - if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { >> - if (join_state & IB_JOIN_STATE_FULL) { >> - if (++p_mgrp->full_members == 1) { >> - mgrp_send_notice(subn, log, p_mgrp, 66); >> - p_mgrp->to_be_deleted = 0; >> - } >> - } else if (--p_mgrp->full_members == 0) { >> - mgrp_send_notice(subn, log, p_mgrp, 67); >> - if (!p_mgrp->well_known) >> - p_mgrp->to_be_deleted = 1; >> - } >> + if ((join_state & IB_JOIN_STATE_FULL) && >> + !(prev_join_state & IB_JOIN_STATE_FULL) && >> + (++p_mgrp->full_members == 1)) { >> + mgrp_send_notice(subn, log, p_mgrp, 66); >> + p_mgrp->to_be_deleted = 0; >> } >> >> return (p_mcm_port); >> @@ -224,17 +218,12 @@ int osm_mgrp_remove_port(osm_subn_t *subn, osm_log_t *log, osm_mgrp_t *mgrp, >> >> /* no more full members so the group will be deleted after re-route >> but only if it is not a well known group */ >> - if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { >> - if (port_join_state & IB_JOIN_STATE_FULL) { >> - if (--mgrp->full_members == 0) { >> - mgrp_send_notice(subn, log, mgrp, 67); >> - if (!mgrp->well_known) >> - mgrp->to_be_deleted = 1; >> - } >> - } else if (++mgrp->full_members == 1) { >> - mgrp_send_notice(subn, log, mgrp, 66); >> - mgrp->to_be_deleted = 0; >> - } >> + if ((port_join_state & IB_JOIN_STATE_FULL) && >> + !(new_join_state & IB_JOIN_STATE_FULL) && >> + (--mgrp->full_members == 0)) { >> + mgrp_send_notice(subn, log, mgrp, 67); >> + if (!mgrp->well_known) >> + mgrp->to_be_deleted = 1; >> } >> >> 
return ret; >> -- >> 1.5.1.4 >> > From sashak at voltaire.com Mon Nov 10 11:20:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:20:26 +0200 Subject: [ofa-general] Re: [PATCH] opensm: osm_opensm.c added a method to remove plugins In-Reply-To: <49187CC9.6010600@llnl.gov> References: <49187CC9.6010600@llnl.gov> Message-ID: <20081110192026.GH313@sashak.voltaire.com> On 10:26 Mon 10 Nov , Timothy A. Meier wrote: > Sasha, > > During development, I am constantly bringing the SM up and down, so this helps make sure things > shut down gracefully. > > Should have no impact, if people are not using plugins... yet. > > From e0434e676d0b3dd63a323218d207f029da9e27a4 Mon Sep 17 00:00:00 2001 > From: Tim Meier > Date: Mon, 10 Nov 2008 09:48:55 -0800 > Subject: [PATCH] opensm: osm_opensm.c added a method to remove plugins > > Upon shutdown, iterates through the plugins and releases > resources and removes them via their destroy() method. > > Signed-off-by: Tim Meier Applied. Thanks. Sasha From boris at mellanox.com Mon Nov 10 11:24:36 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Mon, 10 Nov 2008 11:24:36 -0800 Subject: [ofa-general] ib_mthca catastrophic error detected References: <4906645D.6010101@ucla.edu> <4907054E.9080205@mellanox.co.il><490763D0.5020002@ucla.edu><200811061154.02260.jackm@dev.mellanox.co.il> <491338D1.8050205@ucla.edu> Message-ID: <1E3DCD1C63492545881FACB6063A57C1030CF354@mtiexch01.mti.com> Scott, Do you use any form of Boot-over-IB in this cluster? If so - what version/flavor of it? Thanks, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Scott A. 
Friedman Sent: Thursday, November 06, 2008 10:35 AM To: Jack Morgenstein Cc: Matthew Finlay; general at lists.openfabrics.org Subject: Re: [ofa-general] ib_mthca catastrophic error detected Hi We have been working with Matthew Finlay on this recently - you/we might pull all of this together. We are able to make any of our sdr cards have a catastrophic error - and are unable to do the same with our ddr cards. Matt has suggested that there is a firmware fix possibly? Anyway, to answer your questions: The hosts are Sun X2200M, but we have swapped a few around with some hosts we have from Aspen systems and the problem remains. I suppose the similarity is that they are all nForce based. The MPI used was the latest OpenMPI - I will find the version, but I do not think it matters whether we are using OpenMPI or MVAPICH. The job itself does not seem to matter either. The situation is after a node comes up it takes a very long time for the card to become ACTIVE. It seems to oscillate between ACTIVE and INIT. We have waited several minutes sometimes but can never be sure of when it will settle down. The queue certainly doesn't know and a job submitted to such a node will die as the cards will have a catastrophic error. Scott > Console output from the following linux commands: > cat /etc/*rel* Not a good idea...maybe this #cat /etc/redhat-release CentOS release 5 (Final) > cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using grub) # grub.conf generated by anaconda # # Note that you do not have to rerun grub after making changes to this file # NOTICE: You have a /boot partition. This means that # all kernel and initrd paths are relative to /boot/, eg. 
# root (hd0,0) # kernel /vmlinuz-version ro root=/dev/hda3 # initrd /initrd-version.img #boot=/dev/hda default=0 timeout=5 splashimage=(hd0,0)/grub/splash.xpm.gz hiddenmenu title CentOS (2.6.18-92.1.6.el5) root (hd0,0) kernel /vmlinuz-2.6.18-92.1.6.el5 ro root=LABEL=/ rhgb quiet initrd /initrd-2.6.18-92.1.6.el5.img > uname -a Linux n141 2.6.18-92.1.6.el5 #1 SMP Wed Jun 25 13:45:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > cat /proc/cpuinfo > cat /proc/meminfo processor : 0 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4424.75 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 1 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4426.22 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 
2 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4421.37 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 3 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4421.65 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 4 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm 
cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.36 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 5 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 1 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.71 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 6 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 core id : 2 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.17 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] processor : 7 vendor_id : AuthenticAMD cpu family : 16 model : 2 model name : Quad-Core AMD Opteron(tm) Processor 2354 stepping : 3 cpu MHz : 2200.000 cache size : 512 KB physical id : 1 siblings : 4 
core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw bogomips : 4422.17 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate [8] MemTotal: 8182568 kB MemFree: 4535892 kB Buffers: 318232 kB Cached: 1583772 kB SwapCached: 0 kB Active: 2714400 kB Inactive: 730260 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 8182568 kB LowFree: 4535892 kB SwapTotal: 8289532 kB SwapFree: 8289380 kB Dirty: 340 kB Writeback: 0 kB AnonPages: 1542636 kB Mapped: 14588 kB Slab: 139788 kB PageTables: 7208 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 12380816 kB Committed_AS: 1679420 kB VmallocTotal: 34359738367 kB VmallocUsed: 4600 kB VmallocChunk: 34359733707 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB Jack Morgenstein wrote: > On Tuesday 28 October 2008 21:11, Scott A. Friedman wrote: >> Hi >> >> This cluster has OFED 1.2.5.4 running on it. The ib_mthca kernel module >> reports the following on startup: >> >> ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008) >> >> The cards in all (22) of the nodes we have seen this error on are as >> follows: >> >> hca_id: mthca0 >> fw_ver: 1.2.0 >> vendor_id: 0x02c9 >> vendor_part_id: 25204 >> hw_ver: 0xA0 >> board_id: MT_03B0140001 >> phys_port_cnt: 1 >> >> It appears that when this happens the driver restarts (loads?) itself >> however the job running at the time of the error is, of course, killed. >> >> Scott > > Scott, > We are trying to reproduce this here. 
It would help if you could supply > the following info: > > Host model for hosts which are experiencing the failure: > > Console output from the following linux commands: > cat /etc/*rel* > cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using grub) > uname -a > cat /proc/cpuinfo > cat /proc/meminfo > > Also, what sort of job was running when the failure occurred: > -- which MPI are you using? > -- do you have a test example which we can run here to reproduce the problem? > > Thanks in advance for your help! > > Jack Morgenstein > Senior Software Development Engineer > Mellanox _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hal.rosenstock at gmail.com Mon Nov 10 11:42:12 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 10 Nov 2008 14:42:12 -0500 Subject: Re: [ofa-general] Re: [PATCH] opensm/osm_multicast.c: bug with joining/leaving mcast group In-Reply-To: <491888FB.5020107@dev.mellanox.co.il> References: <49184706.9070103@dev.mellanox.co.il> <20081110191108.GD313@sashak.voltaire.com> <491888FB.5020107@dev.mellanox.co.il> Message-ID: On Mon, Nov 10, 2008 at 2:18 PM, Yevgeny Kliteynik wrote: > Hi Sasha, > > Sasha Khapyorsky wrote: >> >> Hi Yevgeny, >> >> On 16:36 Mon 10 Nov , Yevgeny Kliteynik wrote: >>> >>> I think there's a bug in the osm_mgrp_add/remove_port functions. >>> If some mcast group member has JoinState 0x1 (full member), >>> and then new join from the same port received with JoinState >>> 0x2 (non member), OpenSM will reduce number of full members >>> of this group, which eventually might cause group deletion. >> >> Right, isn't this how things should work?
When full member updates it
>> state to non member the number of full members are reduced, and then
>> last full member leaves the MC group is deleted (o15-0.2-1.9).
>
> I thought so too,

It's true; what you are seeing is the addition of send only non member
(to full member) and not eliminating full member.

> but turns out that it's wrong:
>
> o15-0.1.11: If SA supports UD multicast, then if an endport joins a
> multicast group as specified in o15-0.1.10:, SA shall replace the
> endport's current MCMemberRecord:JoinState component with the logical
> OR of the MCMemberRecord:JoinState component with the endport's current
> MCMemberRecord:JoinState component if the endport had joined this
> multicast group before.
>
> So the full member doesn't update its state to non-member, but rather
> adds additional bit to the JoinState (the non-member).

Right, a port can simultaneously be full member, non member, and send
only non member.

-- Hal

>
> -- Yevgeny
>
>> Sasha
>>
>>> Similar problem (only in logically opposite direction) happens
>>> when port tries to partially leave mcast group.
>>>
>>> This patch should fix it.
>>> >>> Signed-off-by: Yevgeny Kliteynik >>> --- >>> opensm/opensm/osm_multicast.c | 33 +++++++++++---------------------- >>> 1 files changed, 11 insertions(+), 22 deletions(-) >>> >>> diff --git a/opensm/opensm/osm_multicast.c >>> b/opensm/opensm/osm_multicast.c >>> index d62d585..350fd22 100644 >>> --- a/opensm/opensm/osm_multicast.c >>> +++ b/opensm/opensm/osm_multicast.c >>> @@ -172,17 +172,11 @@ osm_mcm_port_t *osm_mgrp_add_port(IN osm_subn_t >>> *subn, osm_log_t *log, >>> p_mgrp->last_change_id++; >>> } >>> >>> - if ((join_state ^ prev_join_state) & IB_JOIN_STATE_FULL) { >>> - if (join_state & IB_JOIN_STATE_FULL) { >>> - if (++p_mgrp->full_members == 1) { >>> - mgrp_send_notice(subn, log, p_mgrp, 66); >>> - p_mgrp->to_be_deleted = 0; >>> - } >>> - } else if (--p_mgrp->full_members == 0) { >>> - mgrp_send_notice(subn, log, p_mgrp, 67); >>> - if (!p_mgrp->well_known) >>> - p_mgrp->to_be_deleted = 1; >>> - } >>> + if ((join_state & IB_JOIN_STATE_FULL) && >>> + !(prev_join_state & IB_JOIN_STATE_FULL) && >>> + (++p_mgrp->full_members == 1)) { >>> + mgrp_send_notice(subn, log, p_mgrp, 66); >>> + p_mgrp->to_be_deleted = 0; >>> } >>> >>> return (p_mcm_port); >>> @@ -224,17 +218,12 @@ int osm_mgrp_remove_port(osm_subn_t *subn, >>> osm_log_t *log, osm_mgrp_t *mgrp, >>> >>> /* no more full members so the group will be deleted after >>> re-route >>> but only if it is not a well known group */ >>> - if ((port_join_state ^ new_join_state) & IB_JOIN_STATE_FULL) { >>> - if (port_join_state & IB_JOIN_STATE_FULL) { >>> - if (--mgrp->full_members == 0) { >>> - mgrp_send_notice(subn, log, mgrp, 67); >>> - if (!mgrp->well_known) >>> - mgrp->to_be_deleted = 1; >>> - } >>> - } else if (++mgrp->full_members == 1) { >>> - mgrp_send_notice(subn, log, mgrp, 66); >>> - mgrp->to_be_deleted = 0; >>> - } >>> + if ((port_join_state & IB_JOIN_STATE_FULL) && >>> + !(new_join_state & IB_JOIN_STATE_FULL) && >>> + (--mgrp->full_members == 0)) { >>> + mgrp_send_notice(subn, log, mgrp, 67); >>> + 
if (!mgrp->well_known) >>> + mgrp->to_be_deleted = 1; >>> } >>> >>> return ret; >>> -- >>> 1.5.1.4 >>> >> > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Mon Nov 10 11:43:34 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:43:34 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_multicast.c: bug with joining/leaving mcast group In-Reply-To: <491888FB.5020107@dev.mellanox.co.il> References: <49184706.9070103@dev.mellanox.co.il> <20081110191108.GD313@sashak.voltaire.com> <491888FB.5020107@dev.mellanox.co.il> Message-ID: <20081110194334.GJ313@sashak.voltaire.com> On 21:18 Mon 10 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Sasha Khapyorsky wrote: >> Hi Yevgeny, >> On 16:36 Mon 10 Nov , Yevgeny Kliteynik wrote: >>> I think there's a bug in the osm_mgrp_add/remove_port functions. >>> If some mcast group member has JoinState 0x1 (full member), >>> and then new join from the same port received with JoinState >>> 0x2 (non member), OpenSM will reduce number of full members >>> of this group, which eventually might cause group deletion. >> Right, isn't this how things should work? When full member updates it >> state to non member the number of full members are reduced, and then >> last full member leaves the MC group is deleted (o15-0.2-1.9). > > I thought so too, but turns out that it's wrong: > > o15-0.1.11: If SA supports UD multicast, then if an endport joins a > multicast group as specified in o15-0.1.10:, SA shall replace the > endport's current MCMemberRecord:JoinState component with the logical > OR of the MCMemberRecord:JoinState component with the endport's current > MCMemberRecord:JoinState component if the endport had joined this > multicast group before.
> > So the full member doesn't update its state to non-member, but rather > adds additional bit to the JoinState (the non-member). Ok. I see now. Applied. Thanks. Sasha From friedman at ucla.edu Mon Nov 10 11:45:07 2008 From: friedman at ucla.edu (Scott A. Friedman) Date: Mon, 10 Nov 2008 11:45:07 -0800 Subject: [ofa-general] ib_mthca catastrophic error detected In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1030CF354@mtiexch01.mti.com> References: <4906645D.6010101@ucla.edu> <4907054E.9080205@mellanox.co.il><490763D0.5020002@ucla.edu><200811061154.02260.jackm@dev.mellanox.co.il> <491338D1.8050205@ucla.edu> <1E3DCD1C63492545881FACB6063A57C1030CF354@mtiexch01.mti.com> Message-ID: <49188F43.3050907@ucla.edu> Hi No, no boot over IB - in fact there is no IPoIB configured on this cluster at all. The firmware Matt sent seems to have fixed the problem as we have been unable to reproduce since we flashed some test nodes. We are in the process of flashing the remaining 100 or so nodes that have SDR cards as jobs finish. Scott Boris Shpolyansky wrote: > Scott, > > Do you use any form of Boot-over-IB in this cluster? > If so - what version/flavor of it? > > Thanks, > Boris Shpolyansky > Sr. Member of Technical Staff > Applications > Mellanox Technologies Inc. > 2900 Stender Way > Santa Clara, CA 95054 > Tel.: (408) 916 0014 > Fax: (408) 970 3403 > Cell: (408) 834 9365 > www.mellanox.com
From sashak at voltaire.com Mon Nov 10 11:51:29 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:51:29 +0200 Subject: [ofa-general] Re: [PATCH 1/2] fix default configuration files path In-Reply-To: <4912BD7C.1030603@Voltaire.COM> References: <4912BCFC.8030407@Voltaire.COM> <4912BD7C.1030603@Voltaire.COM> Message-ID: <20081110195129.GK313@sashak.voltaire.com> On 11:48 Thu 06 Nov , Doron Shoham wrote: > fix default configuration files path in QoS_management_in_OpenSM.txt file > from /usr/local/etc/opensm/ to /etc/opensm/ > > Signed-off-by: Doron Shoham > --- > opensm/doc/QoS_management_in_OpenSM.txt | 6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/opensm/doc/QoS_management_in_OpenSM.txt b/opensm/doc/QoS_management_in_OpenSM.txt > index ba1b4b1..1a48b1a 100644 > --- a/opensm/doc/QoS_management_in_OpenSM.txt > +++ b/opensm/doc/QoS_management_in_OpenSM.txt > @@ -20,7 +20,7 @@ > > When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file. > The default name of OpenSM QoS policy file is > -/usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y > +/etc/opensm/qos-policy.conf. The default may be changed by using -Y > or --qos_policy_file option with OpenSM. The OpenSM config dir is configured value so it could be /usr/local/etc/opensm or /etc/opensm or something else. Basically I'm fine with using '/etc/opensm', but then it should be updated to other docs too (specifically in doc/performance-manager-HOWTO.txt).
Other way to handle this is to make *.in templates for those docs where config path is used and generate the file in ./configure time (similar to how it is done with OpenSM man page). Probably it is overkill for docs... Thoughts? Sasha From sashak at voltaire.com Mon Nov 10 11:58:17 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 21:58:17 +0200 Subject: [ofa-general] Re: [PATCH] export osm_log_max in MB In-Reply-To: <4912DC30.40309@Voltaire.COM> References: <49101D1F.4040605@Voltaire.COM> <4912DC30.40309@Voltaire.COM> Message-ID: <20081110195817.GL313@sashak.voltaire.com> On 13:59 Thu 06 Nov , Doron Shoham wrote: > export the osm_log_max in MB when using 'opensm -c > > Signed-off-by: Doron Shoham Both applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 10 12:13:33 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 22:13:33 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. In-Reply-To: <20081104095744.35893d4a.weiny2@llnl.gov> References: <20081104095744.35893d4a.weiny2@llnl.gov> Message-ID: <20081110201333.GM313@sashak.voltaire.com> On 09:57 Tue 04 Nov , Ira Weiny wrote: > From 567c3893f24f4dc25ef5f4e74ef9deeb8ae541ad Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Mon, 3 Nov 2008 14:47:50 -0800 > Subject: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using > pointer. > > There are times when PortInfo fails which leaves osm_node_t with invalid > osm_physp_t pointers. In this case do not use an invalid pointer. > > Signed-off-by: Ira Weiny Applied. Thanks. However some note is below. 
> --- > opensm/opensm/osm_state_mgr.c | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c > index ba3b6bf..841438c 100644 > --- a/opensm/opensm/osm_state_mgr.c > +++ b/opensm/opensm/osm_state_mgr.c > @@ -542,6 +542,12 @@ static void __osm_state_mgr_get_node_desc(IN cl_map_item_t * const p_object, > > /* get a physp to request from. */ > p_physp = osm_node_get_any_physp_ptr(p_node); > + if (!osm_physp_is_valid(p_physp)) { > + OSM_LOG(sm->p_log, OSM_LOG_ERROR, > + "__osm_state_mgr_get_node_desc: ERR 331C: " > + "Failed to get valid physical port object\n"); > + goto exit; > + } Actually it can be a valid case. For example when node was first time discovered via port A, when this port was disconnected and the same node was discovered via port B - it is not a new node and node_info (where port number for osm_node_get_any_physp_ptr() is stored) will not be updated. Obviously the patch is fine. But probably we need more general fix, for example to redo osm_node_get_any_physp_ptr() so that it will not return invalid ports. Need to review other osm_node_get_any_physp_ptr() usages. Sasha From rdreier at cisco.com Mon Nov 10 12:36:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 10 Nov 2008 12:36:23 -0800 Subject: [ofa-general] Re: [PATCH] IB/ehca: Fix suppression of port activation events In-Reply-To: <200811071742.51867.fenkes@de.ibm.com> (Joachim Fenkes's message of "Fri, 7 Nov 2008 17:42:51 +0100") References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811071742.51867.fenkes@de.ibm.com> Message-ID: > A previous fix introduced a regression where port activation events were > dropped unconditionally if port autodetection was not enabled. Fixed. Is this a fix to "IB/ehca: Remove reference to special QP in case of port activation failure"? Because if so I can roll it into that patch, since Linus hasn't pulled it yet. - R. 
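Editorial sketch for the osm_state_mgr.c thread above: Sasha suggests reworking osm_node_get_any_physp_ptr() so that it never returns an invalid port. The following is only a rough illustration of that idea, not OpenSM code. The types toy_physp and toy_node, the field names, and the function toy_node_get_any_valid_physp() are all hypothetical stand-ins; the real osm_node_t and osm_physp_t are laid out differently. It assumes the node remembers the port it was first discovered through (as Sasha describes for node_info) and falls back to any other valid port when that one has gone away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for osm_physp_t and osm_node_t. */
struct toy_physp {
	int valid;        /* assumption: nonzero once the port object is initialized */
	uint8_t port_num;
};

struct toy_node {
	struct toy_physp *physp_table; /* indexed by port number */
	uint8_t physp_count;
	uint8_t discovery_port;        /* port through which the node was first seen */
};

static int toy_physp_is_valid(const struct toy_physp *p)
{
	return p != NULL && p->valid;
}

/*
 * Prefer the port the node was discovered through. If that port is not
 * valid (for example, it was disconnected and the node was rediscovered
 * through another port), fall back to the first valid port.
 * Returns NULL when the node has no valid port at all.
 */
static struct toy_physp *toy_node_get_any_valid_physp(struct toy_node *node)
{
	uint8_t i;

	assert(node != NULL);

	if (node->discovery_port < node->physp_count &&
	    toy_physp_is_valid(&node->physp_table[node->discovery_port]))
		return &node->physp_table[node->discovery_port];

	for (i = 0; i < node->physp_count; i++)
		if (toy_physp_is_valid(&node->physp_table[i]))
			return &node->physp_table[i];

	return NULL;
}
```

With a helper shaped like this, a caller such as the node description request would only need a NULL check, and the "discovered via port A, later reachable only via port B" case Sasha mentions would still yield a usable port. Whether the real function can be changed this way depends on its other callers, which Sasha notes still need review.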
From boris at mellanox.com Mon Nov 10 12:50:13 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Mon, 10 Nov 2008 12:50:13 -0800 Subject: [ofa-general] ib_mthca catastrophic error detected References: <4906645D.6010101@ucla.edu> <4907054E.9080205@mellanox.co.il><490763D0.5020002@ucla.edu><200811061154.02260.jackm@dev.mellanox.co.il> <491338D1.8050205@ucla.edu> <1E3DCD1C63492545881FACB6063A57C1030CF354@mtiexch01.mti.com> <49188F43.3050907@ucla.edu> Message-ID: <1E3DCD1C63492545881FACB6063A57C1031A10D9@mtiexch01.mti.com> OK, great! Please, update us as soon as you have the entire cluster upgraded to the new FW and have run more tests on it. Thanks, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com -----Original Message----- From: Scott A. Friedman [mailto:friedman at ucla.edu] Sent: Monday, November 10, 2008 11:45 AM To: Boris Shpolyansky Cc: Jack Morgenstein; Matthew Finlay; general at lists.openfabrics.org Subject: Re: [ofa-general] ib_mthca catastrophic error detected Hi No, no boot over IB - in fact there is no IPoIB configured on this cluster at all. The firmware Matt sent seems to have fixed the problem as we have been unable to reproduce since we flashed some test nodes. We are in the process of flashing the remaining 100 or so nodes that have SDR cards as jobs finish. Scott Boris Shpolyansky wrote: > Scott, > > Do you use any form of Boot-over-IB in this cluster? > If so - what version/flavor of it? > > Thanks, > Boris Shpolyansky > Sr. Member of Technical Staff > Applications > Mellanox Technologies Inc. > 2900 Stender Way > Santa Clara, CA 95054 > Tel.: (408) 916 0014 > Fax: (408) 970 3403 > Cell: (408) 834 9365 > www.mellanox.com > > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Scott A. 
> Friedman > Sent: Thursday, November 06, 2008 10:35 AM > To: Jack Morgenstein > Cc: Matthew Finlay; general at lists.openfabrics.org > Subject: Re: [ofa-general] ib_mthca catastrophic error detected > > Hi > > We have been working with Matthew Finlay on this > recently - you/we might pull all of this together. We are able to make > any of our sdr cards have a catastrophic error - and are unable to do > the same with our ddr cards. Matt has suggested that there is a firmware > > fix possibly? > > Anyway, to answer your questions: > > The hosts are Sun X2200M, but we have swapped a few around with some > hosts we have from Aspen systems and the problem remains. I suppose the > similarity is that they are all nForce based. > > The MPI used was the latest OpenMPI - I will find the version, but I do > not think it matters whether we are using OpenMPI or MVAPICH. > > The job itself does not seem to matter either. The situation is after a > node comes up it takes a very long time for the card to become ACTIVE. > It seems to oscillate between ACTIVE and INIT. We have waited several > minutes sometimes but can never be sure of when it will settle down. The > > queue certainly doesn't know and a job submitted to such a node will die > > as the cards will have a catastrophic error. > > Scott > > > > Console output from the following linux commands: > > cat /etc/*rel* > > > Not a good idea...maybe this > > #cat /etc/redhat-release > CentOS release 5 (Final) > > > cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using > > grub) > > # grub.conf generated by anaconda > # > # Note that you do not have to rerun grub after making changes to this > file > # NOTICE: You have a /boot partition. This means that > # all kernel and initrd paths are relative to /boot/, eg. 
> # root (hd0,0) > # kernel /vmlinuz-version ro root=/dev/hda3 > # initrd /initrd-version.img > #boot=/dev/hda > default=0 > timeout=5 > splashimage=(hd0,0)/grub/splash.xpm.gz > hiddenmenu > title CentOS (2.6.18-92.1.6.el5) > root (hd0,0) > kernel /vmlinuz-2.6.18-92.1.6.el5 ro root=LABEL=/ rhgb quiet > initrd /initrd-2.6.18-92.1.6.el5.img > > > > uname -a > > Linux n141 2.6.18-92.1.6.el5 #1 SMP Wed Jun 25 13:45:47 EDT 2008 x86_64 > x86_64 x86_64 GNU/Linux > > > > cat /proc/cpuinfo > > cat /proc/meminfo > > processor : 0 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 0 > siblings : 4 > core id : 0 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4424.75 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 1 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 0 > siblings : 4 > core id : 1 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4426.22 > TLB size : 1024 4K pages > clflush 
size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 2 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 0 > siblings : 4 > core id : 2 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4421.37 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 3 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 0 > siblings : 4 > core id : 3 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4421.65 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 4 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 1 > siblings : 
4 > core id : 0 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4422.36 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 5 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 1 > siblings : 4 > core id : 1 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4422.71 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 6 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 1 > siblings : 4 > core id : 2 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a 
misalignsse > 3dnowprefetch osvw > bogomips : 4422.17 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > processor : 7 > vendor_id : AuthenticAMD > cpu family : 16 > model : 2 > model name : Quad-Core AMD Opteron(tm) Processor 2354 > stepping : 3 > cpu MHz : 2200.000 > cache size : 512 KB > physical id : 1 > siblings : 4 > core id : 3 > cpu cores : 4 > fpu : yes > fpu_exception : yes > cpuid level : 5 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm > cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse > 3dnowprefetch osvw > bogomips : 4422.17 > TLB size : 1024 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: ts ttp tm stc 100mhzsteps hwpstate [8] > > > > > MemTotal: 8182568 kB > MemFree: 4535892 kB > Buffers: 318232 kB > Cached: 1583772 kB > SwapCached: 0 kB > Active: 2714400 kB > Inactive: 730260 kB > HighTotal: 0 kB > HighFree: 0 kB > LowTotal: 8182568 kB > LowFree: 4535892 kB > SwapTotal: 8289532 kB > SwapFree: 8289380 kB > Dirty: 340 kB > Writeback: 0 kB > AnonPages: 1542636 kB > Mapped: 14588 kB > Slab: 139788 kB > PageTables: 7208 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > CommitLimit: 12380816 kB > Committed_AS: 1679420 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 4600 kB > VmallocChunk: 34359733707 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > Hugepagesize: 2048 kB > > > > Jack Morgenstein wrote: >> On Tuesday 28 October 2008 21:11, Scott A. Friedman wrote: >>> Hi >>> >>> This cluster has OFED 1.2.5.4 running on it. 
The ib_mthca kernel > module >>> reports the following on startup: >>> >>> ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008) >>> >>> The cards in all (22) of the nodes we have seen this error on are as >>> follows: >>> >>> hca_id: mthca0 >>> fw_ver: 1.2.0 >>> vendor_id: 0x02c9 >>> vendor_part_id: 25204 >>> hw_ver: 0xA0 >>> board_id: MT_03B0140001 >>> phys_port_cnt: 1 >>> >>> It appears that when this happens the driver restarts (loads?) itself > >>> however the job running at the time of the error is, of course, > killed. >>> Scott >> Scott, >> We are trying to reproduce this here. It would help if you could > supply >> the following info: >> >> Host model for hosts which are experiencing the failure: >> >> Console output from the following linux commands: >> cat /etc/*rel* >> cat /etc/lilo.conf , or: cat /boot/grub/menu.lst (if you are using > grub) >> uname -a >> cat /proc/cpuinfo >> cat /proc/meminfo >> >> Also, what sort of job was running when the failure occurred: >> -- which MPI are you using? >> -- do you have a test example which we can run here to reproduce the > problem? >> Thanks in advance for your help! 
>> >> Jack Morgenstein >> Senior Software Development Engineer >> Mellanox > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Mon Nov 10 13:02:33 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 10 Nov 2008 23:02:33 +0200 Subject: [ofa-general] Re: [opensm patch][2/2] verify config inputs when config file is rescanned In-Reply-To: <1225404081.1197.534.camel@cardanus.llnl.gov> References: <1225404081.1197.534.camel@cardanus.llnl.gov> Message-ID: <20081110210233.GE3467@sashak.voltaire.com> Hi Al, On 15:01 Thu 30 Oct , Al Chu wrote: > Hey Sasha, > > I noticed that after the config file is rescanned, the new potential > inputs aren't checked for validity. Patch is attached. > > Al > > -- > Albert Chu > chu11 at llnl.gov > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From edfcd2de96c3525d1609b4c0f03c17ecc0495c18 Mon Sep 17 00:00:00 2001 > From: root > Date: Thu, 30 Oct 2008 13:58:55 -0700 > Subject: [PATCH] verify rescanned config input > > > Signed-off-by: root ^^^^^^^^^^^^^^^^^^^^^^^^ I'm fine with this patch, but could you fix S-O-B line? Thanks. Sasha From chu11 at llnl.gov Mon Nov 10 13:03:53 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 10 Nov 2008 13:03:53 -0800 Subject: [ofa-general] Re: [opensm patch] support dump_conf command in opensm console In-Reply-To: <1226338962.13603.21.camel@cardanus.llnl.gov> References: <1225759191.7307.9.camel@cardanus.llnl.gov> <20081109172518.GG30588@sashak.voltaire.com> <1226338962.13603.21.camel@cardanus.llnl.gov> Message-ID: <1226351033.13603.23.camel@cardanus.llnl.gov> Hey Sasha, Attached is the re-worked patch. Assumes changes from my "fix qos config parsing bugs" patch are accepted. 
Al On Mon, 2008-11-10 at 09:42 -0800, Al Chu wrote: > Hey Sasha, > > On Sun, 2008-11-09 at 19:25 +0200, Sasha Khapyorsky wrote: > > Hi Al, > > > > On 16:39 Mon 03 Nov , Al Chu wrote: > > > Hey Sasha, > > > > > > When config files are rescanned and loaded, there's no way to know if > > > the right configuration was actually reloaded or not. A console command > > > to dump the current config is a useful way to verify the loading of new > > > configs or not. > > > > > > This patch assumes the fixes from my "fix qos config parsing bugs" is > > > accepted. > > > > Didn't pass over it, sorry about delay. > > > > > > > > Al > > > > > > -- > > > Albert Chu > > > chu11 at llnl.gov > > > Computer Scientist > > > High Performance Systems Division > > > Lawrence Livermore National Laboratory > > > > > From 249607e47ec7ef1b92f9578cece90460418d12b8 Mon Sep 17 00:00:00 2001 > > > From: Albert Chu > > > Date: Mon, 3 Nov 2008 16:22:29 -0800 > > > Subject: [PATCH] support dump_conf console command > > > > > > > > > Signed-off-by: Albert Chu > > > --- > > > opensm/opensm/osm_console.c | 158 +++++++++++++++++++++++++++++++++++++++++++ > > > 1 files changed, 158 insertions(+), 0 deletions(-) > > > > > > diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c > > > index d9bbbc2..8422655 100644 > > > --- a/opensm/opensm/osm_console.c > > > +++ b/opensm/opensm/osm_console.c > > > @@ -53,6 +53,10 @@ > > > #include > > > #include > > > > > > +#define NULL_STR "(null)" > > > + > > > +#define BOOLEAN_STR(__b) ((__b) ? 
"TRUE" : "FALSE") > > > + > > > struct command { > > > char *name; > > > void (*help_function) (FILE * out, int detail); > > > @@ -189,6 +193,14 @@ static void help_lidbalance(FILE * out, int detail) > > > } > > > } > > > > > > +static void help_dump_conf(FILE *out, int detail) > > > +{ > > > + fprintf(out, "dump_conf\n"); > > > + if (detail) { > > > + fprintf(out, "dump current opensm configuration\n"); > > > + } > > > +} > > > + > > > #ifdef ENABLE_OSM_PERF_MGR > > > static void help_perfmgr(FILE * out, int detail) > > > { > > > @@ -1136,6 +1148,151 @@ static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > } > > > #endif /* ENABLE_OSM_PERF_MGR */ > > > > > > +static void dump_qos_options(osm_qos_options_t * opt, > > > + osm_qos_options_t * dflt, > > > + char *prefix, > > > + FILE * out) > > > +{ > > > + fprintf(out, "%s_max_vls : %u\n", > > > + prefix, opt->max_vls ? opt->max_vls : dflt->max_vls); > > > + fprintf(out, "%s_high_limit : %u\n", > > > + prefix, opt->high_limit >= 0 ? (unsigned)opt->high_limit : (unsigned)dflt->high_limit); > > > + fprintf(out, "%s_vlarb_high : %s\n", > > > + prefix, opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high); > > > + fprintf(out, "%s_vlarb_low : %s\n", > > > + prefix, opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low); > > > + fprintf(out, "%s_sl2vl : %s\n", > > > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > > > +} > > > + > > > +static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > +{ > > > > Why to not use osm_subn_write_conf_file() function (wrapped by > > dump_conf_parse())? I think we need to have config dumping code > > consolidated. > > I had thought of that, but I didn't want all of the instructions and all > the extra lines of output. But I guess it's not that big of a deal in > the end. I'll send a new patch. 
> > Al > > > Sasha > > > > > + osm_subn_opt_t * opt = &p_osm->subn.opt; > > > + > > > + fprintf(out, "config_file : %s\n", > > > + opt->config_file ? opt->config_file : NULL_STR); > > > + fprintf(out, "guid : 0x%016" PRIx64 "\n", opt->guid); > > > + fprintf(out, "m_key : 0x%016" PRIx64 "\n", opt->m_key); > > > + fprintf(out, "sm_key : 0x%016" PRIx64 "\n", opt->sm_key); > > > + fprintf(out, "sa_key : 0x%016" PRIx64 "\n", opt->sa_key); > > > + fprintf(out, "subnet_prefix : 0x%016" PRIx64 "\n", opt->subnet_prefix); > > > + fprintf(out, "m_key_lease_period : %u\n", opt->m_key_lease_period); > > > + fprintf(out, "sweep_interval : %u\n", opt->sweep_interval); > > > + fprintf(out, "max_wire_smps : %u\n", opt->max_wire_smps); > > > + fprintf(out, "transaction_timeout : %u\n", opt->transaction_timeout); > > > + fprintf(out, "sm_priority : %u\n", opt->sm_priority); > > > + fprintf(out, "lmc : %u\n", opt->lmc); > > > + fprintf(out, "lmc_esp0 : %s\n", > > > + BOOLEAN_STR(opt->lmc_esp0)); > > > + fprintf(out, "max_op_vls : %u\n", opt->max_op_vls); > > > + fprintf(out, "force_link_speed : %u\n", opt->force_link_speed); > > > + fprintf(out, "reassign_lids : %s\n", > > > + BOOLEAN_STR(opt->reassign_lids)); > > > + fprintf(out, "ignore_other_sm : %s\n", > > > + BOOLEAN_STR(opt->ignore_other_sm)); > > > + fprintf(out, "single_thread : %s\n", > > > + BOOLEAN_STR(opt->single_thread)); > > > + fprintf(out, "disable_multicast : %s\n", > > > + BOOLEAN_STR(opt->disable_multicast)); > > > + fprintf(out, "force_log_flush : %s\n", > > > + BOOLEAN_STR(opt->force_log_flush)); > > > + fprintf(out, "subnet_timeout : %u\n", opt->subnet_timeout); > > > + fprintf(out, "packet_life_time : %u\n", opt->packet_life_time); > > > + fprintf(out, "vl_stall_count : %u\n", opt->vl_stall_count); > > > + fprintf(out, "leaf_vl_stall_count : %u\n", opt->leaf_vl_stall_count); > > > + fprintf(out, "head_of_queue_lifetime : %u\n", opt->head_of_queue_lifetime); > > > + fprintf(out, "leaf_head_of_queue_lifetime : 
%u\n", opt->leaf_head_of_queue_lifetime); > > > + fprintf(out, "local_phy_errors_threshold : %u\n", opt->local_phy_errors_threshold); > > > + fprintf(out, "overrun_errors_threshold : %u\n", opt->overrun_errors_threshold); > > > + fprintf(out, "sminfo_polling_timeout : %u\n", opt->sminfo_polling_timeout); > > > + fprintf(out, "polling_retry_number : %u\n", opt->polling_retry_number); > > > + fprintf(out, "max_msg_fifo_timeout : %u\n", opt->max_msg_fifo_timeout); > > > + fprintf(out, "force_heavy_sweep : %s\n", > > > + BOOLEAN_STR(opt->force_heavy_sweep)); > > > + fprintf(out, "log_flags : 0x%02x\n", opt->log_flags); > > > + fprintf(out, "dump_files_dir : %s\n", > > > + opt->dump_files_dir ? opt->dump_files_dir : NULL_STR); > > > + fprintf(out, "log_file : %s\n", > > > + opt->log_file ? opt->log_file : NULL_STR); > > > + fprintf(out, "log_max_size : %lu\n", opt->log_max_size); > > > + fprintf(out, "partition_config_file : %s\n", > > > + opt->partition_config_file ? opt->partition_config_file : NULL_STR); > > > + fprintf(out, "no_partition_enforcement : %s\n", > > > + BOOLEAN_STR(opt->no_partition_enforcement)); > > > + fprintf(out, "qos : %s\n", > > > + BOOLEAN_STR(opt->qos)); > > > + fprintf(out, "qos_policy_file : %s\n", > > > + opt->qos_policy_file ? opt->qos_policy_file : NULL_STR); > > > + fprintf(out, "accum_log_file: %s\n", > > > + BOOLEAN_STR(opt->accum_log_file)); > > > + fprintf(out, "console : %s\n", > > > + opt->console ? opt->console : NULL_STR); > > > + fprintf(out, "console_port : %u\n", opt->console_port); > > > + fprintf(out, "port_prof_ignore_file : %s\n", > > > + opt->port_prof_ignore_file ? opt->port_prof_ignore_file : NULL_STR); > > > + fprintf(out, "port_profile_switch_nodes : %s\n", > > > + BOOLEAN_STR(opt->port_profile_switch_nodes)); > > > + fprintf(out, "sweep_on_trap : %s\n", > > > + BOOLEAN_STR(opt->sweep_on_trap)); > > > + fprintf(out, "routing_engine_names : %s\n", > > > + opt->routing_engine_names ? 
opt->routing_engine_names : NULL_STR); > > > + fprintf(out, "use_ucast_cache : %s\n", > > > + BOOLEAN_STR(opt->use_ucast_cache)); > > > + fprintf(out, "connect_roots : %s\n", > > > + BOOLEAN_STR(opt->connect_roots)); > > > + fprintf(out, "lid_matrix_dump_file : %s\n", > > > + opt->lid_matrix_dump_file ? opt->lid_matrix_dump_file : NULL_STR); > > > + fprintf(out, "lfts_file : %s\n", > > > + opt->lfts_file ? opt->lfts_file : NULL_STR); > > > + fprintf(out, "root_guid_file : %s\n", > > > + opt->root_guid_file ? opt->root_guid_file : NULL_STR); > > > + fprintf(out, "cn_guid_file : %s\n", > > > + opt->cn_guid_file ? opt->cn_guid_file : NULL_STR); > > > + fprintf(out, "ids_guid_file : %s\n", > > > + opt->ids_guid_file ? opt->ids_guid_file : NULL_STR); > > > + fprintf(out, "guid_routing_order_file : %s\n", > > > + opt->guid_routing_order_file ? opt->guid_routing_order_file : NULL_STR); > > > + fprintf(out, "sa_db_file : %s\n", > > > + opt->sa_db_file ? opt->sa_db_file : NULL_STR); > > > + fprintf(out, "exit_on_fatal : %s\n", > > > + BOOLEAN_STR(opt->exit_on_fatal)); > > > + fprintf(out, "honor_guid2lid_file : %s\n", > > > + BOOLEAN_STR(opt->honor_guid2lid_file)); > > > + fprintf(out, "daemon : %s\n", > > > + BOOLEAN_STR(opt->daemon)); > > > + fprintf(out, "sm_inactive : %s\n", > > > + BOOLEAN_STR(opt->sm_inactive)); > > > + fprintf(out, "babbling_port_policy : %s\n", > > > + BOOLEAN_STR(opt->babbling_port_policy)); > > > + dump_qos_options(&opt->qos_options, &opt->qos_options, "qos", out); > > > + dump_qos_options(&opt->qos_ca_options, &opt->qos_options, "qos_ca", out); > > > + dump_qos_options(&opt->qos_sw0_options, &opt->qos_options, "qos_sw0", out); > > > + dump_qos_options(&opt->qos_swe_options, &opt->qos_options, "qos_swe", out); > > > + dump_qos_options(&opt->qos_rtr_options, &opt->qos_options, "qos_rtr", out); > > > + fprintf(out, "enable_quirks : %s\n", > > > + BOOLEAN_STR(opt->enable_quirks)); > > > + fprintf(out, "no_clients_rereg : %s\n", > > > + 
BOOLEAN_STR(opt->no_clients_rereg)); > > > +#ifdef ENABLE_OSM_PERF_MGR > > > + fprintf(out, "perfmgr : %s\n", > > > + BOOLEAN_STR(opt->perfmgr)); > > > + fprintf(out, "perfmgr_redir : %s\n", > > > + BOOLEAN_STR(opt->perfmgr_redir)); > > > + fprintf(out, "perfmgr_sweep_time_s : %u\n", opt->perfmgr_sweep_time_s); > > > + fprintf(out, "perfmgr_max_outstanding_queries : %u\n", opt->perfmgr_max_outstanding_queries); > > > + fprintf(out, "event_db_dump_file : %s\n", > > > + opt->event_db_dump_file ? opt->event_db_dump_file : NULL_STR); > > > +#endif > > > + fprintf(out, "event_plugin_name : %s\n", > > > + opt->event_plugin_name ? opt->event_plugin_name : NULL_STR); > > > + fprintf(out, "node_name_map_name : %s\n", > > > + opt->node_name_map_name ? opt->node_name_map_name : NULL_STR); > > > + fprintf(out, "prefix_routes_file : %s\n", > > > + opt->prefix_routes_file ? opt->prefix_routes_file : NULL_STR); > > > + fprintf(out, "consolidate_ipv6_snm_req : %s\n", > > > + BOOLEAN_STR(opt->consolidate_ipv6_snm_req)); > > > +} > > > + > > > static void quit_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > { > > > osm_console_exit(&p_osm->console, &p_osm->log); > > > @@ -1166,6 +1323,7 @@ static const struct command console_cmds[] = { > > > {"portstatus", &help_portstatus, &portstatus_parse}, > > > {"switchbalance", &help_switchbalance, &switchbalance_parse}, > > > {"lidbalance", &help_lidbalance, &lidbalance_parse}, > > > + {"dump_conf", &help_dump_conf, &dump_conf_parse}, > > > {"version", &help_version, &version_parse}, > > > #ifdef ENABLE_OSM_PERF_MGR > > > {"perfmgr", &help_perfmgr, &perfmgr_parse}, > > > -- > > > 1.5.4.5 > > > > > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0003-support-dump_conf-console-command.patch
Type: text/x-patch
Size: 11769 bytes
Desc: not available
URL:

From weiny2 at llnl.gov Mon Nov 10 13:11:40 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Mon, 10 Nov 2008 13:11:40 -0800
Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer.
In-Reply-To: <20081110201333.GM313@sashak.voltaire.com>
References: <20081104095744.35893d4a.weiny2@llnl.gov> <20081110201333.GM313@sashak.voltaire.com>
Message-ID: <20081110131140.52561f42.weiny2@llnl.gov>

On Mon, 10 Nov 2008 22:13:33 +0200 Sasha Khapyorsky wrote:

> On 09:57 Tue 04 Nov , Ira Weiny wrote:
> > From 567c3893f24f4dc25ef5f4e74ef9deeb8ae541ad Mon Sep 17 00:00:00 2001
> > From: Ira Weiny
> > Date: Mon, 3 Nov 2008 14:47:50 -0800
> > Subject: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using
> > pointer.
> >
> > There are times when PortInfo fails which leaves osm_node_t with invalid
> > osm_physp_t pointers. In this case do not use an invalid pointer.
> >
> > Signed-off-by: Ira Weiny
>
> Applied. Thanks.
>
> However some note is below.
>
> > ---
> >  opensm/opensm/osm_state_mgr.c | 6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> > index ba3b6bf..841438c 100644
> > --- a/opensm/opensm/osm_state_mgr.c
> > +++ b/opensm/opensm/osm_state_mgr.c
> > @@ -542,6 +542,12 @@ static void __osm_state_mgr_get_node_desc(IN cl_map_item_t * const p_object,
> >
> >  	/* get a physp to request from. */
> >  	p_physp = osm_node_get_any_physp_ptr(p_node);
> > +	if (!osm_physp_is_valid(p_physp)) {
> > +		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
> > +			"__osm_state_mgr_get_node_desc: ERR 331C: "
> > +			"Failed to get valid physical port object\n");
> > +		goto exit;
> > +	}
>
> Actually it can be a valid case. For example when node was first time
> discovered via port A, when this port was disconnected and the same node
> was discovered via port B - it is not a new node and node_info (where
> port number for osm_node_get_any_physp_ptr() is stored) will not be
> updated.

Ah, good point, I just happened to see it when PortInfo failed.

>
> Obviously the patch is fine. But probably we need more general fix, for
> example to redo osm_node_get_any_physp_ptr() so that it will not return
> invalid ports. Need to review other osm_node_get_any_physp_ptr() usages.

I was wondering if it would return invalid ports ever. It would be easy for
it to return only valid ports but perhaps that should be another function to
preserve functionality?

Ira

From chu11 at llnl.gov Mon Nov 10 13:15:30 2008
From: chu11 at llnl.gov (Al Chu)
Date: Mon, 10 Nov 2008 13:15:30 -0800
Subject: [ofa-general] Re: [opensm patch][2/2] verify config inputs when config file is rescanned
In-Reply-To: <20081110210233.GE3467@sashak.voltaire.com>
References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com>
Message-ID: <1226351730.13603.27.camel@cardanus.llnl.gov>

On Mon, 2008-11-10 at 23:02 +0200, Sasha Khapyorsky wrote:
> Hi Al,
>
> On 15:01 Thu 30 Oct , Al Chu wrote:
> > Hey Sasha,
> >
> > I noticed that after the config file is rescanned, the new potential
> > inputs aren't checked for validity. Patch is attached.
> >
> > Al
> >
> > --
> > Albert Chu
> > chu11 at llnl.gov
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
>
> > From edfcd2de96c3525d1609b4c0f03c17ecc0495c18 Mon Sep 17 00:00:00 2001
> > From: root
> > Date: Thu, 30 Oct 2008 13:58:55 -0700
> > Subject: [PATCH] verify rescanned config input
> >
> >
> > Signed-off-by: root
> ^^^^^^^^^^^^^^^^^^^^^^^^
>
> I'm fine with this patch, but could you fix S-O-B line? Thanks.

Oops. New one is attached (I'll repost the [1/2] patch too).

Al

> Sasha

--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-verify-rescanned-config-input.patch
Type: text/x-patch
Size: 1047 bytes
Desc: not available
URL:

From chu11 at llnl.gov Mon Nov 10 13:16:15 2008
From: chu11 at llnl.gov (Al Chu)
Date: Mon, 10 Nov 2008 13:16:15 -0800
Subject: [ofa-general] [opensm patch][1/2] fix qos config parsing bugs
In-Reply-To: <1225404078.1197.533.camel@cardanus.llnl.gov>
References: <1225404078.1197.533.camel@cardanus.llnl.gov>
Message-ID: <1226351775.13603.30.camel@cardanus.llnl.gov>

Hey Sasha,

New patch w/ proper "signed off by" line.

Al

On Thu, 2008-10-30 at 15:01 -0700, Al Chu wrote:
> Hey Sasha,
>
> I found a bunch of qos config parsing issues, listed below:
>
> 1)
>
> If the user sets the qos default fields (i.e. qos_high_limit,
> qos_vlarb_high, etc.), but does not have the qos_ca, qos_swe, qos_rtr,
> etc. equivalent fields listed (i.e. qos_ca_high_limit,
> qos_sw0_vlarb_high), the values set in the qos default fields are not
> loaded into the CAs, switches, etc. The reason is that in qos_build_config()
> we load defaults like this:
>
> p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high;
>
> but we always set the fields to something non-NULL.
>
> static void subn_set_default_qos_options(IN osm_qos_options_t * opt)
> {
> 	opt->max_vls = OSM_DEFAULT_QOS_MAX_VLS;
> 	opt->high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT;
> 	opt->vlarb_high = OSM_DEFAULT_QOS_VLARB_HIGH;
> 	opt->vlarb_low = OSM_DEFAULT_QOS_VLARB_LOW;
> 	opt->sl2vl = OSM_DEFAULT_QOS_SL2VL;
> }
>
> 2)
>
> In qos_build_config() we load the high_limit like this:
>
> cfg->vl_high_limit = (uint8_t) opt->high_limit;
>
> So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit
> options to "go back to" the default high_limit. It just assumes that
> whatever is input (or was set by default) is what you should use.
>
> 3)
>
> Some fields like qos_vlarb_high are assumed to be correctly set and can
> segfault opensm.
>
> The attached patch fixes these up. Obviously there's tons of ways to
> do this. I decided to ...
>
> A) only initialize qos_options to the real defaults
>
> B) init all qos_*_options to sentinel values (-1, NULL, etc.) to
> indicate it should use the configured defaults if they aren't set by the
> user. The high_limit was changed from an unsigned to an int b/c 0 is a
> valid high_limit value.
>
> C) verify that the default qos inputs are definitely correct (i.e. can't
> be NULL). Reset to hard coded defaults if need be.
>
> D) load the default vs. non-default appropriately in QoS.
>
> Al
>
> P.S. This patch does not rely on my previous "remove qos_max_vls
> config" patch. I assume we're keeping the max_vls fields in this patch.

--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-qos-config-parsing-bugs.patch Type: text/x-patch Size: 21153 bytes Desc: not available URL: From chu11 at llnl.gov Mon Nov 10 13:41:04 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 10 Nov 2008 13:41:04 -0800 Subject: [ofa-general] [opensm patch][1/2] fix qos config parsing bugs In-Reply-To: <1226351775.13603.30.camel@cardanus.llnl.gov> References: <1225404078.1197.533.camel@cardanus.llnl.gov> <1226351775.13603.30.camel@cardanus.llnl.gov> Message-ID: <1226353264.13603.37.camel@cardanus.llnl.gov> On Mon, 2008-11-10 at 13:16 -0800, Al Chu wrote: > Hey Sasha, > > New patch w/ proper "signed off by" line. Argh. Repost, w/ right Author. Sorry. Al > Al > > On Thu, 2008-10-30 at 15:01 -0700, Al Chu wrote: > > Hey Sasha, > > > > I found a bunch of qos config parsing issues, listed below: > > > > 1) > > > > If the user sets the qos default fields (i.e. qos_high_limit, > > qos_vlarb_high. etc.), but do not have the qos_ca, qos_swe, qos_rtr, > > etc. equivalent fields listed (i.e. qos_ca_high_limit, > > qos_sw0_vlarb_high), the values set in teh qos default fields are not > > loaded into the CAs, switches, etc. The reason is in qos_build_config() > > we load defaults like this: > > > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > > > > but we always set the fields to something non-NULL. > > > > static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > > { > > opt->max_vls = OSM_DEFAULT_QOS_MAX_VLS; > > opt->high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > > opt->vlarb_high = OSM_DEFAULT_QOS_VLARB_HIGH; > > opt->vlarb_low = OSM_DEFAULT_QOS_VLARB_LOW; > > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > > } > > > > 2) > > > > In qos_build_config() we load the high_limit like this: > > > > cfg->vl_high_limit = (uint8_t) opt->high_limit; > > > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit > > options to "go back to" the default high_limit. 
It just assumes that > > whatever is input (or was set by default) is what you should use. > > > > 3) > > > > Some fields like qos_vlarb_high are assumed to be correctly set and can > > segfault opensm. > > > > The attached patch fixes these up. Obviously there's tons of ways to > > do this. I decided to ... > > > > A) only initialization qos_options to the real defaults > > > > B) init all qos_*_options to sentinel values (-1, NULL, etc.) to > > indicate it should use the configured defaults if they aren't set by the > > user. The high_limit was changed from an unsigned to an int b/c 0 is a > > valid high_limit value. > > > > C) verify that the default qos inputs are definitely correct (i.e. can't > > be NULL). Reset to hard coded defaults if need be. > > > > D) load the default vs. non-default appropriately in QoS. > > > > Al > > > > P.S. This patch does not rely on my previous "remove qos_max_vls > > config" patch. I assume we're keeping the max_vls fields in this patch. > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
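Editor's sketch: the approach Al outlines in points A) through D) above can be illustrated with a small, self-contained C fragment. This is not the attached patch, which was scrubbed from the archive. The struct `qos_opts_t`, the `FALLBACK_*` constants and all function names below are hypothetical, and only two of the QoS fields are modelled. It shows why `high_limit` has to become a signed `int` (so that -1 can mean "unset" while 0 stays a valid value) and how the per-node-type options fall back to the configured defaults.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for osm_qos_options_t; only two
 * of the real fields are modelled here. */
typedef struct qos_opts {
	int high_limit;         /* -1: not set by the user; 0 is a valid value */
	const char *vlarb_high; /* NULL: not set by the user */
} qos_opts_t;

/* Illustrative hard-coded fallbacks, not the real OSM_DEFAULT_QOS_* values. */
#define FALLBACK_HIGH_LIMIT 0
#define FALLBACK_VLARB_HIGH "0:4,1:0,2:0,3:0"

/* (A) Only the default option block is initialised to real values. */
void qos_opts_set_defaults(qos_opts_t *dflt)
{
	dflt->high_limit = FALLBACK_HIGH_LIMIT;
	dflt->vlarb_high = FALLBACK_VLARB_HIGH;
}

/* (B) Per-node-type blocks (CA, switch, router) start out as sentinels. */
void qos_opts_set_sentinels(qos_opts_t *opt)
{
	opt->high_limit = -1;
	opt->vlarb_high = NULL;
}

/* (C) After parsing, the default block must not hold a sentinel or an
 * out-of-range value; reset to the hard-coded fallback if it does. */
void qos_opts_verify_defaults(qos_opts_t *dflt)
{
	if (dflt->high_limit < 0 || dflt->high_limit > UINT8_MAX)
		dflt->high_limit = FALLBACK_HIGH_LIMIT;
	if (dflt->vlarb_high == NULL)
		dflt->vlarb_high = FALLBACK_VLARB_HIGH;
}

/* (D) Use the per-node-type value if the user set one, else the default. */
uint8_t qos_resolve_high_limit(const qos_opts_t *opt, const qos_opts_t *dflt)
{
	int v = opt->high_limit >= 0 ? opt->high_limit : dflt->high_limit;
	return (uint8_t)v;
}

const char *qos_resolve_vlarb_high(const qos_opts_t *opt, const qos_opts_t *dflt)
{
	return opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high;
}
```

With this layout, a user who sets only the default `high_limit` gets that value for CAs, switches and routers, while an explicit per-type 0 is still honoured rather than being mistaken for "unset".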
Name: 0001-fix-qos-config-parsing-bugs.patch Type: text/x-patch Size: 21156 bytes Desc: not available URL: From chu11 at llnl.gov Mon Nov 10 13:41:13 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 10 Nov 2008 13:41:13 -0800 Subject: [ofa-general] Re: [opensm patch][2/2] verify config inputs when config file is rescanned In-Reply-To: <1226351730.13603.27.camel@cardanus.llnl.gov> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> Message-ID: <1226353273.13603.39.camel@cardanus.llnl.gov> Hey Sasha, Sorry, repost, w/ the right Author. Al On Mon, 2008-11-10 at 13:15 -0800, Al Chu wrote: > On Mon, 2008-11-10 at 23:02 +0200, Sasha Khapyorsky wrote: > > Hi Al, > > > > On 15:01 Thu 30 Oct , Al Chu wrote: > > > Hey Sasha, > > > > > > I noticed that after the config file is rescanned, the new potential > > > inputs aren't checked for validity. Patch is attached. > > > > > > Al > > > > > > -- > > > Albert Chu > > > chu11 at llnl.gov > > > Computer Scientist > > > High Performance Systems Division > > > Lawrence Livermore National Laboratory > > > > > From edfcd2de96c3525d1609b4c0f03c17ecc0495c18 Mon Sep 17 00:00:00 2001 > > > From: root > > > Date: Thu, 30 Oct 2008 13:58:55 -0700 > > > Subject: [PATCH] verify rescanned config input > > > > > > > > > Signed-off-by: root > > ^^^^^^^^^^^^^^^^^^^^^^^^ > > > > I'm fine with this patch, but could you fix S-O-B line? Thanks. > > Oops. New one is attached (I'll repost the [1/2] patch too). 
> > Al > > > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-verify-rescanned-config-input.patch Type: text/x-patch Size: 1050 bytes Desc: not available URL: From chu11 at llnl.gov Mon Nov 10 13:42:31 2008 From: chu11 at llnl.gov (Al Chu) Date: Mon, 10 Nov 2008 13:42:31 -0800 Subject: [ofa-general] Re: [opensm patch] support dump_conf command in opensm console In-Reply-To: <1226351033.13603.23.camel@cardanus.llnl.gov> References: <1225759191.7307.9.camel@cardanus.llnl.gov> <20081109172518.GG30588@sashak.voltaire.com> <1226338962.13603.21.camel@cardanus.llnl.gov> <1226351033.13603.23.camel@cardanus.llnl.gov> Message-ID: <1226353351.13603.42.camel@cardanus.llnl.gov> Hey Sasha, Sorry. Repost patch w/ the right Author. Al On Mon, 2008-11-10 at 13:03 -0800, Al Chu wrote: > Hey Sasha, > > Attached is the re-worked patch. Assumes changes from my "fix qos > config parsing bugs" patch are accepted. > > Al > > On Mon, 2008-11-10 at 09:42 -0800, Al Chu wrote: > > Hey Sasha, > > > > On Sun, 2008-11-09 at 19:25 +0200, Sasha Khapyorsky wrote: > > > Hi Al, > > > > > > On 16:39 Mon 03 Nov , Al Chu wrote: > > > > Hey Sasha, > > > > > > > > When config files are rescanned and loaded, there's no way to know if > > > > the right configuration was actually reloaded or not. A console command > > > > to dump the current config is a useful way to verify the loading of new > > > > configs or not. > > > > > > > > This patch assumes the fixes from my "fix qos config parsing bugs" is > > > > accepted. > > > > > > Didn't pass over it, sorry about delay. 
> > > > > > > > > > > Al > > > > > > > > -- > > > > Albert Chu > > > > chu11 at llnl.gov > > > > Computer Scientist > > > > High Performance Systems Division > > > > Lawrence Livermore National Laboratory > > > > > > > From 249607e47ec7ef1b92f9578cece90460418d12b8 Mon Sep 17 00:00:00 2001 > > > > From: Albert Chu > > > > Date: Mon, 3 Nov 2008 16:22:29 -0800 > > > > Subject: [PATCH] support dump_conf console command > > > > > > > > > > > > Signed-off-by: Albert Chu > > > > --- > > > > opensm/opensm/osm_console.c | 158 +++++++++++++++++++++++++++++++++++++++++++ > > > > 1 files changed, 158 insertions(+), 0 deletions(-) > > > > > > > > diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c > > > > index d9bbbc2..8422655 100644 > > > > --- a/opensm/opensm/osm_console.c > > > > +++ b/opensm/opensm/osm_console.c > > > > @@ -53,6 +53,10 @@ > > > > #include > > > > #include > > > > > > > > +#define NULL_STR "(null)" > > > > + > > > > +#define BOOLEAN_STR(__b) ((__b) ? "TRUE" : "FALSE") > > > > + > > > > struct command { > > > > char *name; > > > > void (*help_function) (FILE * out, int detail); > > > > @@ -189,6 +193,14 @@ static void help_lidbalance(FILE * out, int detail) > > > > } > > > > } > > > > > > > > +static void help_dump_conf(FILE *out, int detail) > > > > +{ > > > > + fprintf(out, "dump_conf\n"); > > > > + if (detail) { > > > > + fprintf(out, "dump current opensm configuration\n"); > > > > + } > > > > +} > > > > + > > > > #ifdef ENABLE_OSM_PERF_MGR > > > > static void help_perfmgr(FILE * out, int detail) > > > > { > > > > @@ -1136,6 +1148,151 @@ static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > > } > > > > #endif /* ENABLE_OSM_PERF_MGR */ > > > > > > > > +static void dump_qos_options(osm_qos_options_t * opt, > > > > + osm_qos_options_t * dflt, > > > > + char *prefix, > > > > + FILE * out) > > > > +{ > > > > + fprintf(out, "%s_max_vls : %u\n", > > > > + prefix, opt->max_vls ? 
opt->max_vls : dflt->max_vls); > > > > + fprintf(out, "%s_high_limit : %u\n", > > > > + prefix, opt->high_limit >= 0 ? (unsigned)opt->high_limit : (unsigned)dflt->high_limit); > > > > + fprintf(out, "%s_vlarb_high : %s\n", > > > > + prefix, opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high); > > > > + fprintf(out, "%s_vlarb_low : %s\n", > > > > + prefix, opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low); > > > > + fprintf(out, "%s_sl2vl : %s\n", > > > > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > > > > +} > > > > + > > > > +static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > > +{ > > > > > > Why to not use osm_subn_write_conf_file() function (wrapped by > > > dump_conf_parse())? I think we need to have config dumping code > > > consolidated. > > > > I had thought of that, but I didn't want all of the instructions and all > > the extra lines of output. But I guess it's not that big of a deal in > > the end. I'll send a new patch. > > > > Al > > > > > Sasha > > > > > > > + osm_subn_opt_t * opt = &p_osm->subn.opt; > > > > + > > > > + fprintf(out, "config_file : %s\n", > > > > + opt->config_file ? 
opt->config_file : NULL_STR); > > > > + fprintf(out, "guid : 0x%016" PRIx64 "\n", opt->guid); > > > > + fprintf(out, "m_key : 0x%016" PRIx64 "\n", opt->m_key); > > > > + fprintf(out, "sm_key : 0x%016" PRIx64 "\n", opt->sm_key); > > > > + fprintf(out, "sa_key : 0x%016" PRIx64 "\n", opt->sa_key); > > > > + fprintf(out, "subnet_prefix : 0x%016" PRIx64 "\n", opt->subnet_prefix); > > > > + fprintf(out, "m_key_lease_period : %u\n", opt->m_key_lease_period); > > > > + fprintf(out, "sweep_interval : %u\n", opt->sweep_interval); > > > > + fprintf(out, "max_wire_smps : %u\n", opt->max_wire_smps); > > > > + fprintf(out, "transaction_timeout : %u\n", opt->transaction_timeout); > > > > + fprintf(out, "sm_priority : %u\n", opt->sm_priority); > > > > + fprintf(out, "lmc : %u\n", opt->lmc); > > > > + fprintf(out, "lmc_esp0 : %s\n", > > > > + BOOLEAN_STR(opt->lmc_esp0)); > > > > + fprintf(out, "max_op_vls : %u\n", opt->max_op_vls); > > > > + fprintf(out, "force_link_speed : %u\n", opt->force_link_speed); > > > > + fprintf(out, "reassign_lids : %s\n", > > > > + BOOLEAN_STR(opt->reassign_lids)); > > > > + fprintf(out, "ignore_other_sm : %s\n", > > > > + BOOLEAN_STR(opt->ignore_other_sm)); > > > > + fprintf(out, "single_thread : %s\n", > > > > + BOOLEAN_STR(opt->single_thread)); > > > > + fprintf(out, "disable_multicast : %s\n", > > > > + BOOLEAN_STR(opt->disable_multicast)); > > > > + fprintf(out, "force_log_flush : %s\n", > > > > + BOOLEAN_STR(opt->force_log_flush)); > > > > + fprintf(out, "subnet_timeout : %u\n", opt->subnet_timeout); > > > > + fprintf(out, "packet_life_time : %u\n", opt->packet_life_time); > > > > + fprintf(out, "vl_stall_count : %u\n", opt->vl_stall_count); > > > > + fprintf(out, "leaf_vl_stall_count : %u\n", opt->leaf_vl_stall_count); > > > > + fprintf(out, "head_of_queue_lifetime : %u\n", opt->head_of_queue_lifetime); > > > > + fprintf(out, "leaf_head_of_queue_lifetime : %u\n", opt->leaf_head_of_queue_lifetime); > > > > + fprintf(out, 
"local_phy_errors_threshold : %u\n", opt->local_phy_errors_threshold); > > > > + fprintf(out, "overrun_errors_threshold : %u\n", opt->overrun_errors_threshold); > > > > + fprintf(out, "sminfo_polling_timeout : %u\n", opt->sminfo_polling_timeout); > > > > + fprintf(out, "polling_retry_number : %u\n", opt->polling_retry_number); > > > > + fprintf(out, "max_msg_fifo_timeout : %u\n", opt->max_msg_fifo_timeout); > > > > + fprintf(out, "force_heavy_sweep : %s\n", > > > > + BOOLEAN_STR(opt->force_heavy_sweep)); > > > > + fprintf(out, "log_flags : 0x%02x\n", opt->log_flags); > > > > + fprintf(out, "dump_files_dir : %s\n", > > > > + opt->dump_files_dir ? opt->dump_files_dir : NULL_STR); > > > > + fprintf(out, "log_file : %s\n", > > > > + opt->log_file ? opt->log_file : NULL_STR); > > > > + fprintf(out, "log_max_size : %lu\n", opt->log_max_size); > > > > + fprintf(out, "partition_config_file : %s\n", > > > > + opt->partition_config_file ? opt->partition_config_file : NULL_STR); > > > > + fprintf(out, "no_partition_enforcement : %s\n", > > > > + BOOLEAN_STR(opt->no_partition_enforcement)); > > > > + fprintf(out, "qos : %s\n", > > > > + BOOLEAN_STR(opt->qos)); > > > > + fprintf(out, "qos_policy_file : %s\n", > > > > + opt->qos_policy_file ? opt->qos_policy_file : NULL_STR); > > > > + fprintf(out, "accum_log_file: %s\n", > > > > + BOOLEAN_STR(opt->accum_log_file)); > > > > + fprintf(out, "console : %s\n", > > > > + opt->console ? opt->console : NULL_STR); > > > > + fprintf(out, "console_port : %u\n", opt->console_port); > > > > + fprintf(out, "port_prof_ignore_file : %s\n", > > > > + opt->port_prof_ignore_file ? opt->port_prof_ignore_file : NULL_STR); > > > > + fprintf(out, "port_profile_switch_nodes : %s\n", > > > > + BOOLEAN_STR(opt->port_profile_switch_nodes)); > > > > + fprintf(out, "sweep_on_trap : %s\n", > > > > + BOOLEAN_STR(opt->sweep_on_trap)); > > > > + fprintf(out, "routing_engine_names : %s\n", > > > > + opt->routing_engine_names ? 
opt->routing_engine_names : NULL_STR); > > > > + fprintf(out, "use_ucast_cache : %s\n", > > > > + BOOLEAN_STR(opt->use_ucast_cache)); > > > > + fprintf(out, "connect_roots : %s\n", > > > > + BOOLEAN_STR(opt->connect_roots)); > > > > + fprintf(out, "lid_matrix_dump_file : %s\n", > > > > + opt->lid_matrix_dump_file ? opt->lid_matrix_dump_file : NULL_STR); > > > > + fprintf(out, "lfts_file : %s\n", > > > > + opt->lfts_file ? opt->lfts_file : NULL_STR); > > > > + fprintf(out, "root_guid_file : %s\n", > > > > + opt->root_guid_file ? opt->root_guid_file : NULL_STR); > > > > + fprintf(out, "cn_guid_file : %s\n", > > > > + opt->cn_guid_file ? opt->cn_guid_file : NULL_STR); > > > > + fprintf(out, "ids_guid_file : %s\n", > > > > + opt->ids_guid_file ? opt->ids_guid_file : NULL_STR); > > > > + fprintf(out, "guid_routing_order_file : %s\n", > > > > + opt->guid_routing_order_file ? opt->guid_routing_order_file : NULL_STR); > > > > + fprintf(out, "sa_db_file : %s\n", > > > > + opt->sa_db_file ? opt->sa_db_file : NULL_STR); > > > > + fprintf(out, "exit_on_fatal : %s\n", > > > > + BOOLEAN_STR(opt->exit_on_fatal)); > > > > + fprintf(out, "honor_guid2lid_file : %s\n", > > > > + BOOLEAN_STR(opt->honor_guid2lid_file)); > > > > + fprintf(out, "daemon : %s\n", > > > > + BOOLEAN_STR(opt->daemon)); > > > > + fprintf(out, "sm_inactive : %s\n", > > > > + BOOLEAN_STR(opt->sm_inactive)); > > > > + fprintf(out, "babbling_port_policy : %s\n", > > > > + BOOLEAN_STR(opt->babbling_port_policy)); > > > > + dump_qos_options(&opt->qos_options, &opt->qos_options, "qos", out); > > > > + dump_qos_options(&opt->qos_ca_options, &opt->qos_options, "qos_ca", out); > > > > + dump_qos_options(&opt->qos_sw0_options, &opt->qos_options, "qos_sw0", out); > > > > + dump_qos_options(&opt->qos_swe_options, &opt->qos_options, "qos_swe", out); > > > > + dump_qos_options(&opt->qos_rtr_options, &opt->qos_options, "qos_rtr", out); > > > > + fprintf(out, "enable_quirks : %s\n", > > > > + BOOLEAN_STR(opt->enable_quirks)); 
> > > > + fprintf(out, "no_clients_rereg : %s\n", > > > > + BOOLEAN_STR(opt->no_clients_rereg)); > > > > +#ifdef ENABLE_OSM_PERF_MGR > > > > + fprintf(out, "perfmgr : %s\n", > > > > + BOOLEAN_STR(opt->perfmgr)); > > > > + fprintf(out, "perfmgr_redir : %s\n", > > > > + BOOLEAN_STR(opt->perfmgr_redir)); > > > > + fprintf(out, "perfmgr_sweep_time_s : %u\n", opt->perfmgr_sweep_time_s); > > > > + fprintf(out, "perfmgr_max_outstanding_queries : %u\n", opt->perfmgr_max_outstanding_queries); > > > > + fprintf(out, "event_db_dump_file : %s\n", > > > > + opt->event_db_dump_file ? opt->event_db_dump_file : NULL_STR); > > > > +#endif > > > > + fprintf(out, "event_plugin_name : %s\n", > > > > + opt->event_plugin_name ? opt->event_plugin_name : NULL_STR); > > > > + fprintf(out, "node_name_map_name : %s\n", > > > > + opt->node_name_map_name ? opt->node_name_map_name : NULL_STR); > > > > + fprintf(out, "prefix_routes_file : %s\n", > > > > + opt->prefix_routes_file ? opt->prefix_routes_file : NULL_STR); > > > > + fprintf(out, "consolidate_ipv6_snm_req : %s\n", > > > > + BOOLEAN_STR(opt->consolidate_ipv6_snm_req)); > > > > +} > > > > + > > > > static void quit_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) > > > > { > > > > osm_console_exit(&p_osm->console, &p_osm->log); > > > > @@ -1166,6 +1323,7 @@ static const struct command console_cmds[] = { > > > > {"portstatus", &help_portstatus, &portstatus_parse}, > > > > {"switchbalance", &help_switchbalance, &switchbalance_parse}, > > > > {"lidbalance", &help_lidbalance, &lidbalance_parse}, > > > > + {"dump_conf", &help_dump_conf, &dump_conf_parse}, > > > > {"version", &help_version, &version_parse}, > > > > #ifdef ENABLE_OSM_PERF_MGR > > > > {"perfmgr", &help_perfmgr, &perfmgr_parse}, > > > > -- > > > > 1.5.4.5 > > > > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit 
http:// openib.org/mailman/listinfo/openib-general -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-support-dump_conf-console-command.patch Type: text/x-patch Size: 11772 bytes Desc: not available URL: From rpearson at systemfabricworks.com Mon Nov 10 13:47:58 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 10 Nov 2008 15:47:58 -0600 Subject: [ofa-general] opensm support for toroidal meshes Message-ID: <000501c9437d$ffa7cd90$fef768b0$@com> We have been involved in a project to deliver a large system based on a toroidal mesh fabric. One of the requirements for this system is to be able to guarantee deadlock-free routing of the fabric. The lash routing engine in opensm did not work in this case because the required number of VLs for the machine as configured was 12, which exceeded the number of VLs supported by Mellanox switch ASICs. It turns out that if one is free to optimally reorder the port assignments used by lash, then lash can successfully route the fabric, but doing that reordering in the hardware is impractical. The attached note describes an algorithm for automatically recognizing when a Cartesian mesh fabric is a torus, determining its size, and optimally reordering the ports in opensm so that lash can generate a route with the smallest number of VLs. We have implemented a set of changes to opensm that implement this algorithm and will submit the changes as patches. The note should help in understanding the code. -------------- next part -------------- A non-text attachment was scrubbed...
Name: lash_changes.doc Type: application/msword Size: 411136 bytes Desc: not available URL: From rdreier at cisco.com Mon Nov 10 14:37:08 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 10 Nov 2008 14:37:08 -0800 Subject: [ofa-general] Higher than usual latency (new baby) Message-ID: Hi everyone, My wife gave birth to a son on November 6. Everyone is healthy and doing well. But for obvious reasons you should expect me to be a lot less responsive than usual for the next few weeks. Thanks, Roland From pradeeps at linux.vnet.ibm.com Mon Nov 10 15:30:34 2008 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Mon, 10 Nov 2008 15:30:34 -0800 Subject: [ofa-general] [PATCH] ipoib: null tx/rx_ring skb pointers on free In-Reply-To: <4916B318.50503@voltaire.com> References: <20081106012307.GP31163@sgi.com> <200811061712.50605.jackm@dev.mellanox.co.il> <20081106164005.GS31163@sgi.com> <49147107.2090600@linux.vnet.ibm.com> <4916B318.50503@voltaire.com> Message-ID: <4918C41A.9060609@linux.vnet.ibm.com> Or Gerlitz wrote: > Pradeep Satyanarayana wrote: >> If I am not mistaken we saw a problem that showed similar >> characteristics more than two years ago on IBM platforms. The same >> issue of rx_ring reusing tx_ring skbs and so on and would show up only >> under stress. This was with UD mode (before CM came into the picture) >> and it turned out to be a driver issue. > Can you send pointer to the relevant thread / commit that solved this > issue? Or, Even though I searched the archives, I could not locate that particular thread. I know that Nam submitted the patch and that it was in the June/July 2006 time frame. The fix was for a missing read memory barrier in the ehca driver. I am copying him so that he might help.
Pradeep From rpearson at systemfabricworks.com Mon Nov 10 22:44:49 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 00:44:49 -0600 Subject: [ofa-general] [PATCH] opensm: skeleton for toroidal mesh analysis Message-ID: <000001c943c8$fef921f0$fceb65d0$@com> Sasha, Here is the first patch in a series to implement the algorithm described in the file lash_changes.doc. This patch: - creates a new command line flag --do_mesh_analysis and a new boolean option (do_mesh_analysis) that is set if the flag is used. - adds code to main() to parse the flag and set the option. - creates a new file osm_mesh.c to hold the algorithm code. - moves declarations from osm_ucast_lash.c and osm_mesh.c into header files. - adds these files to Makefile.am. - adds a stub do_mesh_analysis() that is called from lash_core(). Signed-off-by: Bob Pearson ----- diff --git a/opensm/include/opensm/osm_mesh.h b/opensm/include/opensm/osm_mesh.h new file mode 100644 index 0000000..1467440 --- /dev/null +++ b/opensm/include/opensm/osm_mesh.h @@ -0,0 +1,46 @@ +/* + * Copyright (c) 2008 System Fabric Works, Inc. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +/* + * Abstract: + * Declarations for mesh analysis + */ + +#ifndef OSM_UCAST_MESH_H +#define OSM_UCAST_MESH_H + +struct _lash; + +int do_mesh_analysis(struct _lash *p_lash); + +#endif diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 7259587..2abe36d 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -215,6 +215,7 @@ typedef struct osm_subn_opt { char *node_name_map_name; char *prefix_routes_file; boolean_t consolidate_ipv6_snm_req; + boolean_t do_mesh_analysis; } osm_subn_opt_t; /* * FIELDS diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h new file mode 100644 index 0000000..646e9a3 --- /dev/null +++ b/opensm/include/opensm/osm_ucast_lash.h @@ -0,0 +1,100 @@ +/* + * Copyright (c) 2008 System Fabric Works, Inc. + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. + * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved. + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. + * Copyright (c) 2007 Simula Research Laboratory. All rights reserved. + * Copyright (c) 2007 Silicon Graphics Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +/* + * Abstract: + * Declarations for LASH algorithm + */ + +#ifndef OSM_UCAST_LASH_H +#define OSM_UCAST_LASH_H + +enum { + UNQUEUED, + Q_MEMBER, + MST_MEMBER, + MAX_INT = 9999, + NONE = MAX_INT +}; + +typedef struct _cdg_vertex { + int num_dependencies; + struct _cdg_vertex **dependency; + int from; + int to; + int seen; + int temp; + int visiting_number; + struct _cdg_vertex *next; + int num_temp_depend; + int num_using_vertex; + int *num_using_this_depend; +} cdg_vertex_t; + +typedef struct _reachable_dest { + int switch_id; + struct _reachable_dest *next; +} reachable_dest_t; + +typedef struct _switch { + osm_switch_t *p_sw; + int *dij_channels; + int id; + int used_channels; + int q_state; + struct routing_table { + unsigned out_link; + unsigned lane; + } *routing_table; + unsigned int num_connections; + int *virtual_physical_port_table; + int *phys_connections; +} switch_t; + +typedef struct _lash { + osm_opensm_t *p_osm; + int num_switches; + uint8_t vl_min; + int balance_limit; + switch_t **switches; + cdg_vertex_t ****cdg_vertex_matrix; + int *num_mst_in_lane; + int ***virtual_location; +} lash_t; + +#endif diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am index 01573d2..7b9da18 100644 --- a/opensm/opensm/Makefile.am +++ b/opensm/opensm/Makefile.am @@ -31,7 +31,7 @@ opensm_SOURCES = main.c osm_console_io.c osm_console.c osm_db_files.c \ osm_inform.c osm_lid_mgr.c osm_lin_fwd_rcv.c \ osm_link_mgr.c osm_mcast_fwd_rcv.c \ osm_mcast_mgr.c osm_mcast_tbl.c osm_mcm_info.c \ - osm_mcm_port.c osm_mtree.c osm_multicast.c osm_node.c \ + osm_mcm_port.c osm_mesh.c osm_mtree.c osm_multicast.c osm_node.c \ osm_node_desc_rcv.c osm_node_info_rcv.c \ osm_opensm.c osm_pkey.c osm_pkey_mgr.c osm_pkey_rcv.c \ osm_port.c osm_port_info_rcv.c \ @@ -76,6 +76,7 @@ opensminclude_HEADERS = \ $(srcdir)/../include/opensm/osm_errors.h \ $(srcdir)/../include/opensm/osm_helper.h \ $(srcdir)/../include/opensm/osm_inform.h \ + 
$(srcdir)/../include/opensm/osm_ucast_lash.h \ $(srcdir)/../include/opensm/osm_lid_mgr.h \ $(srcdir)/../include/opensm/osm_log.h \ $(srcdir)/../include/opensm/osm_mad_pool.h \ @@ -83,6 +84,7 @@ opensminclude_HEADERS = \ $(srcdir)/../include/opensm/osm_mcast_tbl.h \ $(srcdir)/../include/opensm/osm_mcm_info.h \ $(srcdir)/../include/opensm/osm_mcm_port.h \ + $(srcdir)/../include/opensm/osm_mesh.h \ $(srcdir)/../include/opensm/osm_mtree.h \ $(srcdir)/../include/opensm/osm_multicast.h \ $(srcdir)/../include/opensm/osm_msgdef.h \ diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 53648d6..63bd5a6 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -585,6 +585,7 @@ int main(int argc, char *argv[]) #endif {"prefix_routes_file", 1, NULL, 3}, {"consolidate_ipv6_snm_req", 0, NULL, 4}, + {"do_mesh_analysis", 0, NULL, 5}, {NULL, 0, NULL, 0} /* Required at the end of the array */ }; @@ -922,6 +923,9 @@ int main(int argc, char *argv[]) case 4: opt.consolidate_ipv6_snm_req = TRUE; break; + case 5: + opt.do_mesh_analysis = TRUE; + break; case 'h': case '?': case ':': diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c new file mode 100644 index 0000000..7943274 --- /dev/null +++ b/opensm/opensm/osm_mesh.c @@ -0,0 +1,65 @@ +/* + * Copyright (c) 2008 System Fabric Works, Inc. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +/* + * Abstract: + * routines to analyze certain meshes + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include + +/* + * do_mesh_analysis + */ +int do_mesh_analysis(lash_t *p_lash) +{ + int ret = 0; + osm_log_t *p_log = &p_lash->p_osm->log; + + OSM_LOG_ENTER(p_log); + + printf("lash: do_mesh_analysis stub called\n"); + + OSM_LOG_EXIT(p_log); + + return ret; +} diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index c082798..e10371c 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -52,64 +52,13 @@ #include #include #include +#include +#include /* //////////////////////////// */ /* Local types */ /* //////////////////////////// */ -enum { - UNQUEUED, - Q_MEMBER, - MST_MEMBER, - MAX_INT = 9999, - NONE = MAX_INT -}; - -typedef struct _cdg_vertex { - int num_dependencies; - struct _cdg_vertex **dependency; - int from; - int to; - int seen; - int temp; - int visiting_number; - struct _cdg_vertex *next; - int num_temp_depend; - int num_using_vertex; - int *num_using_this_depend; -} cdg_vertex_t; - -typedef struct _reachable_dest { - int switch_id; - struct _reachable_dest *next; -} reachable_dest_t; - -typedef struct 
_switch { - osm_switch_t *p_sw; - int *dij_channels; - int id; - int used_channels; - int q_state; - struct routing_table { - unsigned out_link; - unsigned lane; - } *routing_table; - unsigned int num_connections; - int *virtual_physical_port_table; - int *phys_connections; -} switch_t; - -typedef struct _lash { - osm_opensm_t *p_osm; - int num_switches; - uint8_t vl_min; - int balance_limit; - switch_t **switches; - cdg_vertex_t ****cdg_vertex_matrix; - int *num_mst_in_lane; - int ***virtual_location; -} lash_t; - static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) { cdg_vertex_t *cdg_vertex = (cdg_vertex_t *) malloc(sizeof(cdg_vertex_t)); @@ -872,10 +821,15 @@ static int lash_core(lash_t * p_lash) int output_link2, i_next_switch2; int cycle_found2 = 0; int status = 0; - int *switch_bitmap; /* Bitmap to check if we have processed this pair */ + int *switch_bitmap = NULL; /* Bitmap to check if we have processed this pair */ OSM_LOG_ENTER(p_log); + if (p_lash->p_osm->subn.opt.do_mesh_analysis && do_mesh_analysis(p_lash)) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Mesh analysis failed\n"); + goto Exit; + } + for (i = 0; i < num_switches; i++) { shortest_path(p_lash, i); From rpearson at systemfabricworks.com Mon Nov 10 23:26:32 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 01:26:32 -0600 Subject: [ofa-general] {PATCH] [2] opensm: per mesh data Message-ID: <000101c943ce$d2707880$77516980$@com> Sasha, Here is the second patch implementing the mesh analysis algorithm. 
This patch: - creates a data structure, mesh_t, that holds per mesh information - adds a pointer to this structure in lash_t - creates methods to allocate and free memory for mesh_t - adds osm_ prefix to global routine names (oops) - calls create and cleanup methods Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/include/opensm/osm_mesh.h b/opensm/include/opensm/osm_mesh.h index 1467440..8313614 100644 --- a/opensm/include/opensm/osm_mesh.h +++ b/opensm/include/opensm/osm_mesh.h @@ -41,6 +41,18 @@ struct _lash; -int do_mesh_analysis(struct _lash *p_lash); +/* + * per fabric mesh info + */ +typedef struct _mesh { + int num_class; /* number of switch classes */ + int *class_type; /* index of first switch found for each class */ + int *class_count; /* population of each class */ + int dimension; /* mesh dimension */ + int *size; /* an array to hold size of mesh */ +} mesh_t; + +void osm_mesh_cleanup(struct _lash *p_lash); +int osm_do_mesh_analysis(struct _lash *p_lash); #endif diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h index 646e9a3..1ae3bb6 100644 --- a/opensm/include/opensm/osm_ucast_lash.h +++ b/opensm/include/opensm/osm_ucast_lash.h @@ -95,6 +95,7 @@ typedef struct _lash { cdg_vertex_t ****cdg_vertex_matrix; int *num_mst_in_lane; int ***virtual_location; + mesh_t *mesh; } lash_t; #endif diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 7943274..c97925b 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -41,6 +41,7 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include @@ -48,15 +49,72 @@ #include /* + * osm_mesh_cleanup - free per mesh resources + */ +void osm_mesh_cleanup(lash_t *p_lash) +{ + mesh_t *mesh = p_lash->mesh; + + if (mesh) { + if (mesh->class_type) + free(mesh->class_type); + + if (mesh->class_count) + free(mesh->class_count); + + free(mesh); + + p_lash->mesh = NULL; + } +} + +/* + * mesh_create - allocate per mesh 
resources + */ +static int mesh_create(lash_t *p_lash) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + mesh_t *mesh; + + if(!(mesh = p_lash->mesh = calloc(1, sizeof(mesh_t)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh - out of memory\n"); + return -1; + } + + if (!(mesh->class_type = calloc(p_lash->num_switches, sizeof(int)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh->class_type - out of memory\n"); + free(mesh); + return -1; + } + + if (!(mesh->class_count = calloc(p_lash->num_switches, sizeof(int)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh->class_count - out of memory\n"); + free(mesh->class_type); + free(mesh); + return -1; + } + + return 0; +} + +/* * do_mesh_analysis */ -int do_mesh_analysis(lash_t *p_lash) +int osm_do_mesh_analysis(lash_t *p_lash) { int ret = 0; osm_log_t *p_log = &p_lash->p_osm->log; OSM_LOG_ENTER(p_log); + /* + * allocate per mesh data structures + */ + if (mesh_create(p_lash)) { + OSM_LOG_EXIT(p_log); + return -1; + } + printf("lash: do_mesh_analysis stub called\n"); OSM_LOG_EXIT(p_log); diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index e10371c..3577cca 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -825,7 +825,7 @@ static int lash_core(lash_t * p_lash) OSM_LOG_ENTER(p_log); - if (p_lash->p_osm->subn.opt.do_mesh_analysis && do_mesh_analysis(p_lash)) { + if (p_lash->p_osm->subn.opt.do_mesh_analysis && osm_do_mesh_analysis(p_lash)) { OSM_LOG(p_log, OSM_LOG_ERROR, "Mesh analysis failed\n"); goto Exit; } @@ -1124,6 +1124,8 @@ static void lash_cleanup(lash_t * p_lash) free(p_lash->switches); } p_lash->switches = NULL; + + osm_mesh_cleanup(p_lash); } /* From rpearson at systemfabricworks.com Tue Nov 11 00:06:03 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 02:06:03 -0600 Subject: [ofa-general] [PATCH][3] opensm: per mesh node information Message-ID: <000501c943d4$57b3f8f0$071bead0$@com> 
Sasha, This is the third patch implementing the mesh analysis algorithm. This patch: - creates per mesh node (e.g. switch) data structure mesh_node_t - adds a pointer to mesh_node_t in the switch_t structure - implements create and cleanup methods for mesh_node_t - calls these in switch_create and switch_delete in *lash.c Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/include/opensm/osm_mesh.h b/opensm/include/opensm/osm_mesh.h index 8313614..78af086 100644 --- a/opensm/include/opensm/osm_mesh.h +++ b/opensm/include/opensm/osm_mesh.h @@ -40,6 +40,39 @@ #define OSM_UCAST_MESH_H struct _lash; +struct _switch; + +enum mesh_node_type { + mesh_type_none, + mesh_type_cartesian, +}; + +/* + * per switch to switch link info + */ +typedef struct _link { + int switch_id; + int link_id; + int *ports; + int num_ports; + int next_port; +} link_t; + +/* + * per switch node mesh info + */ +typedef struct _mesh_node { + unsigned int num_links; /* number of 'links' to adjacent switches */ + link_t **links; /* per link information */ + int *axes; /* used to hold and reorder assigned axes */ + int *coord; /* mesh coordinates of switch */ + int **matrix; /* distances between adjacent switches */ + int *poly; /* characteristic polynomial of matrix */ + /* used as an invariant classification */ + enum mesh_node_type type; + int dimension; /* apparent dimension of mesh around node */ + int temp; /* temporary holder for distance info */ +} mesh_node_t; /* * per fabric mesh info @@ -55,4 +88,7 @@ typedef struct _mesh { void osm_mesh_cleanup(struct _lash *p_lash); int osm_do_mesh_analysis(struct _lash *p_lash); +void osm_mesh_node_cleanup(struct _switch *sw); +int osm_mesh_node_create(struct _lash *p_lash, struct _switch *sw); + #endif diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h index 1ae3bb6..c037571 100644 --- a/opensm/include/opensm/osm_ucast_lash.h +++ b/opensm/include/opensm/osm_ucast_lash.h @@ -81,6 +81,7 @@ typedef
struct _switch { unsigned out_link; unsigned lane; } *routing_table; + mesh_node_t *node; unsigned int num_connections; int *virtual_physical_port_table; int *phys_connections; diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index c97925b..6ef397c 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -98,7 +98,7 @@ static int mesh_create(lash_t *p_lash) } /* - * do_mesh_analysis + * osm_do_mesh_analysis */ int osm_do_mesh_analysis(lash_t *p_lash) { @@ -121,3 +121,83 @@ int osm_do_mesh_analysis(lash_t *p_lash) return ret; } + +/* + * osm_mesh_node_cleanup - cleanup per switch resources + */ +void osm_mesh_node_cleanup(switch_t *sw) +{ + int i; + mesh_node_t *node = sw->node; + unsigned num_ports = sw->p_sw->num_ports; + + if (node) { + if (node->links) { + for (i = 0; i < num_ports; i++) { + if (node->links[i]) { + if (node->links[i]->ports) + free(node->links[i]->ports); + free(node->links[i]); + } + } + free(node->links); + } + + if (node->poly) + free(node->poly); + + if (node->matrix) { + for (i = 0; i < node->num_links; i++) { + if (node->matrix[i]) + free(node->matrix[i]); + } + free(node->matrix); + } + + if (node->axes) + free(node->axes); + + free(node); + + sw->node = NULL; + } +} + +/* + * osm_mesh_node_create - allocate per switch resources + */ +int osm_mesh_node_create(lash_t *p_lash, switch_t *sw) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + int i; + mesh_node_t *node; + unsigned num_ports = sw->p_sw->num_ports; + + if (!(node = sw->node = calloc(1, sizeof(mesh_node_t)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh node - out of memory\n"); + return -1; + } + + if (!(node->links = calloc(num_ports, sizeof(link_t *)))) + goto err; + + for (i = 0; i < num_ports; i++) { + if (!(node->links[i] = calloc(1, sizeof(link_t))) || + !(node->links[i]->ports = calloc(num_ports, sizeof(int)))) + goto err; + } + + if (!(node->axes = calloc(num_ports, sizeof(int)))) + goto err; + + for (i = 0; i < num_ports; i++) { 
+ node->links[i]->switch_id = NONE; + } + + return 0; + +err: + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh node - out of memory\n"); + osm_mesh_node_cleanup(sw); + return -1; +} diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index 3577cca..b9394af 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -651,6 +651,9 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw sw->phys_connections[i] = NONE; } + if (osm_mesh_node_create(p_lash, sw)) + return -1; + sw->p_sw = p_sw; if (p_sw) p_sw->priv = sw; @@ -660,6 +663,8 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw static void switch_delete(switch_t * sw) { + osm_mesh_node_cleanup(sw); + if (sw->dij_channels) free(sw->dij_channels); if (sw->virtual_physical_port_table) From tziporet at mellanox.co.il Tue Nov 11 01:02:45 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 11 Nov 2008 11:02:45 +0200 Subject: [ofa-general] OFED Nov 10 2008 meeting minutes on OFED 1.4 status In-Reply-To: <458BC6B0F287034F92FE78908BD01CE84EF33FC0@mtlexch01.mtl.com> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0FE6FE@mtlexch01.mtl.com> OFED Nov 10 2008 meeting minutes on OFED 1.4 status: Meeting minutes on the web: http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/ Meeting Summary: ============== * RC4 will be released on Tuesday Nov 11 (today) * RC5 will be released next week on Monday Nov 17 * GA is planned for Nov 24 * All must send update for the documents and release notes for RC5 * We may need a short meeting to follow up on RC5 and the GA release next Monday - stay tuned Details: ======= * Since there are some critical bugs we must have RC5. * We do not wish to delay the release to Dec. 
thus we all must focus on these bugs and do everything to resolve on time Bugs to be fixed in RC5: 1323 blo stefan.roscher at de.ibm.com IB/ehca: possibility of kernel panic under certain circu... - in rc4 1370 blo vlad at mellanox.co.il Ping over IPoIB I/F fails after ifconfig down and up - ongoing 1364 cri swise at opengridcomputing.com system hang on rmmod cxgb3 in rhel4.7 - Steve please update 1365 cri swise at opengridcomputing.com Panic on loading iw_cxgb3 in RHEL 4.6 - Steve please update 1366 cri swise at opengridcomputing.com Panic during boot-up after an OFED install in RHEL 4.5 - Steve please update 1242 cri yannick.cote at qlogic.com kernel panic while running mpi2007 against ofed1.4 -- ib_... - ongoing 1289 maj amirv at mellanox.co.il Ib and ipoib doesnt respond while running multiple tests ... -ongoing 1349 maj amirv at mellanox.co.il Kernel panic on sdp - ongoing 1336 maj vlad at mellanox.co.il Can't to unloading the mlx4_ib module on ppc64 - we will try to reproduce Tziporet From FENKES at de.ibm.com Tue Nov 11 01:04:04 2008 From: FENKES at de.ibm.com (Joachim Fenkes) Date: Tue, 11 Nov 2008 10:04:04 +0100 Subject: [ofa-general] Re: [PATCH] IB/ehca: Fix suppression of port activation events In-Reply-To: References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811071742.51867.fenkes@de.ibm.com> Message-ID: Roland Dreier wrote on 10.11.2008 21:36:23: > > A previous fix introduced a regression where port activation events were > > dropped unconditionally if port autodetection was not enabled. Fixed. > > Is this a fix to "IB/ehca: Remove reference to special QP in case of > port activation failure"? Because if so I can roll it into that patch, > since Linus hasn't pulled it yet. Yes, that would be splendid, thank you! 
Cheers, Joachim From vlad at lists.openfabrics.org Tue Nov 11 03:20:07 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 11 Nov 2008 03:20:07 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081111-0200 daily build status Message-ID: <20081111112007.D76FEE60BD5@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on 
ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Tue Nov 11 04:08:43 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 14:08:43 +0200 Subject: [ofa-general] Re: [PATCH] opensm: fix iser service-id used for SL assignment In-Reply-To: References: Message-ID: <20081111120843.GC3927@sashak.voltaire.com> On 14:57 Thu 06 Nov , Or Gerlitz wrote: > RFC3720 says: > > The well-known user TCP port number for iSCSI connections assigned by IANA is 3260 > and this is the default iSCSI port. Implementations needing a system TCP port number > may use port 860, the port assigned by IANA as the iSCSI system port; however in > order to use port 860, it MUST be explicitly specified - implementations MUST NOT > default to use of port 860, as 3260 is the only allowed default. > > Hence the SID used by iser is 0x0000000001060CBC and not 0x000000000106035C > > Signed-off-by: Or Gerlitz > Signed-off-by: Eli Dorfman Applied. Thanks. Sasha From kliteyn at dev.mellanox.co.il Tue Nov 11 04:08:22 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 11 Nov 2008 14:08:22 +0200 Subject: [ofa-general] [PATCH] opensm/Makefile.am: install QoS_management_in_OpenSM.txt Message-ID: <491975B6.4070105@dev.mellanox.co.il> Hi Sasha, Following the patch from yesterday - adding QoS_management_in_OpenSM.txt to tarball. 
Signed-off-by: Yevgeny Kliteynik --- opensm/Makefile.am | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/Makefile.am b/opensm/Makefile.am index f8b66b3..02c693d 100644 --- a/opensm/Makefile.am +++ b/opensm/Makefile.am @@ -21,7 +21,7 @@ endif man_MANS = man/opensm.8 man/osmtest.8 various_scripts = $(wildcard scripts/*) -docs = doc/performance-manager-HOWTO.txt +docs = doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt EXTRA_DIST = autogen.sh opensm.spec $(various_scripts) $(man_MANS) $(docs) -- 1.5.1.4 From sashak at voltaire.com Tue Nov 11 04:16:54 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 14:16:54 +0200 Subject: [ofa-general] Re: [PATCH] opensm/Makefile.am: install QoS_management_in_OpenSM.txt In-Reply-To: <491975B6.4070105@dev.mellanox.co.il> References: <491975B6.4070105@dev.mellanox.co.il> Message-ID: <20081111121654.GF3927@sashak.voltaire.com> On 14:08 Tue 11 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Following the patch from yesterday - adding > QoS_management_in_OpenSM.txt to tarball. > > Signed-off-by: Yevgeny Kliteynik Nice finding. Applied. Thanks. 
Sasha > --- > opensm/Makefile.am | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/opensm/Makefile.am b/opensm/Makefile.am > index f8b66b3..02c693d 100644 > --- a/opensm/Makefile.am > +++ b/opensm/Makefile.am > @@ -21,7 +21,7 @@ endif > man_MANS = man/opensm.8 man/osmtest.8 > > various_scripts = $(wildcard scripts/*) > -docs = doc/performance-manager-HOWTO.txt > +docs = doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt > > EXTRA_DIST = autogen.sh opensm.spec $(various_scripts) $(man_MANS) $(docs) > > -- > 1.5.1.4 > From halr at obsidianresearch.com Tue Nov 11 06:23:34 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Tue, 11 Nov 2008 07:23:34 -0700 Subject: [ofa-general] [PATCH] OpenSM/osm_subnet.c: Fix log_max_size conversion to MB Message-ID: <49199566.2010505@obsidianresearch.com> Sasha, This patch fixes the conversion of log_max_size to MB introduced by commit 9954ead20c84586c6daaec5a1fba835eda0b4738 It does not address the overflow issues introduced by that change though. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: patch-osm-logfilesize1 URL: From vlad at mellanox.co.il Tue Nov 11 07:27:46 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 11 Nov 2008 17:27:46 +0200 Subject: [ofa-general] OFED-1.4-rc4 is available Message-ID: <1226417266.18330.77.camel@vlad-laptop> Hi, OFED-1.4-rc4 release is available on http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc4.tgz To get BUILD_ID run ofed_info Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4 Vladimir & Tziporet ======================================================================== Release information: ------------------------------ Linux Operating Systems: - RedHat EL4 up4: 2.6.9-42.ELsmp * - RedHat EL4 up5: 2.6.9-55.ELsmp - RedHat EL4 up6: 2.6.9-67.ELsmp - RedHat EL4 up7: 2.6.9-78.ELsmp - RedHat EL5: 2.6.18-8.el5 - RedHat EL5 up1: 2.6.18-53.el5 - RedHat EL5 up2: 2.6.18-92.el5 - CentOS 5.2: 2.6.18-92.el5 - Fedora C9: 2.6.25-14.fc9 * - SLES10: 2.6.16.21-0.8-smp - SLES10 SP1: 2.6.16.46-0.12-smp - SLES10 SP1 up1: 2.6.16.53-0.16-smp - SLES10 SP2: 2.6.16.60-0.21-smp - OpenSuSE 10.3: 2.6.22.5-31 * - kernel.org: 2.6.26 and 2.6.27 * Minimal QA for these versions Systems: * x86_64 * x86 * ia64 * ppc64 Main Changes from OFED-1.4-rc3 ============================== - Updated MPI packages: mvapich-1.1.0-3128, mvapich2-1.2-1 - Updated bonding package: ib-bonding-0.9.0-33 - Updated uDAPL: compat-dapl-1.2.12-1, dapl-2.0.15-1 - Updated management packages: opensm-3.2.3, infiniband-diags-1.4.2, libibcommon-1.1.2, libibmad-1.2.2, libibumad-1.2.2 - NFS-RDMA to work on 2.6.26 and 2.6.27 - Cleanup compilation warning - 46 bugs fixed (see attached for details) - Kernel git tree changes: Tasks that should be completed for the rc5: ================================ 1. High priority bug fixes 2. Documentation update -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ofed-1.4-rc4-fixed-bugs.csv Type: text/csv Size: 5834 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_kernel-1.4-rc3_rc4.log Type: text/x-log Size: 40345 bytes Desc: not available URL: From Thomas.Talpey at netapp.com Tue Nov 11 08:02:18 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Tue, 11 Nov 2008 11:02:18 -0500 Subject: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? In-Reply-To: <7391130E01ED404FBD7A3C86731EEB7D20ECAB457D@GVW1087EXB.amer icas.hpqcorp.net> References: <7391130E01ED404FBD7A3C86731EEB7D20EC0F8737@GVW1087EXB.americas.hpqcorp.net> <49160618.3050409@nasa.gov> <7391130E01ED404FBD7A3C86731EEB7D20ECAB457D@GVW1087EXB.americas.hpqcorp.net> Message-ID: At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >That's great, thanks. > >I ran some tests with the 2.6.27 kernel as server and client, and >basically it works fine. > >I could not find yet any situation where NFS-RDMA would outperform >NFS/IPoIB, at least when you compare apples to apples (same clients, >same server, same protocol, and not just write to/read from the >caches), and it even seems to have severe performance issues for >reading with files larger than the memory size of the client and the server. >Hopefully this will improve when more users will be able to give >valuable feedback... I have a couple of questions, and perhaps suggestions as well. First the questions... - Have you tried with a 2.6.28-rc4 client and server at all? There are a number of significant NFS/RDMA improvements queued in kernel.org, especially around RDMA memory registration as well as RDMA operation scheduling. We've seen some significant throughput improvement even for basic tunings. - What type of storage are you using at the server, and have you attempted to tune the server at all? 
For example, if you are storage (spindle) limited, no network tuning is likely to help and you should address that first. Also, there are tunings such as nfsd thread count, export options, and adapter choice that can make a large difference. Bottom line, you should be able to reach multi-hundred-MB/sec of read/write throughput with NFS/RDMA, but there may be issues on specific systems, or perhaps with the OFED1.4 code, that need to be accounted for. If possible, you may want to set expectations based on mainline, then try to duplicate them in the OFED backport. The current OFED NFS/RDMA support is still evolving, while we consider the mainline kernel.org version to be rather solid. Tom. > >Fred. > >-----Original Message----- >From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov] >Sent: Saturday, 08 November, 2008 22:35 >To: Ciesielski, Frederic (EMEA HPC&OSLO CC) >Cc: general at lists.openfabrics.org >Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? > >Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >> Is there any chance that the new NFS-RDMA features coming with OFED >> 1.4 work with standard and current distributions, like RHEL5, SLES10 ? >Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 >and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be >done for OFED 1.4.1. Thanks. > >-jeff > >> Did anybody test this, or would pretend it is supposed to work ? >> >> I mean without building a 2.6.27 or equivalent kernel on top of it, >> keeping almost full support from the vendors. >> >> Enhanced kernel modules may not be sufficient to work around the >> limitations of old kernels... 
>> >> >> From rpearson at systemfabricworks.com Tue Nov 11 08:44:50 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 10:44:50 -0600 Subject: [ofa-general] [PATCH][4] opensm: vector and matrix utilities Message-ID: <003201c9441c$d23ce8f0$76b6bad0$@com> Sasha, Here is the fourth patch in a series implementing the mesh analysis algorithm. This patch implements - create and cleanup methods for polynomial with integer coefficients - create and cleanup methods for square matrix with integer coefficients - create and cleanup methods for square matrix with polynomial coefficients - routine to compute the determinant of a matrix with polynomial coefficients (Note the determinant is restricted to computing the characteristic polynomial) Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 6ef397c..5dee1d0 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -49,6 +49,295 @@ #include /* + * poly_alloc + * + * allocate a polynomial of degree n + */ +static int *poly_alloc(lash_t *p_lash, int n) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + int *p; + + if (!(p = calloc(n+1, sizeof(int)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating poly - out of memory\n"); + } + + return p; +} + +/* + * poly_diff + * + * return a nonzero value if polynomials differ else 0 + */ +static int poly_diff(int n, int *p, switch_t *s) +{ + int i; + + if (s->node->num_links != n) + return 1; + + for (i = 0; i <= n; i++) { + if (s->node->poly[i] != p[i]) + return 1; + } + + return 0; +} + +/* + * m_free + * + * free a square matrix of rank l + */ +static void m_free(int **m, int l) +{ + int i; + + if (m) { + for (i = 0; i < l; i++) { + if (m[i]) + free(m[i]); + } + free(m); + } +} + +/* + * m_alloc + * + * allocate a square matrix of rank l + */ +static int **m_alloc(lash_t *p_lash, int l) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + int i; + int **m = NULL; + + do { + 
if (!(m = calloc(l, sizeof(int *)))) + break; + + for (i = 0; i < l; i++) { + if (!(m[i] = calloc(l, sizeof(int)))) + break; + } + if (i != l) + break; + + return m; + } while(0); + + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating matrix - out of memory\n"); + + m_free(m, l); + return NULL; +} + +/* + * pm_free + * + * free a square matrix of rank l of polynomials + */ +static void pm_free(int ***m, int l) +{ + int i, j; + + if (m) { + for (i = 0; i < l; i++) { + if (m[i]) { + for (j = 0; j < l; j++) { + if (m[i][j]) + free(m[i][j]); + } + free(m[i]); + } + } + free(m); + } +} + +/* + * pm_alloc + * + * allocate a square matrix of rank l of polynomials of degree n + */ +static int ***pm_alloc(lash_t *p_lash, int l, int n) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + int i, j; + int ***m = NULL; + + do { + if (!(m = calloc(l, sizeof(int **)))) + break; + + for (i = 0; i < l; i++) { + if (!(m[i] = calloc(l, sizeof(int *)))) + break; + + for (j = 0; j < l; j++) { + if (!(m[i][j] = calloc(n+1, sizeof(int)))) + break; + } + if (j != l) + break; + } + if (i != l) + break; + + return m; + } while(0); + + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating matrix - out of memory\n"); + + pm_free(m, l); + return NULL; +} + +static int determinant(lash_t *p_lash, int n, int rank, int ***m, int *p); + +/* + * sub_determinant + * + * compute the determinant of a submatrix of matrix of rank l of polynomials of degree n + * with row and col removed in poly. 
caller must free poly + */ +static int sub_determinant(lash_t *p_lash, int n, int l, int row, int col, int ***matrix, int **poly) +{ + int ret = -1; + int ***m = NULL; + int *p = NULL; + int i, j, k, x, y; + int rank = l - 1; + + do { + if (!(p = poly_alloc(p_lash, n))) { + break; + } + + if (rank <= 0) { + p[0] = 1; + ret = 0; + break; + } + + if (!(m = pm_alloc(p_lash, rank, n))) { + free(p); + p = NULL; + break; + } + + x = 0; + for (i = 0; i < l; i++) { + if (i == row) + continue; + + y = 0; + for (j = 0; j < l; j++) { + if (j == col) + continue; + + for (k = 0; k <= n; k++) + m[x][y][k] = matrix[i][j][k]; + + y++; + } + x++; + } + + if (determinant(p_lash, n, rank, m, p)) { + free(p); + p = NULL; + break; + } + + ret = 0; + } while(0); + + pm_free(m, rank); + *poly = p; + return ret; +} + +/* + * determinant + * + * compute the determinant of matrix m of rank of polynomials of degree deg + * and add the result to polynomial p allocated by caller + */ +static int determinant(lash_t *p_lash, int deg, int rank, int ***m, int *p) +{ + int i, j, k; + int *q; + int sign = 1; + + /* + * handle simple case of 1x1 matrix + */ + if (rank == 1) { + for (i = 0; i <= deg; i++) + p[i] += m[0][0][i]; + } + + /* + * handle simple case of 2x2 matrix + */ + else if (rank == 2) { + for (i = 0; i <= deg; i++) { + if (m[0][0][i] == 0) + continue; + + for (j = 0; j <= deg; j++) { + if (m[1][1][j] == 0) + continue; + + p[i+j] += m[0][0][i]*m[1][1][j]; + } + } + + for (i = 0; i <= deg; i++) { + if (m[0][1][i] == 0) + continue; + + for (j = 0; j <= deg; j++) { + if (m[1][0][j] == 0) + continue; + + p[i+j] -= m[0][1][i]*m[1][0][j]; + } + } + } + + /* + * handle the general case + */ + else { + for (i = 0; i < rank; i++) { + if (sub_determinant(p_lash, deg, rank, 0, i, m, &q)) + return -1; + + for (j = 0; j <= deg; j++) { + if (m[0][i][j] == 0) + continue; + + for (k = 0; k <= deg; k++) { + if (q[k] == 0) + continue; + + p[j+k] += sign*m[0][i][j]*q[k]; + } + } + + free(q); + sign = 
-sign; + } + } + + return 0; +} + +/* * osm_mesh_cleanup - free per mesh resources */ void osm_mesh_cleanup(lash_t *p_lash) From rpearson at systemfabricworks.com Tue Nov 11 08:59:58 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 10:59:58 -0600 Subject: [ofa-general] [PATCH][5] opensm: compute local geometry Message-ID: <003301c9441e$eed2f480$cc78dd80$@com> Sasha, Here is the fifth patch implementing the mesh analysis algorithm. This patch implements - routine to compute the characteristic polynomial of a matrix - routine to compute the local 'metric' around each switch - routine to classify switches into a histogram of local geometry classes Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 7434fee..9254de3 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -338,6 +338,172 @@ static int determinant(lash_t *p_lash, int deg, int rank, int ***m, int *p) } /* + * char_poly + * + * compute the characteristic polynomial of matrix of rank + * by computing the determinant of m-x*I and return in poly + * as an array. caller must free poly + */ +static int char_poly(lash_t *p_lash, int rank, int **matrix, int **poly) +{ + int ret = -1; + int i, j; + int ***m = NULL; + int *p = NULL; + int deg = rank; + + do { + if (!(p = poly_alloc(p_lash, deg))) { + break; + } + + if (!(m = pm_alloc(p_lash, rank, deg))) { + free(p); + p = NULL; + break; + } + + for (i = 0; i < rank; i++) { + for (j = 0; j < rank; j++) { + m[i][j][0] = matrix[i][j]; + } + m[i][i][1] = -1; + } + + if (determinant(p_lash, deg, rank, m, p)) { + free(p); + p = NULL; + break; + } + + ret = 0; + } while(0); + + pm_free(m, rank); + *poly = p; + return ret; +} + +/* + * get_switch_metric + * + * compute the matrix of minimum distances between each of + * the adjacent switch nodes to sw along paths + * that do not go through sw.
do calculation by + * relaxation method + * allocate space for the matrix and save in node_t structure + */ +static int get_switch_metric(lash_t *p_lash, int sw) +{ + int ret = -1; + int i, j, change; + int sw1, sw2, sw3; + switch_t *s = p_lash->switches[sw]; + switch_t *s1, *s2, *s3; + int **m; + mesh_node_t *node = s->node; + int num_links = node->num_links; + + do { + if (!(m = m_alloc(p_lash, num_links))) + break; + + for (i = 0; i < num_links; i++) { + sw1 = node->links[i]->switch_id; + s1 = p_lash->switches[sw1]; + + /* make all distances big except s1 to itself */ + for (sw2 = 0; sw2 < p_lash->num_switches; sw2++) + p_lash->switches[sw2]->node->temp = 0x7fffffff; + + s1->node->temp = 0; + + do { + change = 0; + + for (sw2 = 0; sw2 < p_lash->num_switches; sw2++) { + s2 = p_lash->switches[sw2]; + if (s2->node->temp == 0x7fffffff) + continue; + for (j = 0; j < s2->node->num_links; j++) { + sw3 = s2->node->links[j]->switch_id; + s3 = p_lash->switches[sw3]; + + if (sw3 == sw) + continue; + + if ((s2->node->temp + 1) < s3->node->temp) { + s3->node->temp = s2->node->temp + 1; + change++; + } + } + } + } while(change); + + for (j = 0; j < num_links; j++) { + sw2 = node->links[j]->switch_id; + s2 = p_lash->switches[sw2]; + m[i][j] = s2->node->temp; + } + } + + if (char_poly(p_lash, num_links, m, &node->poly)) { + m_free(m, num_links); + m = NULL; + break; + } + + ret = 0; + } while(0); + + node->matrix = m; + return ret; +} + +/* + * classify_switch + * + * add switch to histogram of switch types + */ +static void classify_switch(lash_t *p_lash, int sw) +{ + int i; + switch_t *s = p_lash->switches[sw]; + switch_t *s1; + mesh_t *mesh = p_lash->mesh; + + for (i = 0; i < mesh->num_class; i++) { + s1 = p_lash->switches[mesh->class_type[i]]; + + if (poly_diff(s->node->num_links, s->node->poly, s1)) + continue; + + mesh->class_count[i]++; + return; + } + + mesh->class_type[mesh->num_class] = sw; + mesh->class_count[mesh->num_class] = 1; + mesh->num_class++; + return; +} + 
+/* + * get_local_geometry + * + * analyze the local geometry around each switch + */ +static void get_local_geometry(lash_t *p_lash) +{ + int sw; + + for (sw = 0; sw < p_lash->num_switches; sw++) { + get_switch_metric(p_lash, sw); + classify_switch(p_lash, sw); + } +} + +/* * osm_mesh_cleanup - free per mesh resources */ void osm_mesh_cleanup(lash_t *p_lash) @@ -404,6 +570,12 @@ int osm_do_mesh_analysis(lash_t *p_lash) return -1; } + /* + * get local metric and invariant for each switch + * also classify each switch + */ + get_local_geometry(p_lash); + printf("lash: do_mesh_analysis stub called\n"); OSM_LOG_EXIT(p_log); From rpearson at systemfabricworks.com Tue Nov 11 09:10:31 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 11:10:31 -0600 Subject: [ofa-general] ***SPAM*** Message-ID: <003701c94420$67840f80$368c2e80$@com> Sasha, Here is the sixth patch implementing the mesh analysis algorithm. This patch implements - a table of polynomials for all 2D and 3D regular Cartesian meshes - a routine to classify each switch based on the table Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 9254de3..30d09c2 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -48,6 +48,76 @@ #include #include +#define MAX_DIMENSION (4) +#define MAX_DEGREE (10) + +/* + * characteristic polynomials for 2d and 3d regular tori + * since 4 == 2x2 we choose to take 2x2 + */ +struct _mesh_info { + int dimension; /* dimension of the torus */ + int size[MAX_DIMENSION]; /* size of the torus */ + int degree; /* degree of polynomial */ + int poly[MAX_DEGREE+1]; /* polynomial */ +} mesh_info[] = { + {0, {0}, 0, {0}, }, + + {2, {2, 2}, 2, {-4, 0, 1}, }, + {2, {3, 2}, 3, {8, 9, 0, -1}, }, + //{2, {4, 2}, 3, {16, 12, 0, -1}, }, + {2, {5, 2}, 3, {24, 17, 0, -1}, }, + {2, {6, 2}, 3, {32, 24, 0, -1}, }, + {2, {3, 3}, 4, {-15, -32, -18, 0, 1}, }, + //{2, {4, 3}, 4, {-28, -48, 
-21, 0, 1}, }, + {2, {5, 3}, 4, {-39, -64, -26, 0, 1}, }, + {2, {6, 3}, 4, {-48, -80, -33, 0, 1}, }, + //{2, {4, 4}, 4, {-48, -64, -24, 0, 1}, }, + //{2, {5, 4}, 4, {-60, -80, -29, 0, 1}, }, + //{2, {6, 4}, 4, {-64, -96, -36, 0, 1}, }, + {2, {5, 5}, 4, {-63, -96, -34, 0, 1}, }, + {2, {6, 5}, 4, {-48, -112, -41, 0, 1}, }, + {2, {6, 6}, 4, {0, -128, -48, 0, 1}, }, + + {3, {2, 2, 2}, 3, {16, 12, 0, -1}, }, + {3, {3, 2, 2}, 4, {-28, -48, -21, 0, 1}, }, + {3, {4, 2, 2}, 4, {-48, -64, -24, 0, 1}, }, + {3, {5, 2, 2}, 4, {-60, -80, -29, 0, 1}, }, + {3, {6, 2, 2}, 4, {-64, -96, -36, 0, 1}, }, + {3, {3, 3, 2}, 5, {48, 127, 112, 34, 0, -1}, }, + {3, {4, 3, 2}, 5, {80, 180, 136, 37, 0, -1}, }, + {3, {5, 3, 2}, 5, {96, 215, 160, 42, 0, -1}, }, + {3, {6, 3, 2}, 5, {96, 232, 184, 49, 0, -1}, }, + {3, {4, 4, 2}, 5, {128, 240, 160, 40, 0, -1}, }, + {3, {5, 4, 2}, 5, {144, 276, 184, 45, 0, -1}, }, + {3, {6, 4, 2}, 5, {128, 288, 208, 52, 0, -1}, }, + {3, {5, 5, 2}, 5, {144, 303, 208, 50, 0, -1}, }, + {3, {6, 5, 2}, 5, {96, 296, 232, 57, 0, -1}, }, + {3, {6, 6, 2}, 5, {0, 256, 256, 64, 0, -1}, }, + {3, {3, 3, 3}, 6, {-81, -288, -381, -224, -51, 0, 1}, }, + {3, {4, 3, 3}, 6, {-132, -416, -487, -256, -54, 0, 1}, }, + {3, {5, 3, 3}, 6, {-153, -480, -557, -288, -59, 0, 1}, }, + {3, {6, 3, 3}, 6, {-144, -480, -591, -320, -66, 0, 1}, }, + {3, {4, 4, 3}, 6, {-208, -576, -600, -288, -57, 0, 1}, }, + {3, {5, 4, 3}, 6, {-228, -640, -671, -320, -62, 0, 1}, }, + {3, {6, 4, 3}, 6, {-192, -608, -700, -352, -69, 0, 1}, }, + {3, {5, 5, 3}, 6, {-225, -672, -733, -352, -67, 0, 1}, }, + {3, {6, 5, 3}, 6, {-144, -576, -743, -384, -74, 0, 1}, }, + {3, {6, 6, 3}, 6, {0, -384, -720, -416, -81, 0, 1}, }, + {3, {4, 4, 4}, 6, {-320, -768, -720, -320, -60, 0, 1}, }, + {3, {5, 4, 4}, 6, {-336, -832, -792, -352, -65, 0, 1}, }, + {3, {6, 4, 4}, 6, {-256, -768, -816, -384, -72, 0, 1}, }, + {3, {5, 5, 4}, 6, {-324, -864, -855, -384, -70, 0, 1}, }, + {3, {6, 5, 4}, 6, {-192, -736, -860, -416, -77, 0, 1}, }, + {3, {6, 
6, 4}, 6, {0, -512, -832, -448, -84, 0, 1}, }, + {3, {5, 5, 5}, 6, {-297, -864, -909, -416, -75, 0, 1}, }, + {3, {6, 5, 5}, 6, {-144, -672, -895, -448, -82, 0, 1}, }, + {3, {6, 6, 5}, 6, {0, -384, -848, -480, -89, 0, 1}, }, + {3, {6, 6, 6}, 6, {0, 0, -768, -512, -96, 0, 1}, }, + + {-1, {0,}, 0, {0, }, }, +}; + /* * poly_alloc * @@ -489,6 +559,30 @@ static void classify_switch(lash_t *p_lash, int sw) } /* + * classify_mesh_type + * + * try to look up node polynomial in table + */ +static void classify_mesh_type(lash_t *p_lash, int sw) +{ + int i; + switch_t *s = p_lash->switches[sw]; + struct _mesh_info *t; + + for (i = 1; (t = &mesh_info[i])->dimension != -1; i++) { + if (poly_diff(t->degree, t->poly, s)) + continue; + + s->node->type = i; + s->node->dimension = t->dimension; + return; + } + + s->node->type = 0; + return; +} + +/* * get_local_geometry * * analyze the local geometry around each switch @@ -500,6 +594,7 @@ static void get_local_geometry(lash_t *p_lash) for (sw = 0; sw < p_lash->num_switches; sw++) { get_switch_metric(p_lash, sw); classify_switch(p_lash, sw); + classify_mesh_type(p_lash, sw); } } From rpearson at systemfabricworks.com Tue Nov 11 09:28:40 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 11:28:40 -0600 Subject: [ofa-general] [PATCH][7] opensm: build global geometry Message-ID: <004401c94422$f04684e0$d0d38ea0$@com> Sasha, Here is the seventh patch implementing the mesh analysis algorithm. 
This patch implements - routine to induce axes on mesh starting from seed node - code to report results of local analysis (should have been in patch6) Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 30d09c2..65afae6 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -599,6 +599,239 @@ static void get_local_geometry(lash_t *p_lash) } /* + * seed_axes + * + * assign axes to the links of the seed switch + * assumes switch is of type cartesian mesh + * axes are numbered 1 to n i.e. +x => 1 -x => 2 etc. + * this assumes that if all distances are 2 that + * an axis has only 2 nodes so +A and -A collapse to +A + */ +static void seed_axes(lash_t *p_lash, int sw) +{ + mesh_node_t *node = p_lash->switches[sw]->node; + int n = node->num_links; + int i, j, c; + + for (c = 1; c <= 2*node->dimension; c++) { + /* + * find the next unassigned axis + */ + for (i = 0; i < n; i++) { + if (!node->axes[i]) + break; + } + + node->axes[i] = c++; + + /* + * find the matching opposite direction + */ + for (j = 0; j < n; j++) { + if (node->axes[j] || j == i) + continue; + + if (node->matrix[i][j] != 2) + break; + } + + if (j != n) { + node->axes[j] = c; + } + } +} + +/* + * opposite + * + * compute the opposite of axis for switch + */ +static inline int opposite(switch_t *s, int axis) +{ + int i, j; + int negaxis = 1 + (1 ^ (axis - 1)); + + for (i = 0; i < s->node->num_links; i++) { + if (s->node->axes[i] == axis) { + for (j = 0; j < s->node->num_links; j++) { + if (j == i) + continue; + if (s->node->matrix[i][j] != 2) + return negaxis; + } + + return axis; + } + } + + return 0; +} + +/* + * make_geometry + * + * induce a geometry on the switches + */ +static void make_geometry(lash_t *p_lash, int sw) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + int num_switches = p_lash->num_switches; + int sw1, sw2; + switch_t *s, *s1, *s2, *seed; + int i, j, k, l, n, m; + int change; + + /* + * assign axes 
to seed switch + */ + seed_axes(p_lash, sw); + seed = p_lash->switches[sw]; + + /* + * induce axes in other switches until + * there is no more change + */ + do { + change = 0; + + /* phase 1 opposites */ + for (sw1 = 0; sw1 < num_switches; sw1++) { + s1 = p_lash->switches[sw1]; + n = s1->node->num_links; + + for (i = 0; i < n; i++) { + if (!s1->node->axes[i]) + continue; + + /* + * can't tell across if more than one + * likely looking link + */ + m = 0; + for (j = 0; j < n; j++) { + if (j == i) + continue; + + if (s1->node->matrix[i][j] != 2) + m++; + } + + if (m != 1) { + continue; + } + + for (j = 0; j < n; j++) { + if (j == i) + continue; + + if (s1->node->matrix[i][j] != 2) { + if (s1->node->axes[j]) { + if (s1->node->axes[j] != opposite(seed, s1->node->axes[i])) { + OSM_LOG(p_log, OSM_LOG_DEBUG, "phase 1 mismatch\n"); + } + } else { + s1->node->axes[j] = opposite(seed, s1->node->axes[i]); + change++; + } + } + } + } + } + + /* phase 2 switch to switch */ + for (sw1 = 0; sw1 < num_switches; sw1++) { + s1 = p_lash->switches[sw1]; + n = s1->node->num_links; + + for (i = 0; i < n; i++) { + int l2 = s1->node->links[i]->link_id; + + if (!s1->node->axes[i]) + continue; + + if (l2 == -1) { + printf("ERROR no reverse link\n"); + continue; + } + + sw2 = s1->node->links[i]->switch_id; + s2 = p_lash->switches[sw2]; + + if (!s2->node->axes[l2]) { + /* + * set axis to opposite of s1->axes[i] + */ + s2->node->axes[l2] = opposite(seed, s1->node->axes[i]); + change++; + } else { + if (s2->node->axes[l2] != opposite(seed, s1->node->axes[i])) { + OSM_LOG(p_log, OSM_LOG_DEBUG, "phase 2 mismatch\n"); + } + } + } + } + + /* Phase 3 corners */ + for (sw1 = 0; sw1 < num_switches; sw1++) { + s = p_lash->switches[sw1]; + n = s->node->num_links; + + for (i = 0; i < n; i++) { + if (!s->node->axes[i]) + continue; + + for (j = 0; j < n; j++) { + if (i == j || !s->node->axes[j] || s->node->matrix[i][j] != 2) + continue; + + s1 = p_lash->switches[s->node->links[i]->switch_id]; + s2 = 
p_lash->switches[s->node->links[j]->switch_id]; + + /* + * find switch (other than s1) that neighbors i and j + * have in common + */ + for (k = 0; k < s1->node->num_links; k++) { + if (s1->node->links[k]->switch_id == sw1) + continue; + + for (l = 0; l < s2->node->num_links; l++) { + if (s2->node->links[l]->switch_id == sw1) + continue; + + if (s1->node->links[k]->switch_id == s2->node->links[l]->switch_id) { + if (s1->node->axes[k]) { + if (s1->node->axes[k] != s->node->axes[j]) { + OSM_LOG(p_log, OSM_LOG_DEBUG, "phase 3 mismatch\n"); + } + } else { + s1->node->axes[k] = s->node->axes[j]; + change++; + } + + if (s2->node->axes[l]) { + if (s2->node->axes[l] != s->node->axes[i]) { + OSM_LOG(p_log, OSM_LOG_DEBUG, "phase 3 mismatch\n"); + } + } else { + s2->node->axes[l] = s->node->axes[i]; + change++; + } + goto next_j; + } + } + } +next_j: + ; + } + } + } + } while(change); + + return; +} + +/* * osm_mesh_cleanup - free per mesh resources */ void osm_mesh_cleanup(lash_t *p_lash) @@ -652,8 +885,13 @@ static int mesh_create(lash_t *p_lash) */ int osm_do_mesh_analysis(lash_t *p_lash) { - int ret = 0; osm_log_t *p_log = &p_lash->p_osm->log; + int max_class = -1; + int max_class_num = 0; + int max_class_type = -1; + int i; + mesh_t *mesh; + switch_t *s; OSM_LOG_ENTER(p_log); @@ -671,11 +909,43 @@ int osm_do_mesh_analysis(lash_t *p_lash) */ get_local_geometry(p_lash); - printf("lash: do_mesh_analysis stub called\n"); + /* + * find dominant switch class + */ + for (i = 0; i < mesh->num_class; i++) { + if (mesh->class_count[i] > max_class_num) { + max_class = i; + max_class_num = mesh->class_count[i]; + max_class_type = mesh->class_type[i]; + } + } + + s = p_lash->switches[max_class_type]; + + printf("lash: found %d node type%s\n", mesh->num_class, (mesh->num_class == 1)? "" : "s"); + printf("lash: %snode type is ", (mesh->num_class == 1)? 
"" : "most common "); + + if (s->node->type) { + struct _mesh_info *t = &mesh_info[s->node->type]; + + for (i = 0; i < t->dimension; i++) { + printf("%s%d%s", i? "X" : "", t->size[i], + (t->size[i] == 6)? "+" : ""); + } + printf(" mesh\n"); + + p_lash->mesh->dimension = t->dimension; + } else { + printf("unknown geometry\n"); + } + + if (s->node->type) { + make_geometry(p_lash, max_class_type); + } OSM_LOG_EXIT(p_log); - return ret; + return 0; } /* From sashak at voltaire.com Tue Nov 11 09:28:58 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 19:28:58 +0200 Subject: [ofa-general] Re: [PATCH] OpenSM/osm_subnet.c: Fix log_max_size conversion to MB In-Reply-To: <49199566.2010505@obsidianresearch.com> References: <49199566.2010505@obsidianresearch.com> Message-ID: <20081111172858.GF30865@sashak.voltaire.com> On 07:23 Tue 11 Nov , Hal Rosenstock wrote: > Sasha, > > This patch fixes the conversion of log_max_size to MB introduced by commit > 9954ead20c84586c6daaec5a1fba835eda0b4738 > > It does not address the overflow issues introduced by that change though. > > -- Hal > OpenSM/osm_subnet.c: Convert log_max_size to MB > > Fixes commit 9954ead20c84586c6daaec5a1fba835eda0b4738 > which should preceed commit 12b0e65b2dd198c1764ffb23dd8d6572f0fac5e6 > > Signed-off-by: Hal Rosenstock Nice catch. Applied. Thanks. 
Sasha > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 750bdc6..5447e95 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -1278,7 +1278,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) > opts_unpack_uint32("log_max_size", > p_key, p_val, > (void *) & p_opts->log_max_size); > - p_opts->log_max_size * 1024 *1024; /* convert to MB */ > + p_opts->log_max_size *= 1024 * 1024; /* convert to MB */ > > opts_unpack_charp("partition_config_file", > p_key, p_val, &p_opts->partition_config_file); From rpearson at systemfabricworks.com Tue Nov 11 09:37:14 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 11:37:14 -0600 Subject: [ofa-general] [PATCH][8] opensm: measure size and reorder links Message-ID: <004501c94424$23551620$69ff4260$@com> Sasha, Here is the eighth patch implementing the mesh analysis algorithm. This patch implements - routine to reorder links and measure the size of the mesh Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 65afae6..a248522 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -832,6 +832,183 @@ next_j: } /* + * return |a| < |b| + */ +static inline int ltmag(int a, int b) +{ + int a1 = (a >= 0)? a : -a; + int b1 = (b >= 0)? 
b : -b; + + return (a1 < b1) || (a1 == b1 && a > b); +} + +/* + * reorder_links + * + * reorder the links out of a switch in sign/dimension order + */ +static int reorder_links(lash_t *p_lash, int sw) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + switch_t *s = p_lash->switches[sw]; + mesh_node_t *node = s->node; + int n = node->num_links; + link_t **links; + int *axes; + int i, j; + int c; + int next = 0; + + if (!(links = calloc(n, sizeof(link_t *)))) { + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - out of memory\n"); + return -1; + } + + if (!(axes = calloc(n, sizeof(int)))) { + free(links); + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - out of memory\n"); + return -1; + } + + /* + * find the links with axes + */ + for (j = 1; j <= 2*node->dimension; j++) { + c = j; + if (node->coord[(c-1)/2] > 0) + c = opposite(s, c); + + for (i = 0; i < n; i++) { + if (!node->links[i]) + continue; + if (node->axes[i] == c) { + links[next] = node->links[i]; + axes[next] = node->axes[i]; + node->links[i] = NULL; + next++; + } + } + } + + /* + * get the rest + */ + for (i = 0; i < n; i++) { + if (!node->links[i]) + continue; + + links[next] = node->links[i]; + axes[next] = node->axes[i]; + node->links[i] = NULL; + next++; + } + + for (i = 0; i < n; i++) { + node->links[i] = links[i]; + node->axes[i] = axes[i]; + } + + free(links); + free(axes); + + return 0; +} + +/* + * measure geometry + */ +static int measure_geometry(lash_t *p_lash, int seed) +{ + int i, j, k; + int sw; + switch_t *s, *s1; + int change; + int dimension = p_lash->mesh->dimension; + int num_switches = p_lash->num_switches; + int assigned_axes = 0, unassigned_axes = 0; + int *max, *min; + + for (sw = 0; sw < num_switches; sw++) { + s = p_lash->switches[sw]; + + s->node->coord = calloc(dimension, sizeof(int)); + for (i = 0; i < dimension; i++) + s->node->coord[i] = (sw == seed)? 
0 : 0x7fffffff; + + for (i = 0; i < s->node->num_links; i++) + if (s->node->axes[i] == 0) + unassigned_axes++; + else + assigned_axes++; + } + + printf("lash: %d/%d unassigned/assigned axes\n", unassigned_axes, assigned_axes); + + do { + change = 0; + + for (sw = 0; sw < num_switches; sw++) { + s = p_lash->switches[sw]; + + if (s->node->coord[0] == 0x7fffffff) + continue; + + for (j = 0; j < s->node->num_links; j++) { + if (!s->node->axes[j]) + continue; + + s1 = p_lash->switches[s->node->links[j]->switch_id]; + + for (k = 0; k < dimension; k++) { + int coord = s->node->coord[k]; + int axis = s->node->axes[j] - 1; + + if (k == axis/2) + coord += (axis & 1)? -1 : +1; + + if (ltmag(coord, s1->node->coord[k])) { + s1->node->coord[k] = coord; + change++; + } + } + } + } + } while (change); + + for (sw = 0; sw < num_switches; sw++) { + if (reorder_links(p_lash, sw)) + return -1; + } + + max = calloc(dimension, sizeof(int)); + min = calloc(dimension, sizeof(int)); + p_lash->mesh->size = calloc(dimension, sizeof(int)); + + for (i = 0; i < dimension; i++) { + max[i] = -0x7fffffff; + min[i] = 0x7fffffff; + } + + for (sw = 0; sw < num_switches; sw++) { + s = p_lash->switches[sw]; + + for (i = 0; i < dimension; i++) { + if (s->node->coord[i] == 0x7fffffff) + continue; + if (s->node->coord[i] > max[i]) + max[i] = s->node->coord[i]; + if (s->node->coord[i] < min[i]) + min[i] = s->node->coord[i]; + } + } + + for (i = 0; i < dimension; i++) + p_lash->mesh->size[i] = max[i] - min[i] + 1; + + return 0; +} + +/* * osm_mesh_cleanup - free per mesh resources */ void osm_mesh_cleanup(lash_t *p_lash) @@ -941,6 +1118,14 @@ int osm_do_mesh_analysis(lash_t *p_lash) if (s->node->type) { make_geometry(p_lash, max_class_type); + + if (measure_geometry(p_lash, max_class_type)) + return -1; + + printf("lash: found "); + for (i = 0; i < mesh->dimension; i++) + printf("%s%d", i? 
"X" : "", mesh->size[i]); + printf(" mesh\n"); } OSM_LOG_EXIT(p_log); -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederic.ciesielski at hp.com Tue Nov 11 10:06:21 2008 From: frederic.ciesielski at hp.com (Ciesielski, Frederic (EMEA HPC&OSLO CC)) Date: Tue, 11 Nov 2008 18:06:21 +0000 Subject: FW: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? Message-ID: <7391130E01ED404FBD7A3C86731EEB7D20ECAB4C0F@GVW1087EXB.americas.hpqcorp.net> Well, I did not plan to test all the possible versions of the kernel; for sure improvements are on their way, what just confirms the assumption that this 'technology' is not mature yet. With IPoIB an NFS server can easily export (for instance) up to 1.2GB/s (at least this is what I can measure), with the data in the page cache. No problem up to that point at least. I clearly understand the theoretical benefits of RDMA and it's a clear improvement over TCP, for MPI. However, the drastic change for MPI is even more on the latency side, though the peak message bandwidth is also improved as one might expect for NFS. Registration/deregistration issues are also well-known to the MPI developpers, and all this is certainly not that easy to manage in other areas. Still, NFS-RDMA remains NFS. If the bottleneck is not in the transport, nothing will be improved by RDMA from the performance point of view. Even worse, what I saw with the 2.6.27 kernel + OFED1.4-rc3 is the inability of NFS-RDMA to match the performance of NFS-TCP for some patterns of IOzone, with a filesystem able to sustain itself several hundreds of MB/s (using exactly the same hardware and software in both cases). We are far from a pure IB bandwidth issue here, we are just facing an issue in how the requests are handled probably, perhaps when paging occurs, I can't tell. I could not find any tuning to solve the more obvious problem, i.e. 
the low bandwidth for reading, except mounting with '-o rsize=4096'; probably not what people expect, as this will have other effects. Anyway this does improve only the sequential read bandwidth. But of course I will repeat my tests with the latest release of everything when I have time, still making sure I compare apples to apples... Again, I'm sure improvements are on their way ! Fred. -----Original Message----- From: Talpey, Thomas [mailto:Thomas.Talpey at netapp.com] Sent: Tuesday, 11 November, 2008 17:02 To: Ciesielski, Frederic (EMEA HPC&OSLO CC) Cc: Jeff Becker; general at lists.openfabrics.org Subject: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >That's great, thanks. > >I ran some tests with the 2.6.27 kernel as server and client, and >basically it works fine. > >I could not find yet any situation where NFS-RDMA would outperform >NFS/IPoIB, at least when you compare apples to apples (same clients, >same server, same protocol, and not just write to/read from the >caches), and it even seems to have severe performance issues for >reading with files larger than the memory size of the client and the server. >Hopefully this will improve when more users will be able to give >valuable feedback... I have a couple of questions, and perhaps suggestions as well. First the questions... - Have you tried with a 2.6.28-rc4 client and server at all? There are a number of significant NFS/RDMA improvements queued in kernel.org, especially around RDMA memory registration as well as RDMA operation scheduling. We've seen some significant throughput improvement even for basic tunings. - What type of storage are you using at the server, and have you attempted to tune the server at all? For example, if you are storage (spindle) limited, no network tuning is likely to help and you should address that first. 
Also, there are tunings such as nfsd thread count, export options, and adapter choice that can make a large difference. Bottom line, you should be able to reach multi-hundred-MB/sec of read/write throughput with NFS/RDMA, but there may be issues on specific systems, or perhaps with the OFED1.4 code, that need to be accounted for. If possible, you may want to set expectations based on mainline, then try to duplicate them in the OFED backport. The current OFED NFS/RDMA support is still evolving, while we consider the mainline kernel.org version to be rather solid. Tom. > >Fred. > >-----Original Message----- >From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov] >Sent: Saturday, 08 November, 2008 22:35 >To: Ciesielski, Frederic (EMEA HPC&OSLO CC) >Cc: general at lists.openfabrics.org >Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? > >Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >> Is there any chance that the new NFS-RDMA features coming with OFED >> 1.4 work with standard and current distributions, like RHEL5, SLES10 ? >Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 >and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be >done for OFED 1.4.1. Thanks. > >-jeff > >> Did anybody test this, or would pretend it is supposed to work ? >> >> I mean without building a 2.6.27 or equivalent kernel on top of it, >> keeping almost full support from the vendors. >> >> Enhanced kernel modules may not be sufficient to work around the >> limitations of old kernels... >> >> >> From Thomas.Talpey at netapp.com Tue Nov 11 10:57:04 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Tue, 11 Nov 2008 13:57:04 -0500 Subject: FW: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? 
In-Reply-To: <7391130E01ED404FBD7A3C86731EEB7D20ECAB4C0F@GVW1087EXB.americas.hpqcorp.net> References: <7391130E01ED404FBD7A3C86731EEB7D20ECAB4C0F@GVW1087EXB.americas.hpqcorp.net> Message-ID: At 01:06 PM 11/11/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >Well, I did not plan to test all the possible versions of the kernel; >for sure improvements are on their way, which just confirms the >assumption that this 'technology' is not mature yet. First, let's be sure to separate NFS/RDMA OFED issues from core NFS/RDMA. The OFED1.4 release is the first to support NFS/RDMA, and there are certainly issues remaining in this new backport. Depending on which kernel you're targeting, there can be other issues - SLES10 is 2.6.16-based, for example, and RHEL5 is 2.6.18. The NFS code itself (not just NFS/RDMA) has evolved significantly since then, and continues to do so. >With IPoIB an NFS server can easily export (for instance) up to >1.2GB/s (at least this is what I can measure), with the data in the >page cache. No problem up to that point at least. This is impressive, by the way. I have not seen any results with NFS/IPoIB at this level. Most client machines run out of CPU far before this. >I clearly understand the theoretical benefits of RDMA and it's a clear >improvement over TCP, for MPI. However, the drastic change for MPI is >even more on the latency side, though the peak message bandwidth is >also improved as one might expect for NFS. >Registration/deregistration issues are also well-known to the MPI >developers, and all this is certainly not that easy to manage in other areas. > >Still, NFS-RDMA remains NFS. If the bottleneck is not in the >transport, nothing will be improved by RDMA from the performance point of view.
>Even worse, what I saw with the 2.6.27 kernel + OFED1.4-rc3 is the >inability of NFS-RDMA to match the performance of NFS-TCP for some >patterns of IOzone, with a filesystem able to sustain itself several >hundreds of MB/s (using exactly the same hardware and software in both >cases). We are far from a pure IB bandwidth issue here, we are just >facing an issue in how the requests are handled probably, perhaps when >paging occurs, I can't tell. I'd be very interested in any analysis of this which you may have done. One thought that comes to mind is the possibility that your server's filesystem performs less well at the 32KB read/write sizes that the NFS/RDMA client is currently limited to. If you were measuring large-sequential workloads, then you might be able to measure a difference, particularly when exporting the filesystem in the default "sync" mode. NFS/TCP can send up to 1MB writes. This is something we plan to address now that the FRMR memory registration mode is available. >I could not find any tuning to solve the more obvious problem, i.e. >the low bandwidth for reading, except mounting with '-o rsize=4096'; Ouch! That will severely limit the client, forcing it to send MANY more RPC requests. Did performance increase with this setting? For iozone with what options? >probably not what people expect, as this will have other effects. >Anyway this does improve only the sequential read bandwidth. >But of course I will repeat my tests with the latest release of >everything when I have time, still making sure I compare apples to apples... >Again, I'm sure improvements are on their way ! I would look forward to seeing your opinions of the new code, particularly for the server performance. Thanks for the info so far! Tom. > >Fred. 
> > >-----Original Message----- >From: Talpey, Thomas [mailto:Thomas.Talpey at netapp.com] >Sent: Tuesday, 11 November, 2008 17:02 >To: Ciesielski, Frederic (EMEA HPC&OSLO CC) >Cc: Jeff Becker; general at lists.openfabrics.org >Subject: RE: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? > >At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >>That's great, thanks. >> >>I ran some tests with the 2.6.27 kernel as server and client, and >>basically it works fine. >> >>I could not find yet any situation where NFS-RDMA would outperform >>NFS/IPoIB, at least when you compare apples to apples (same clients, >>same server, same protocol, and not just write to/read from the >>caches), and it even seems to have severe performance issues for >>reading with files larger than the memory size of the client and the server. >>Hopefully this will improve when more users will be able to give >>valuable feedback... > >I have a couple of questions, and perhaps suggestions as well. >First the questions... > >- Have you tried with a 2.6.28-rc4 client and server at all? There are >a number of significant NFS/RDMA improvements queued in kernel.org, >especially around RDMA memory registration as well as RDMA operation >scheduling. We've seen some significant throughput improvement even >for basic tunings. > >- What type of storage are you using at the server, and have you >attempted to tune the server at all? For example, if you are storage >(spindle) limited, no network tuning is likely to help and you should >address that first. Also, there are tunings such as nfsd thread count, >export options, and adapter choice that can make a large difference. > >Bottom line, you should be able to reach multi-hundred-MB/sec of >read/write throughput with NFS/RDMA, but there may be issues on >specific systems, or perhaps with the OFED1.4 code, that need to be >accounted for. 
If possible, you may want to set expectations based on >mainline, then try to duplicate them in the OFED backport. >The current OFED NFS/RDMA support is still evolving, while we consider >the mainline kernel.org version to be rather solid. > >Tom. > >> >>Fred. >> >>-----Original Message----- >>From: Jeff Becker [mailto:Jeffrey.C.Becker at nasa.gov] >>Sent: Saturday, 08 November, 2008 22:35 >>To: Ciesielski, Frederic (EMEA HPC&OSLO CC) >>Cc: general at lists.openfabrics.org >>Subject: Re: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? >> >>Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >>> Is there any chance that the new NFS-RDMA features coming with OFED >>> 1.4 work with standard and current distributions, like RHEL5, SLES10 ? >>Not yet, but I'm working on it. I intend for NFSRDMA to work on 2.6.27 >>and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will likely be >>done for OFED 1.4.1. Thanks. >> >>-jeff >> >>> Did anybody test this, or would pretend it is supposed to work ? >>> >>> I mean without building a 2.6.27 or equivalent kernel on top of it, >>> keeping almost full support from the vendors. >>> >>> Enhanced kernel modules may not be sufficient to work around the >>> limitations of old kernels... >>> >>> >>> From sashak at voltaire.com Tue Nov 11 11:19:58 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 21:19:58 +0200 Subject: [ofa-general] Re: [opensm patch][1/2] fix qos config parsing bugs In-Reply-To: <1225404078.1197.533.camel@cardanus.llnl.gov> References: <1225404078.1197.533.camel@cardanus.llnl.gov> Message-ID: <20081111191958.GA8894@sashak.voltaire.com> Hi Al, On 15:01 Thu 30 Oct , Al Chu wrote: > > I found a bunch of qos config parsing issues, listed below: > > 1) > > If the user sets the qos default fields (i.e. qos_high_limit, > qos_vlarb_high. etc.), but do not have the qos_ca, qos_swe, qos_rtr, > etc. equivalent fields listed (i.e. 
qos_ca_high_limit, > qos_sw0_vlarb_high), the values set in the qos default fields are not > loaded into the CAs, switches, etc. The reason is in qos_build_config() > we load defaults like this: > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > > but we always set the fields to something non-NULL. > > static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > { > opt->max_vls = OSM_DEFAULT_QOS_MAX_VLS; > opt->high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > opt->vlarb_high = OSM_DEFAULT_QOS_VLARB_HIGH; > opt->vlarb_low = OSM_DEFAULT_QOS_VLARB_LOW; > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > } Yes, we are setting this to the default qos set (if not explicitly specified by user). So finally we always have a valid set. No? > 2) > > In qos_build_config() we load the high_limit like this: > > cfg->vl_high_limit = (uint8_t) opt->high_limit; > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit > options to "go back to" the default high_limit. It just assumes that > whatever is input (or was set by default) is what you should use. Right. What is a limitation here? That a user cannot set this to "no value"? But she/he can just skip it. > 3) > > Some fields like qos_vlarb_high are assumed to be correctly set and can > segfault opensm. qos_build_config() assumes that valid parameters are used. And we are using it this way (I hope :)) (finally it is not library API). > The attached patch fixes these up. Obviously there's tons of ways to > do this. I decided to ... > > A) only initialize qos_options to the real defaults > > B) init all qos_*_options to sentinel values (-1, NULL, etc.) to > indicate it should use the configured defaults if they aren't set by the > user. The high_limit was changed from an unsigned to an int b/c 0 is a > valid high_limit value. > > C) verify that the default qos inputs are definitely correct (i.e. can't > be NULL). Reset to hard coded defaults if need be. > > D) load the default vs.
non-default appropriately in QoS. And I see that we have here much more sometimes not-trivial flows and default values are spread over many places... :( Sasha > > Al > > P.S. This patch does not rely on my previous "remove qos_max_vls > config" patch. I assume we're keeping the max_vls fields in this patch. > > -- > Albert Chu > chu11 at llnl.gov > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From 00a15a1797b79fd5e3298d98742b6da3613fb9c3 Mon Sep 17 00:00:00 2001 > From: root > Date: Thu, 30 Oct 2008 09:32:29 -0700 > Subject: [PATCH] fix qos config parsing bugs > > > Signed-off-by: root > --- > opensm/include/opensm/osm_subnet.h | 12 +- > opensm/opensm/osm_qos.c | 6 +- > opensm/opensm/osm_subnet.c | 467 ++++++++++++++++++++++-------------- > 3 files changed, 293 insertions(+), 192 deletions(-) > > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index 7259587..11063b7 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -99,7 +99,7 @@ struct osm_qos_policy; > */ > typedef struct osm_qos_options { > unsigned max_vls; > - unsigned high_limit; > + int high_limit; > char *vlarb_high; > char *vlarb_low; > char *sl2vl; > @@ -108,20 +108,20 @@ typedef struct osm_qos_options { > * FIELDS > * > * max_vls > -* The number of maximum VLs on the Subnet > +* The number of maximum VLs on the Subnet (0 == use default) > * > * high_limit > * The limit of High Priority component of VL Arbitration > -* table (IBA 7.6.9) > +* table (IBA 7.6.9) (-1 == use default) > * > * vlarb_high > -* High priority VL Arbitration table template. > +* High priority VL Arbitration table template. (NULL == use default) > * > * vlarb_low > -* Low priority VL Arbitration table template. > +* Low priority VL Arbitration table template. (NULL == use default) > * > * sl2vl > -* SL2VL Mapping table (IBA 7.6.6) template. > +* SL2VL Mapping table (IBA 7.6.6) template. 
(NULL == use default) > * > *********/ > > diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c > index 1679ae0..b451c25 100644 > --- a/opensm/opensm/osm_qos.c > +++ b/opensm/opensm/osm_qos.c > @@ -382,7 +382,11 @@ static void qos_build_config(struct qos_config *cfg, > memset(cfg, 0, sizeof(*cfg)); > > cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; > - cfg->vl_high_limit = (uint8_t) opt->high_limit; > + > + if (opt->high_limit >= 0) > + cfg->vl_high_limit = (uint8_t) opt->high_limit; > + else > + cfg->vl_high_limit = (uint8_t) dflt->high_limit; > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 0422d0f..ab2ff9c 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -370,6 +370,15 @@ static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > } > > +static void subn_init_qos_options(IN osm_qos_options_t * opt) > +{ > + opt->max_vls = 0; > + opt->high_limit = -1; > + opt->vlarb_high = NULL; > + opt->vlarb_low = NULL; > + opt->sl2vl = NULL; > +} > + > /********************************************************************** > **********************************************************************/ > void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > @@ -458,10 +467,10 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > p_opt->prefix_routes_file = OSM_DEFAULT_PREFIX_ROUTES_FILE; > p_opt->consolidate_ipv6_snm_req = FALSE; > subn_set_default_qos_options(&p_opt->qos_options); > - subn_set_default_qos_options(&p_opt->qos_ca_options); > - subn_set_default_qos_options(&p_opt->qos_sw0_options); > - subn_set_default_qos_options(&p_opt->qos_swe_options); > - subn_set_default_qos_options(&p_opt->qos_rtr_options); > + subn_init_qos_options(&p_opt->qos_ca_options); > + 
subn_init_qos_options(&p_opt->qos_sw0_options); > + subn_init_qos_options(&p_opt->qos_swe_options); > + subn_init_qos_options(&p_opt->qos_rtr_options); > } > > /********************************************************************** > @@ -497,6 +506,7 @@ opts_unpack_net64(IN char *p_req_key, > } > } > > + > /********************************************************************** > **********************************************************************/ > static void > @@ -511,6 +521,20 @@ opts_unpack_uint32(IN char *p_req_key, > } > } > } > +/********************************************************************** > + **********************************************************************/ > +static void > +opts_unpack_int32(IN char *p_req_key, > + IN char *p_key, IN char *p_val_str, IN int32_t * p_val) > +{ > + if (!strcmp(p_req_key, p_key)) { > + int32_t val = strtol(p_val_str, NULL, 0); > + if (val != *p_val) { > + log_config_value(p_key, "%d", val); > + *p_val = val; > + } > + } > +} > > /********************************************************************** > **********************************************************************/ > @@ -641,7 +665,7 @@ subn_parse_qos_options(IN const char *prefix, > snprintf(name, sizeof(name), "%s_max_vls", prefix); > opts_unpack_uint32(name, p_key, p_val_str, &opt->max_vls); > snprintf(name, sizeof(name), "%s_high_limit", prefix); > - opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); > + opts_unpack_int32(name, p_key, p_val_str, &opt->high_limit); > snprintf(name, sizeof(name), "%s_vlarb_high", prefix); > opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); > snprintf(name, sizeof(name), "%s_vlarb_low", prefix); > @@ -653,7 +677,9 @@ subn_parse_qos_options(IN const char *prefix, > static int > subn_dump_qos_options(FILE * file, > const char *set_name, > - const char *prefix, osm_qos_options_t * opt) > + const char *prefix, > + osm_qos_options_t * opt, > + osm_qos_options_t * dflt) > { > return fprintf(file, "# 
%s\n" > "%s_max_vls %u\n" > @@ -662,10 +688,11 @@ subn_dump_qos_options(FILE * file, > "%s_vlarb_low %s\n" > "%s_sl2vl %s\n", > set_name, > - prefix, opt->max_vls, > - prefix, opt->high_limit, > - prefix, opt->vlarb_high, > - prefix, opt->vlarb_low, prefix, opt->sl2vl); > + prefix, opt->max_vls > 0 ? opt->max_vls : dflt->max_vls, > + prefix, opt->high_limit >= 0 ? opt->high_limit : dflt->high_limit, > + prefix, opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high, > + prefix, opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low, > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > } > > /********************************************************************** > @@ -833,169 +860,182 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) > /********************************************************************** > **********************************************************************/ > > -static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key) > +static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key, IN unsigned dflt) > { > char buff[128]; > > - if (*max_vls > 15) { > + if (!(*max_vls) || *max_vls > 15) { > sprintf(buff, " Invalid Cached Option:%s=%u:" > - "Using Default:%u\n", > - key, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); > + "Using Default\n", > + key, *max_vls); > printf(buff); > cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > - *max_vls = OSM_DEFAULT_QOS_MAX_VLS; > + *max_vls = dflt; > } > } > > -static void subn_verify_high_limit(IN unsigned *high_limit, IN char *key) > +static void subn_verify_high_limit(IN int *high_limit, IN char *key, IN int dflt) > { > char buff[128]; > > - if (*high_limit > 255) { > - sprintf(buff, " Invalid Cached Option:%s=%u:" > - "Using Default:%u\n", > - key, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); > + if (*high_limit < 0 || *high_limit > 255) { > + sprintf(buff, " Invalid Cached Option:%s=%d:" > + "Using Default\n", key, *high_limit); > printf(buff); > cl_log_event("OpenSM", CL_LOG_INFO, buff, 
NULL, 0); > - *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > + *high_limit = dflt; > } > } > > -static void subn_verify_vlarb(IN char *vlarb, IN char *key) > +static void subn_verify_vlarb(IN char **vlarb, IN char *key, IN char *dflt) > { > - if (vlarb) { > - char buff[128]; > - char *str, *tok, *end, *ptr; > - int count = 0; > - > - str = (char *)malloc(strlen(vlarb) + 1); > - strcpy(str, vlarb); > - > - tok = strtok_r(str, ",\n", &ptr); > - while (tok) { > - char *vl_str, *weight_str; > - > - vl_str = tok; > - weight_str = strchr(tok, ':'); > - > - if (weight_str) { > - long vl, weight; > - > - *weight_str = '\0'; > - weight_str++; > - > - vl = strtol(vl_str, &end, 0); > - > - if (*end) { > - sprintf(buff, > - " Warning: Cached Option %s:vl=%s improperly formatted\n", > - key, vl_str); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, > - buff, NULL, 0); > - } else if (vl < 0 || vl > 14) { > - sprintf(buff, > - " Warning: Cached Option %s:vl=%ld out of range\n", > - key, vl); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, > - buff, NULL, 0); > - } > - > - weight = strtol(weight_str, &end, 0); > - > - if (*end) { > - sprintf(buff, > - " Warning: Cached Option %s:weight=%s improperly formatted\n", > - key, weight_str); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, > - buff, NULL, 0); > - } else if (weight < 0 || weight > 255) { > - sprintf(buff, > - " Warning: Cached Option %s:weight=%ld out of range\n", > - key, weight); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, > - buff, NULL, 0); > - } > - } else { > - sprintf(buff, > - " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", > - key, tok); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > - 0); > - } > + char buff[128]; > + char *str, *tok, *end, *ptr; > + int count = 0; > > - count++; > - tok = strtok_r(NULL, ",\n", &ptr); > - } > + if (*vlarb == NULL) { > + sprintf(buff, " Invalid Cached Option:%s:" > + "Using Default\n", key); > + 
printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + (*vlarb) = dflt; > + return; > + } > > - if (count > 64) { > - sprintf(buff, > - " Warning: Cached Option %s: > 64 listed: " > - "excess vl:weight pairs will be dropped\n", > - key); > - printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > - } > + str = (char *)malloc(strlen(*vlarb) + 1); > + strcpy(str, *vlarb); > > - free(str); > - } > -} > + tok = strtok_r(str, ",\n", &ptr); > + while (tok) { > + char *vl_str, *weight_str; > > -static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) > -{ > - if (sl2vl) { > - char buff[128]; > - char *str, *tok, *end, *ptr; > - int count = 0; > + vl_str = tok; > + weight_str = strchr(tok, ':'); > > - str = (char *)malloc(strlen(sl2vl) + 1); > - strcpy(str, sl2vl); > + if (weight_str) { > + long vl, weight; > > - tok = strtok_r(str, ",\n", &ptr); > - while (tok) { > - long vl = strtol(tok, &end, 0); > + *weight_str = '\0'; > + weight_str++; > + > + vl = strtol(vl_str, &end, 0); > > if (*end) { > sprintf(buff, > " Warning: Cached Option %s:vl=%s improperly formatted\n", > - key, tok); > + key, vl_str); > printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > - 0); > - } else if (vl < 0 || vl > 15) { > + cl_log_event("OpenSM", CL_LOG_INFO, > + buff, NULL, 0); > + } else if (vl < 0 || vl > 14) { > sprintf(buff, > " Warning: Cached Option %s:vl=%ld out of range\n", > key, vl); > printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > - 0); > + cl_log_event("OpenSM", CL_LOG_INFO, > + buff, NULL, 0); > } > > - count++; > - tok = strtok_r(NULL, ",\n", &ptr); > - } > + weight = strtol(weight_str, &end, 0); > > - if (count < 16) { > + if (*end) { > + sprintf(buff, > + " Warning: Cached Option %s:weight=%s improperly formatted\n", > + key, weight_str); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, > + buff, NULL, 0); > + } else if (weight < 0 || weight > 255) { > + sprintf(buff, > + " Warning: Cached Option 
%s:weight=%ld out of range\n", > + key, weight); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, > + buff, NULL, 0); > + } > + } else { > sprintf(buff, > - " Warning: Cached Option %s: < 16 VLs listed\n", > - key); > + " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", > + key, tok); > printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > + 0); > } > - if (count > 16) { > + > + count++; > + tok = strtok_r(NULL, ",\n", &ptr); > + } > + > + if (count > 64) { > + sprintf(buff, > + " Warning: Cached Option %s: > 64 listed: " > + "excess vl:weight pairs will be dropped\n", > + key); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + } > + > + free(str); > +} > + > +static void subn_verify_sl2vl(IN char **sl2vl, IN char *key, IN char *dflt) > +{ > + char buff[128]; > + char *str, *tok, *end, *ptr; > + int count = 0; > + > + if (*sl2vl == NULL) { > + sprintf(buff, " Invalid Cached Option:%s:" > + "Using Default\n", key); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + (*sl2vl) = dflt; > + return; > + } > + > + str = (char *)malloc(strlen(*sl2vl) + 1); > + strcpy(str, *sl2vl); > + > + tok = strtok_r(str, ",\n", &ptr); > + while (tok) { > + long vl = strtol(tok, &end, 0); > + > + if (*end) { > sprintf(buff, > - " Warning: Cached Option %s: > 16 listed: " > - "excess VLs will be dropped\n", key); > + " Warning: Cached Option %s:vl=%s improperly formatted\n", > + key, tok); > printf(buff); > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > + 0); > + } else if (vl < 0 || vl > 15) { > + sprintf(buff, > + " Warning: Cached Option %s:vl=%ld out of range\n", > + key, vl); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > + 0); > } > > - free(str); > + count++; > + tok = strtok_r(NULL, ",\n", &ptr); > + } > + > + if (count < 16) { > + sprintf(buff, > + 
" Warning: Cached Option %s: < 16 VLs listed\n", > + key); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > } > + if (count > 16) { > + sprintf(buff, > + " Warning: Cached Option %s: > 16 listed: " > + "excess VLs will be dropped\n", key); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + } > + > + free(str); > } > > static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > @@ -1046,61 +1086,113 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > } > > if (p_opts->qos) { > + /* the default options in qos_options must be correct. > + * every other one need not be, b/c those will default > + * back to whatever is in qos_options. > + */ > + > subn_verify_max_vls(&(p_opts->qos_options.max_vls), > - "qos_max_vls"); > - subn_verify_max_vls(&(p_opts->qos_ca_options.max_vls), > - "qos_ca_max_vls"); > - subn_verify_max_vls(&(p_opts->qos_sw0_options.max_vls), > - "qos_sw0_max_vls"); > - subn_verify_max_vls(&(p_opts->qos_swe_options.max_vls), > - "qos_swe_max_vls"); > - subn_verify_max_vls(&(p_opts->qos_rtr_options.max_vls), > - "qos_rtr_max_vls"); > + "qos_max_vls", > + OSM_DEFAULT_MAX_OP_VLS); > + if (p_opts->qos_ca_options.max_vls) > + subn_verify_max_vls(&(p_opts->qos_ca_options.max_vls), > + "qos_ca_max_vls", > + 0); > + if (p_opts->qos_sw0_options.max_vls) > + subn_verify_max_vls(&(p_opts->qos_sw0_options.max_vls), > + "qos_sw0_max_vls", > + 0); > + if (p_opts->qos_swe_options.max_vls) > + subn_verify_max_vls(&(p_opts->qos_swe_options.max_vls), > + "qos_swe_max_vls", > + 0); > + if (p_opts->qos_rtr_options.max_vls) > + subn_verify_max_vls(&(p_opts->qos_rtr_options.max_vls), > + "qos_rtr_max_vls", > + 0); > > subn_verify_high_limit(&(p_opts->qos_options.high_limit), > - "qos_high_limit"); > - subn_verify_high_limit(&(p_opts->qos_ca_options.high_limit), > - "qos_ca_high_limit"); > - subn_verify_high_limit(& > - (p_opts->qos_sw0_options.high_limit), > - "qos_sw0_high_limit"); > - 
subn_verify_high_limit(& > - (p_opts->qos_swe_options.high_limit), > - "qos_swe_high_limit"); > - subn_verify_high_limit(& > - (p_opts->qos_rtr_options.high_limit), > - "qos_rtr_high_limit"); > - > - subn_verify_vlarb(p_opts->qos_options.vlarb_low, > - "qos_vlarb_low"); > - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_low, > - "qos_ca_vlarb_low"); > - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_low, > - "qos_sw0_vlarb_low"); > - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_low, > - "qos_swe_vlarb_low"); > - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_low, > - "qos_rtr_vlarb_low"); > - > - subn_verify_vlarb(p_opts->qos_options.vlarb_high, > - "qos_vlarb_high"); > - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_high, > - "qos_ca_vlarb_high"); > - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_high, > - "qos_sw0_vlarb_high"); > - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_high, > - "qos_swe_vlarb_high"); > - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_high, > - "qos_rtr_vlarb_high"); > - > - subn_verify_sl2vl(p_opts->qos_options.sl2vl, "qos_sl2vl"); > - subn_verify_sl2vl(p_opts->qos_ca_options.sl2vl, "qos_ca_sl2vl"); > - subn_verify_sl2vl(p_opts->qos_sw0_options.sl2vl, > - "qos_sw0_sl2vl"); > - subn_verify_sl2vl(p_opts->qos_swe_options.sl2vl, > - "qos_swe_sl2vl"); > - subn_verify_sl2vl(p_opts->qos_rtr_options.sl2vl, > - "qos_rtr_sl2vl"); > + "qos_high_limit", > + OSM_DEFAULT_QOS_HIGH_LIMIT); > + if (p_opts->qos_ca_options.high_limit >= 0) > + subn_verify_high_limit(&(p_opts->qos_ca_options.high_limit), > + "qos_ca_high_limit", > + -1); > + if (p_opts->qos_sw0_options.high_limit >= 0) > + subn_verify_high_limit(& > + (p_opts->qos_sw0_options.high_limit), > + "qos_sw0_high_limit", > + -1); > + if (p_opts->qos_swe_options.high_limit >= 0) > + subn_verify_high_limit(& > + (p_opts->qos_swe_options.high_limit), > + "qos_swe_high_limit", > + -1); > + if (p_opts->qos_rtr_options.high_limit >= 0) > + subn_verify_high_limit(& > + 
(p_opts->qos_rtr_options.high_limit), > + "qos_rtr_high_limit", > + -1); > + > + subn_verify_vlarb(&(p_opts->qos_options.vlarb_low), > + "qos_vlarb_low", > + OSM_DEFAULT_QOS_VLARB_LOW); > + if (p_opts->qos_ca_options.vlarb_low) > + subn_verify_vlarb(&(p_opts->qos_ca_options.vlarb_low), > + "qos_ca_vlarb_low", > + NULL); > + if (p_opts->qos_sw0_options.vlarb_low) > + subn_verify_vlarb(&(p_opts->qos_sw0_options.vlarb_low), > + "qos_sw0_vlarb_low", > + NULL); > + if (p_opts->qos_swe_options.vlarb_low) > + subn_verify_vlarb(&(p_opts->qos_swe_options.vlarb_low), > + "qos_swe_vlarb_low", > + NULL); > + if (p_opts->qos_rtr_options.vlarb_low) > + subn_verify_vlarb(&(p_opts->qos_rtr_options.vlarb_low), > + "qos_rtr_vlarb_low", > + NULL); > + > + subn_verify_vlarb(&(p_opts->qos_options.vlarb_high), > + "qos_vlarb_high", > + OSM_DEFAULT_QOS_VLARB_HIGH); > + if (p_opts->qos_ca_options.vlarb_high) > + subn_verify_vlarb(&(p_opts->qos_ca_options.vlarb_high), > + "qos_ca_vlarb_high", > + NULL); > + if (p_opts->qos_sw0_options.vlarb_high) > + subn_verify_vlarb(&(p_opts->qos_sw0_options.vlarb_high), > + "qos_sw0_vlarb_high", > + NULL); > + if (p_opts->qos_swe_options.vlarb_high) > + subn_verify_vlarb(&(p_opts->qos_swe_options.vlarb_high), > + "qos_swe_vlarb_high", > + NULL); > + if (p_opts->qos_rtr_options.vlarb_high) > + subn_verify_vlarb(&(p_opts->qos_rtr_options.vlarb_high), > + "qos_rtr_vlarb_high", > + NULL); > + > + subn_verify_sl2vl(&(p_opts->qos_options.sl2vl), > + "qos_sl2vl", > + OSM_DEFAULT_QOS_SL2VL); > + if (p_opts->qos_ca_options.sl2vl) > + subn_verify_sl2vl(&(p_opts->qos_ca_options.sl2vl), > + "qos_ca_sl2vl", > + NULL); > + if (p_opts->qos_sw0_options.sl2vl) > + subn_verify_sl2vl(&(p_opts->qos_sw0_options.sl2vl), > + "qos_sw0_sl2vl", > + NULL); > + if (p_opts->qos_swe_options.sl2vl) > + subn_verify_sl2vl(&(p_opts->qos_swe_options.sl2vl), > + "qos_swe_sl2vl", > + NULL); > + if (p_opts->qos_rtr_options.sl2vl) > + subn_verify_sl2vl(&(p_opts->qos_rtr_options.sl2vl), > + 
"qos_rtr_sl2vl", > + NULL); > } > #ifdef ENABLE_OSM_PERF_MGR > if (p_opts->perfmgr_sweep_time_s < 1) { > @@ -1714,23 +1806,28 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > > subn_dump_qos_options(opts_file, > "QoS default options", "qos", > + &p_opts->qos_options, > &p_opts->qos_options); > fprintf(opts_file, "\n"); > subn_dump_qos_options(opts_file, > "QoS CA options", "qos_ca", > - &p_opts->qos_ca_options); > + &p_opts->qos_ca_options, > + &p_opts->qos_options); > fprintf(opts_file, "\n"); > subn_dump_qos_options(opts_file, > "QoS Switch Port 0 options", "qos_sw0", > - &p_opts->qos_sw0_options); > + &p_opts->qos_sw0_options, > + &p_opts->qos_options); > fprintf(opts_file, "\n"); > subn_dump_qos_options(opts_file, > "QoS Switch external ports options", "qos_swe", > - &p_opts->qos_swe_options); > + &p_opts->qos_swe_options, > + &p_opts->qos_options); > fprintf(opts_file, "\n"); > subn_dump_qos_options(opts_file, > "QoS Router ports options", "qos_rtr", > - &p_opts->qos_rtr_options); > + &p_opts->qos_rtr_options, > + &p_opts->qos_options); > fprintf(opts_file, "\n"); > > fprintf(opts_file, > -- > 1.5.4.5 > From landman at scalableinformatics.com Tue Nov 11 12:17:56 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 11 Nov 2008 15:17:56 -0500 Subject: FW: [ofa-general] NFS-RDMA (OFED1.4) with standard distributions ? In-Reply-To: <7391130E01ED404FBD7A3C86731EEB7D20ECAB4C0F@GVW1087EXB.americas.hpqcorp.net> References: <7391130E01ED404FBD7A3C86731EEB7D20ECAB4C0F@GVW1087EXB.americas.hpqcorp.net> Message-ID: <4919E874.4090409@scalableinformatics.com> Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: > Well, I did not plan to test all the possible versions of the kernel; > for sure improvements are on their way, what just confirms the > assumption that this 'technology' is not mature yet. 
> > With IPoIB an NFS server can easily export (for instance) up to > 1.2GB/s (at least this is what I can measure), with the data in the > page cache. No problem up to that point at least. I clearly True ... but not so interesting to the actual data read/write case when it has to get back to spinning disk. > understand the theoretical benefits of RDMA and it's a clear > improvement over TCP, for MPI. However, the drastic change for MPI is > even more on the latency side, though the peak message bandwidth is > also improved as one might expect for NFS. Again, true, though NFS has to walk through transport protocol layers as well as NFS application layers. This additional effort reduces performance considerably. Add to this that you need (sadly) a copy of a buffer between the network stack and the disk stack. RDMA reduces one of these copies, but as far as I know, it doesn't talk directly to the disks (you can do something like this with SCST in the block modes if you don't mind iSCSI). > Registration/deregistration issues are also well-known to the MPI > developers, and all this is certainly not that easy to manage in > other areas. > > Still, NFS-RDMA remains NFS. If the bottleneck is not in the > transport, nothing will be improved by RDMA from the performance > point of view. Even worse, what I saw with the 2.6.27 kernel + > OFED1.4-rc3 is the inability of NFS-RDMA to match the performance of > NFS-TCP for some patterns of IOzone, with a filesystem able to Hmmm.... Most of the (default) IOzone measurements we have done (and seen published) are bound almost entirely by system RAM cache. Indeed, we have had to go into the code and alter some of the constants to allow us to test greater than 16 MB records, and greater than 16 GB files. Otherwise all we measure is cache speed. Could you elaborate on system parameters, and what measurements weren't up to par, as well as what options you used?
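A minimal C sketch of the cache-size concern above, under stated assumptions: it reads physical RAM with sysconf() (the _SC_PHYS_PAGES name is a Linux/glibc extension) and picks a benchmark file size as a multiple of the larger of client and server memory, so the page cache on neither side can hold the whole file. The function names and the factor of 2 used in the note below are illustrative choices, not values from this thread or from IOzone.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Physical RAM of the local machine in bytes, or 0 if sysconf() cannot
 * report it. _SC_PHYS_PAGES is a Linux/glibc extension.
 */
static uint64_t phys_ram_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;
	return (uint64_t)pages * (uint64_t)page_size;
}

/*
 * Hypothetical sizing rule: make the test file `factor` times the larger
 * of client and server RAM so neither cache can satisfy the whole run.
 */
static uint64_t min_test_file_bytes(uint64_t client_ram, uint64_t server_ram,
				    unsigned factor)
{
	uint64_t ram = client_ram > server_ram ? client_ram : server_ram;

	assert(factor >= 1);
	return ram * factor;
}
```

With a 16 GiB client, a 32 GiB server and a factor of 2, this suggests a test file of at least 64 GiB, which is well past the 16 GB default file limit mentioned above.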
We see NFSoverRDMA on SDR achieving about 400 MB/s while NFS over IPoIB on the same hardware (identical actually) is about 200 MB/s on reads. With DDR IB, we ran a test between a pair of our JackRabbit machines, and found a sustained ~500-550 MB/s read, and about 400 MB/s or so write. The underlying file system could handle well over 1 GB/s. NFS over IPoIB wasn't close. > sustain itself several hundreds of MB/s (using exactly the same > hardware and software in both cases). We are far from a pure IB > bandwidth issue here, we are just facing an issue in how the requests > are handled probably, perhaps when paging occurs, I can't tell. I I don't think this is the limitation. I think it is more along the lines of copying buffers between different stacks ... kernel buffer to user space program and then back to kernel for net->ram->disk and vice-versa. There are other issues as well which could be causing performance degradation, specifically on payload size. FWIW: This is a 2.6.27.5 kernel. > could not find any tuning to solve the more obvious problem, i.e. the > low bandwidth for reading, except mounting with '-o rsize=4096'; > probably not what people expect, as this will have other effects. > Anyway this does improve only the sequential read bandwidth. But of > course I will repeat my tests with the latest release of everything > when I have time, still making sure I compare apples to apples... > Again, I'm sure improvements are on their way ! > > Fred. > > > -----Original Message----- From: Talpey, Thomas > [mailto:Thomas.Talpey at netapp.com] Sent: Tuesday, 11 November, 2008 > 17:02 To: Ciesielski, Frederic (EMEA HPC&OSLO CC) Cc: Jeff Becker; > general at lists.openfabrics.org Subject: RE: [ofa-general] NFS-RDMA > (OFED1.4) with standard distributions ? > > At 11:27 AM 11/10/2008, Ciesielski, Frederic (EMEA HPC&OSLO CC) > wrote: >> That's great, thanks. >> >> I ran some tests with the 2.6.27 kernel as server and client, and >> basically it works fine. 
>> >> I could not find yet any situation where NFS-RDMA would outperform >> NFS/IPoIB, at least when you compare apples to apples (same >> clients, same server, same protocol, and not just write to/read >> from the caches), and it even seems to have severe performance >> issues for reading with files larger than the memory size of the >> client and the server. Hopefully this will improve when more users >> will be able to give valuable feedback... > > I have a couple of questions, and perhaps suggestions as well. First > the questions... > > - Have you tried with a 2.6.28-rc4 client and server at all? There > are a number of significant NFS/RDMA improvements queued in > kernel.org, especially around RDMA memory registration as well as > RDMA operation scheduling. We've seen some significant throughput > improvement even for basic tunings. > > - What type of storage are you using at the server, and have you > attempted to tune the server at all? For example, if you are storage > (spindle) limited, no network tuning is likely to help and you should > address that first. Also, there are tunings such as nfsd thread > count, export options, and adapter choice that can make a large > difference. > > Bottom line, you should be able to reach multi-hundred-MB/sec of > read/write throughput with NFS/RDMA, but there may be issues on > specific systems, or perhaps with the OFED1.4 code, that need to be > accounted for. If possible, you may want to set expectations based on > mainline, then try to duplicate them in the OFED backport. The > current OFED NFS/RDMA support is still evolving, while we consider > the mainline kernel.org version to be rather solid. > > Tom. > >> Fred. >> >> -----Original Message----- From: Jeff Becker >> [mailto:Jeffrey.C.Becker at nasa.gov] Sent: Saturday, 08 November, >> 2008 22:35 To: Ciesielski, Frederic (EMEA HPC&OSLO CC) Cc: >> general at lists.openfabrics.org Subject: Re: [ofa-general] NFS-RDMA >> (OFED1.4) with standard distributions ? 
>> >> Ciesielski, Frederic (EMEA HPC&OSLO CC) wrote: >>> Is there any chance that the new NFS-RDMA features coming with >>> OFED 1.4 work with standard and current distributions, like >>> RHEL5, SLES10 ? >> Not yet, but I'm working on it. I intend for NFSRDMA to work on >> 2.6.27 and 2.6.26 for OFED 1.4. The RHEL5 and SLES10 backports will >> likely be done for OFED 1.4.1. Thanks. >> >> -jeff >> >>> Did anybody test this, or would pretend it is supposed to work ? >>> >>> I mean without building a 2.6.27 or equivalent kernel on top of >>> it, keeping almost full support from the vendors. >>> >>> Enhanced kernel modules may not be sufficient to work around the >>> limitations of old kernels... >>> >>> >>> > > _______________________________________________ general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From sashak at voltaire.com Tue Nov 11 12:26:48 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 22:26:48 +0200 Subject: [ofa-general] Re: [opensm patch][2/2] verify config inputs when config file is rescanned In-Reply-To: <1226353273.13603.39.camel@cardanus.llnl.gov> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> <1226353273.13603.39.camel@cardanus.llnl.gov> Message-ID: <20081111202648.GB8894@sashak.voltaire.com> On 13:41 Mon 10 Nov , Al Chu wrote: > Hey Sasha, > > Sorry, repost, w/ the right Author. 
> > Al > > On Mon, 2008-11-10 at 13:15 -0800, Al Chu wrote: > > On Mon, 2008-11-10 at 23:02 +0200, Sasha Khapyorsky wrote: > > > Hi Al, > > > > > > On 15:01 Thu 30 Oct , Al Chu wrote: > > > > Hey Sasha, > > > > > > > > I noticed that after the config file is rescanned, the new potential > > > > inputs aren't checked for validity. Patch is attached. > > > > > > > > Al > > > > > > > > -- > > > > Albert Chu > > > > chu11 at llnl.gov > > > > Computer Scientist > > > > High Performance Systems Division > > > > Lawrence Livermore National Laboratory > > > > > > > From edfcd2de96c3525d1609b4c0f03c17ecc0495c18 Mon Sep 17 00:00:00 2001 > > > > From: root > > > > Date: Thu, 30 Oct 2008 13:58:55 -0700 > > > > Subject: [PATCH] verify rescanned config input > > > > > > > > > > > > Signed-off-by: root > > > ^^^^^^^^^^^^^^^^^^^^^^^^ > > > > > > I'm fine with this patch, but could you fix S-O-B line? Thanks. > > > > Oops. New one is attached (I'll repost the [1/2] patch too). > > > > Al > > > > > Sasha > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general > -- > Albert Chu > chu11 at llnl.gov > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > From a9f7ea0b667ff32a029593e954286c349fe499e7 Mon Sep 17 00:00:00 2001 > From: Albert Chu > Date: Mon, 10 Nov 2008 13:10:25 -0800 > Subject: [PATCH] verify rescanned config input > > > Signed-off-by: Albert Chu Applied. Thanks. Sasha From rpearson at systemfabricworks.com Tue Nov 11 12:33:52 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 14:33:52 -0600 Subject: [ofa-general] [PATCH][9] opensm: lash preparation Message-ID: <008701c9443c$cfc1f050$6f45d0f0$@com> Sasha, Here is the ninth patch implementing the mesh analysis algorithm. 
This patch makes some minor cleanups in osm_ucast_lash.c in preparation for next steps. The main change is to minimize the occurrences of phys_connections. Also there are a few nits: - delete banner for local variables that moved to ...lash.h - fix bad return value of osm_mesh_node_create fails - clear sw->p_sw->priv on switch cleanup - fix spelling error in comment - discover_network_properties returns an error which was not checked Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index b9394af..95dbcc2 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -55,10 +55,6 @@ #include #include -/* //////////////////////////// */ -/* Local types */ -/* //////////////////////////// */ - static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) { cdg_vertex_t *cdg_vertex = (cdg_vertex_t *) malloc(sizeof(cdg_vertex_t)); @@ -150,6 +146,11 @@ static int cycle_exists(cdg_vertex_t * start, cdg_vertex_t * current, return cycle_found; } +static inline int get_next_switch(lash_t *p_lash, int sw, int link) +{ + return p_lash->switches[sw]->phys_connections[link]; +} + static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, int dest_switch, int lane) { @@ -161,7 +162,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, int found; output_link = switches[sw]->routing_table[dest_switch].out_link; - i_next_switch = switches[sw]->phys_connections[output_link]; + i_next_switch = get_next_switch(p_lash, sw, output_link); while (sw != dest_switch) { v = cdg_vertex_matrix[lane][sw][i_next_switch]; @@ -177,8 +178,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, if (i_next_switch != dest_switch) { next_link = switches[i_next_switch]->routing_table[dest_switch].out_link; - i_next_next_switch = - switches[i_next_switch]->phys_connections[next_link]; + i_next_next_switch = get_next_switch(p_lash, i_next_switch, 
next_link); found = 0; for (i = 0; i < v->num_dependencies; i++) @@ -211,8 +211,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, output_link = switches[sw]->routing_table[dest_switch].out_link; if (sw != dest_switch) - i_next_switch = - switches[sw]->phys_connections[output_link]; + i_next_switch = get_next_switch(p_lash, sw, output_link); } } @@ -312,7 +311,7 @@ static void generate_cdg_for_sp(lash_t * p_lash, int sw, int dest_switch, cdg_vertex_t *v, *prev = NULL; output_link = switches[sw]->routing_table[dest_switch].out_link; - next_switch = switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); while (sw != dest_switch) { @@ -368,7 +367,7 @@ static void generate_cdg_for_sp(lash_t * p_lash, int sw, int dest_switch, if (sw != dest_switch) { CL_ASSERT(output_link != NONE); - next_switch = switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); } prev = v; @@ -384,7 +383,7 @@ static void set_temp_depend_to_permanent_for_sp(lash_t * p_lash, int sw, cdg_vertex_t *v; output_link = switches[sw]->routing_table[dest_switch].out_link; - next_switch = switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); while (sw != dest_switch) { v = cdg_vertex_matrix[lane][sw][next_switch]; @@ -399,8 +398,7 @@ static void set_temp_depend_to_permanent_for_sp(lash_t * p_lash, int sw, output_link = switches[sw]->routing_table[dest_switch].out_link; if (sw != dest_switch) - next_switch = - switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); } } @@ -414,7 +412,7 @@ static void remove_temp_depend_for_sp(lash_t * p_lash, int sw, int dest_switch, cdg_vertex_t *v; output_link = switches[sw]->routing_table[dest_switch].out_link; - next_switch = switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); while (sw != dest_switch) { v = 
cdg_vertex_matrix[lane][sw][next_switch]; @@ -439,8 +437,7 @@ static void remove_temp_depend_for_sp(lash_t * p_lash, int sw, int dest_switch, output_link = switches[sw]->routing_table[dest_switch].out_link; if (sw != dest_switch) - next_switch = - switches[sw]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, sw, output_link); } } @@ -502,10 +499,10 @@ static void balance_virtual_lanes(lash_t * p_lash, unsigned lanes_needed) generate_cdg_for_sp(p_lash, dest, src, min_filled_lane); output_link = p_lash->switches[src]->routing_table[dest].out_link; - next_switch = p_lash->switches[src]->phys_connections[output_link]; + next_switch = get_next_switch(p_lash, src, output_link); output_link2 = p_lash->switches[dest]->routing_table[src].out_link; - next_switch2 = p_lash->switches[dest]->phys_connections[output_link2]; + next_switch2 = get_next_switch(p_lash, dest, output_link2); CL_ASSERT(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL); CL_ASSERT(cdg_vertex_matrix[min_filled_lane][dest][next_switch2] != NULL); @@ -652,7 +649,7 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw } if (osm_mesh_node_create(p_lash, sw)) - return -1; + return NULL; sw->p_sw = p_sw; if (p_sw) @@ -673,6 +670,8 @@ static void switch_delete(switch_t * sw) free(sw->phys_connections); if (sw->routing_table) free(sw->routing_table); + if (sw->p_sw) + sw->p_sw->priv = NULL; free(sw); } @@ -875,9 +874,8 @@ static int lash_core(lash_t * p_lash) output_link2 = switches[dest_switch]->routing_table[i].out_link; - i_next_switch = switches[i]->phys_connections[output_link]; - i_next_switch2 = - switches[dest_switch]->phys_connections[output_link2]; + i_next_switch = get_next_switch(p_lash, i, output_link); + i_next_switch2 = get_next_switch(p_lash, dest_switch, output_link2); CL_ASSERT(p_lash-> cdg_vertex_matrix[v_lane][i][i_next_switch] != @@ -1205,7 +1203,7 @@ static void process_switches(lash_t * p_lash) osm_switch_t *p_sw, *p_next_sw; 
osm_subn_t *p_subn = &p_lash->p_osm->subn; - /* Go through each swithc and process it. i.e build the connection + /* Go through each switch and process it. i.e build the connection structure required by LASH */ p_next_sw = (osm_switch_t *) cl_qmap_head(&p_subn->sw_guid_tbl); while (p_next_sw != (osm_switch_t *) cl_qmap_end(&p_subn->sw_guid_tbl)) { @@ -1229,7 +1227,9 @@ static int lash_process(void *context) // everything starts here lash_cleanup(p_lash); - discover_network_properties(p_lash); + return_status = discover_network_properties(p_lash); + if (return_status != IB_SUCCESS) + goto Exit; return_status = init_lash_structures(p_lash); if (return_status != IB_SUCCESS) From sashak at voltaire.com Tue Nov 11 12:42:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 11 Nov 2008 22:42:14 +0200 Subject: [ofa-general] Re: [opensm patch] support dump_conf command in opensm console In-Reply-To: <1226353351.13603.42.camel@cardanus.llnl.gov> References: <1225759191.7307.9.camel@cardanus.llnl.gov> <20081109172518.GG30588@sashak.voltaire.com> <1226338962.13603.21.camel@cardanus.llnl.gov> <1226351033.13603.23.camel@cardanus.llnl.gov> <1226353351.13603.42.camel@cardanus.llnl.gov> Message-ID: <20081111204214.GC8894@sashak.voltaire.com> On 13:42 Mon 10 Nov , Al Chu wrote: > Hey Sasha, > > Sorry. Repost patch w/ the right Author. > > Al > > On Mon, 2008-11-10 at 13:03 -0800, Al Chu wrote: > > Hey Sasha, > > > > Attached is the re-worked patch. Assumes changes from my "fix qos > > config parsing bugs" patch are accepted. > > > > Al > > > > On Mon, 2008-11-10 at 09:42 -0800, Al Chu wrote: > > > Hey Sasha, > > > > > > On Sun, 2008-11-09 at 19:25 +0200, Sasha Khapyorsky wrote: > > > > Hi Al, > > > > > > > > On 16:39 Mon 03 Nov , Al Chu wrote: > > > > > Hey Sasha, > > > > > > > > > > When config files are rescanned and loaded, there's no way to know if > > > > > the right configuration was actually reloaded or not. 
A console command > > > > > to dump the current config is a useful way to verify the loading of new > > > > > configs or not. > > > > > > > > > > This patch assumes the fixes from my "fix qos config parsing bugs" is > > > > > accepted. > > > > > > > > Didn't pass over it, sorry about delay. > > > > > > > > > > > > > > Al > > > > > > > > > > -- > > > > > Albert Chu > > > > > chu11 at llnl.gov > > > > > Computer Scientist > > > > > High Performance Systems Division > > > > > Lawrence Livermore National Laboratory > > > > > > > > > From 249607e47ec7ef1b92f9578cece90460418d12b8 Mon Sep 17 00:00:00 2001 > > > > > From: Albert Chu > > > > > Date: Mon, 3 Nov 2008 16:22:29 -0800 > > > > > Subject: [PATCH] support dump_conf console command > > > > > > > > > > > > > > > Signed-off-by: Albert Chu Rebased against master and applied. Thanks. Sasha From rpearson at systemfabricworks.com Tue Nov 11 13:32:56 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 15:32:56 -0600 Subject: [ofa-general] [PATCH][10] opensm: hook mesh code into lash code Message-ID: <009d01c94445$10387f20$30a97d60$@com> Sasha, Here is the tenth patch implementing the mesh analysis algorithm. This patch - hooks mesh code into lash - replaces sw->phys_connections by the equivalent switch->node->links - replaces sw->num_connections by the equivalent switch->node->num_links - replaces sw->virtual_physical_port_table by switch->node->links[]->ports When the do_mesh_analysis flag is not set there is no change to the function except to replace the variables with variables in node that have the same size. In this case the port table in link_t will always have just one port. When the do_mesh_analysis flag is set multiple physical links will collapse to a single logical link with a port list with more than one element.
- fixed bug, mesh not set in osm_do_mesh_analysis - rewrote connect switches to use variables in node - in log Lane requirements (%d) exceed available lanes (%d) Arguments were reversed, fixed - compute physical egress port in routine get_next_port Which will use round robin if there are more than one Physical links between switches Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h index c037571..f3bde5d 100644 --- a/opensm/include/opensm/osm_ucast_lash.h +++ b/opensm/include/opensm/osm_ucast_lash.h @@ -82,9 +82,6 @@ typedef struct _switch { unsigned lane; } *routing_table; mesh_node_t *node; - unsigned int num_connections; - int *virtual_physical_port_table; - int *phys_connections; } switch_t; typedef struct _lash { diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index a248522..fea9237 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -1080,6 +1080,8 @@ int osm_do_mesh_analysis(lash_t *p_lash) return -1; } + mesh = p_lash->mesh; + /* * get local metric and invariant for each switch * also classify each switch diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index 95dbcc2..34a4a62 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -67,16 +67,53 @@ static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) static void connect_switches(lash_t * p_lash, int sw1, int sw2, int phy_port_1) { osm_log_t *p_log = &p_lash->p_osm->log; - unsigned num = p_lash->switches[sw1]->num_connections; + unsigned num = p_lash->switches[sw1]->node->num_links; + switch_t *s1 = p_lash->switches[sw1]; + mesh_node_t *node = s1->node; + switch_t *s2; + link_t *l; + int i; + + /* + * if doing mesh analysis: + * - do not consider connections to self + * - collapse multiple connections between + * pair of switches to a single locical link + */ + if (p_lash->p_osm->subn.opt.do_mesh_analysis) { + if 
(sw1 == sw2) + return; + + /* see if we are alredy linked to sw2 */ + for (i = 0; i < num; i++) { + l = node->links[i]; + + if (node->links[i]->switch_id == sw2) { + l->ports[l->num_ports++] = phy_port_1; + return; + } + } + } - p_lash->switches[sw1]->phys_connections[num] = sw2; - p_lash->switches[sw1]->virtual_physical_port_table[num] = phy_port_1; - p_lash->switches[sw1]->num_connections++; + l = node->links[num]; + l->switch_id = sw2; + l->link_id = -1; + l->ports[l->num_ports++] = phy_port_1; + + s2 = p_lash->switches[sw2]; + for (i = 0; i < s2->node->num_links; i++) { + if (s2->node->links[i]->switch_id == sw1) { + s2->node->links[i]->link_id = num; + l->link_id = i; + break; + } + } + + node->num_links++; OSM_LOG(p_log, OSM_LOG_VERBOSE, "LASH connect: %d, %d, %d\n", sw1, sw2, phy_port_1); - } static osm_switch_t *get_osm_switch_from_port(osm_port_t * port) @@ -148,7 +185,7 @@ static int cycle_exists(cdg_vertex_t * start, cdg_vertex_t * current, static inline int get_next_switch(lash_t *p_lash, int sw, int link) { - return p_lash->switches[sw]->phys_connections[link]; + return p_lash->switches[sw]->node->links[link]->switch_id; } static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, @@ -233,8 +270,8 @@ static int get_phys_connection(switch_t *sw, int switch_to) { unsigned int i = 0; - for (i = 0; i < sw->num_connections; i++) - if (sw->phys_connections[i] == switch_to) + for (i = 0; i < sw->node->num_links; i++) + if (sw->node->links[i]->switch_id == switch_to) return i; return i; } @@ -252,8 +289,8 @@ static void shortest_path(lash_t * p_lash, int ir) while (!cl_is_list_empty(&bfsq)) { dequeue(&bfsq, &sw); - for (i = 0; i < sw->num_connections; i++) { - swi = switches[sw->phys_connections[i]]; + for (i = 0; i < sw->node->num_links; i++) { + swi = switches[sw->node->links[i]->switch_id]; if (swi->q_state == UNQUEUED) { enqueue(&bfsq, swi); sw->dij_channels[sw->used_channels++] = swi->id; @@ -614,25 +651,8 @@ static switch_t 
*switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw return NULL; } - sw->virtual_physical_port_table = malloc(num_ports * sizeof(int)); - if (!sw->virtual_physical_port_table) { - free(sw->dij_channels); - free(sw); - return NULL; - } - - sw->phys_connections = malloc(num_ports * sizeof(int)); - if (!sw->phys_connections) { - free(sw->virtual_physical_port_table); - free(sw->dij_channels); - free(sw); - return NULL; - } - sw->routing_table = malloc(num_switches * sizeof(sw->routing_table[0])); if (!sw->routing_table) { - free(sw->phys_connections); - free(sw->virtual_physical_port_table); free(sw->dij_channels); free(sw); return NULL; @@ -643,11 +663,6 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw sw->routing_table[i].lane = NONE; } - for (i = 0; i < num_ports; i++) { - sw->virtual_physical_port_table[i] = -1; - sw->phys_connections[i] = NONE; - } - if (osm_mesh_node_create(p_lash, sw)) return NULL; @@ -664,10 +679,6 @@ static void switch_delete(switch_t * sw) if (sw->dij_channels) free(sw->dij_channels); - if (sw->virtual_physical_port_table) - free(sw->virtual_physical_port_table); - if (sw->phys_connections) - free(sw->phys_connections); if (sw->routing_table) free(sw->routing_table); if (sw->p_sw) @@ -972,7 +983,7 @@ Error_Not_Enough_Lanes: status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " "Lane requirements (%d) exceed available lanes (%d)\n", - p_lash->vl_min, lanes_needed); + lanes_needed, p_lash->vl_min); Exit: if (switch_bitmap) free(switch_bitmap); @@ -985,6 +996,21 @@ static unsigned get_lash_id(osm_switch_t * p_sw) return ((switch_t *) p_sw->priv)->id; } +int get_next_port(switch_t *sw, int link) +{ + link_t *l = sw->node->links[link]; + int port = l->next_port++; + + /* + * note if not doing mesh analysis + * then num_ports is always 1 + */ + if (l->next_port >= l->num_ports) + l->next_port = 0; + + return l->ports[port]; +} + static void populate_fwd_tbls(lash_t * p_lash) { osm_log_t *p_log = 
&p_lash->p_osm->log; @@ -1036,9 +1062,7 @@ static void populate_fwd_tbls(lash_t * p_lash) (uint8_t) sw-> routing_table[dst_lash_switch_id].out_link; uint8_t physical_egress_port = - (uint8_t) sw-> - virtual_physical_port_table - [lash_egress_port]; + get_next_port(sw, lash_egress_port); p_sw->lft_buf[lid] = physical_egress_port; OSM_LOG(p_log, OSM_LOG_VERBOSE, From rpearson at systemfabricworks.com Tue Nov 11 14:41:04 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 16:41:04 -0600 Subject: [ofa-general] [PATCH][10] opensm: hook mesh code into lash (updated) Message-ID: <00ad01c9444e$96e5f300$c4b1d900$@com> Sasha, Here is the tenth patch implementing the mesh analysis algorithm. I am resending it because I inadvertently left a bug in the last version. This patch - hooks mesh code into lash - replaces sw->phys_connections by the equivalent switch->node->links - replaces sw->num_connections by the equivalent switch->node->num_links - replaces sw->virtual_physical_port_table by switch->node->links[]->ports When the do_mesh_analysis flag is not set there is no change to the function except to replace the variables with variables in node that have the same size. In this case the port table in link_t will always have just one port. When the do_mesh_analysis flag is set multiple physical links will collapse to a single logical link with a port list with more than one element.
- fixed bug, mesh not set in osm_do_mesh_analysis - rewrote connect switches to use variables in node - in log Lane requirements (%d) exceed available lanes (%d) Arguments were reversed, fixed - compute physical egress port in routine get_next_port Which will use round robin if there are more than one Physical links between switches - changed printf's to OSM_LOG's in mesh.c Regards, Bob Pearson Signed-off-by: Bob Pearson ---- diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h index c037571..f3bde5d 100644 --- a/opensm/include/opensm/osm_ucast_lash.h +++ b/opensm/include/opensm/osm_ucast_lash.h @@ -82,9 +82,6 @@ typedef struct _switch { unsigned lane; } *routing_table; mesh_node_t *node; - unsigned int num_connections; - int *virtual_physical_port_table; - int *phys_connections; } switch_t; typedef struct _lash { diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index a248522..dbe3eeb 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -750,7 +750,7 @@ static void make_geometry(lash_t *p_lash, int sw) continue; if (l2 == -1) { - printf("ERROR no reverse link\n"); + OSM_LOG(p_log, OSM_LOG_DEBUG, "ERROR no reverse link\n"); continue; } @@ -919,6 +919,7 @@ static int reorder_links(lash_t *p_lash, int sw) */ static int measure_geometry(lash_t *p_lash, int seed) { + osm_log_t *p_log = &p_lash->p_osm->log; int i, j, k; int sw; switch_t *s, *s1; @@ -942,7 +943,7 @@ static int measure_geometry(lash_t *p_lash, int seed) assigned_axes++; } - printf("lash: %d/%d unassigned/assigned axes\n", unassigned_axes, assigned_axes); + OSM_LOG(p_log, OSM_LOG_DEBUG, "%d/%d unassigned/assigned axes\n", unassigned_axes, assigned_axes); do { change = 0; @@ -1069,8 +1070,7 @@ int osm_do_mesh_analysis(lash_t *p_lash) int i; mesh_t *mesh; switch_t *s; - - OSM_LOG_ENTER(p_log); + char buf[256], *p; /* * allocate per mesh data structures @@ -1080,6 +1080,8 @@ int osm_do_mesh_analysis(lash_t *p_lash) return -1; } + mesh = 
p_lash->mesh; + /* * get local metric and invariant for each switch * also classify each switch @@ -1099,36 +1101,41 @@ int osm_do_mesh_analysis(lash_t *p_lash) s = p_lash->switches[max_class_type]; - printf("lash: found %d node type%s\n", mesh->num_class, (mesh->num_class == 1)? "" : "s"); - printf("lash: %snode type is ", (mesh->num_class == 1)? "" : "most common "); + OSM_LOG(p_log, OSM_LOG_INFO, "found %d node type%s\n", mesh->num_class, (mesh->num_class == 1)? "" : "s"); + + p = buf; + p += sprintf( p, "%snode type is ", (mesh->num_class == 1)? "" : "most common "); if (s->node->type) { struct _mesh_info *t = &mesh_info[s->node->type]; for (i = 0; i < t->dimension; i++) { - printf("%s%d%s", i? "X" : "", t->size[i], + p += sprintf(p, "%s%d%s", i? " x " : "", t->size[i], (t->size[i] == 6)? "+" : ""); } - printf(" mesh\n"); + p += sprintf(p, " mesh\n"); p_lash->mesh->dimension = t->dimension; } else { - printf("unknown geometry\n"); + p += sprintf(p, "unknown geometry\n"); } + OSM_LOG(p_log, OSM_LOG_INFO, "%s", buf); + if (s->node->type) { make_geometry(p_lash, max_class_type); if (measure_geometry(p_lash, max_class_type)) return -1; - printf("lash: found "); + p = buf; + p += sprintf(p, "found "); for (i = 0; i < mesh->dimension; i++) - printf("%s%d", i? "X" : "", mesh->size[i]); - printf(" mesh\n"); - } + p += sprintf(p, "%s%d", i? 
" x " : "", mesh->size[i]); + p += sprintf(p, " mesh\n"); - OSM_LOG_EXIT(p_log); + OSM_LOG(p_log, OSM_LOG_INFO, "%s", buf); + } return 0; } diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index 95dbcc2..660ad56 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -67,16 +67,53 @@ static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) static void connect_switches(lash_t * p_lash, int sw1, int sw2, int phy_port_1) { osm_log_t *p_log = &p_lash->p_osm->log; - unsigned num = p_lash->switches[sw1]->num_connections; + unsigned num = p_lash->switches[sw1]->node->num_links; + switch_t *s1 = p_lash->switches[sw1]; + mesh_node_t *node = s1->node; + switch_t *s2; + link_t *l; + int i; + + /* + * if doing mesh analysis: + * - do not consider connections to self + * - collapse multiple connections between + * pair of switches to a single locical link + */ + if (p_lash->p_osm->subn.opt.do_mesh_analysis) { + if (sw1 == sw2) + return; + + /* see if we are alredy linked to sw2 */ + for (i = 0; i < num; i++) { + l = node->links[i]; + + if (node->links[i]->switch_id == sw2) { + l->ports[l->num_ports++] = phy_port_1; + return; + } + } + } + + l = node->links[num]; + l->switch_id = sw2; + l->link_id = -1; + l->ports[l->num_ports++] = phy_port_1; + + s2 = p_lash->switches[sw2]; + for (i = 0; i < s2->node->num_links; i++) { + if (s2->node->links[i]->switch_id == sw1) { + s2->node->links[i]->link_id = num; + l->link_id = i; + break; + } + } - p_lash->switches[sw1]->phys_connections[num] = sw2; - p_lash->switches[sw1]->virtual_physical_port_table[num] = phy_port_1; - p_lash->switches[sw1]->num_connections++; + node->num_links++; OSM_LOG(p_log, OSM_LOG_VERBOSE, "LASH connect: %d, %d, %d\n", sw1, sw2, phy_port_1); - } static osm_switch_t *get_osm_switch_from_port(osm_port_t * port) @@ -148,7 +185,7 @@ static int cycle_exists(cdg_vertex_t * start, cdg_vertex_t * current, static inline int get_next_switch(lash_t *p_lash, int sw, 
int link) { - return p_lash->switches[sw]->phys_connections[link]; + return p_lash->switches[sw]->node->links[link]->switch_id; } static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, @@ -233,8 +270,8 @@ static int get_phys_connection(switch_t *sw, int switch_to) { unsigned int i = 0; - for (i = 0; i < sw->num_connections; i++) - if (sw->phys_connections[i] == switch_to) + for (i = 0; i < sw->node->num_links; i++) + if (sw->node->links[i]->switch_id == switch_to) return i; return i; } @@ -252,8 +289,8 @@ static void shortest_path(lash_t * p_lash, int ir) while (!cl_is_list_empty(&bfsq)) { dequeue(&bfsq, &sw); - for (i = 0; i < sw->num_connections; i++) { - swi = switches[sw->phys_connections[i]]; + for (i = 0; i < sw->node->num_links; i++) { + swi = switches[sw->node->links[i]->switch_id]; if (swi->q_state == UNQUEUED) { enqueue(&bfsq, swi); sw->dij_channels[sw->used_channels++] = swi->id; @@ -614,25 +651,8 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw return NULL; } - sw->virtual_physical_port_table = malloc(num_ports * sizeof(int)); - if (!sw->virtual_physical_port_table) { - free(sw->dij_channels); - free(sw); - return NULL; - } - - sw->phys_connections = malloc(num_ports * sizeof(int)); - if (!sw->phys_connections) { - free(sw->virtual_physical_port_table); - free(sw->dij_channels); - free(sw); - return NULL; - } - sw->routing_table = malloc(num_switches * sizeof(sw->routing_table[0])); if (!sw->routing_table) { - free(sw->phys_connections); - free(sw->virtual_physical_port_table); free(sw->dij_channels); free(sw); return NULL; @@ -643,18 +663,13 @@ static switch_t *switch_create(lash_t * p_lash, unsigned id, osm_switch_t * p_sw sw->routing_table[i].lane = NONE; } - for (i = 0; i < num_ports; i++) { - sw->virtual_physical_port_table[i] = -1; - sw->phys_connections[i] = NONE; - } - - if (osm_mesh_node_create(p_lash, sw)) - return NULL; - sw->p_sw = p_sw; if (p_sw) p_sw->priv = sw; + if 
(osm_mesh_node_create(p_lash, sw)) + return NULL; + return sw; } @@ -664,10 +679,6 @@ static void switch_delete(switch_t * sw) if (sw->dij_channels) free(sw->dij_channels); - if (sw->virtual_physical_port_table) - free(sw->virtual_physical_port_table); - if (sw->phys_connections) - free(sw->phys_connections); if (sw->routing_table) free(sw->routing_table); if (sw->p_sw) @@ -972,7 +983,7 @@ Error_Not_Enough_Lanes: status = -1; OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " "Lane requirements (%d) exceed available lanes (%d)\n", - p_lash->vl_min, lanes_needed); + lanes_needed, p_lash->vl_min); Exit: if (switch_bitmap) free(switch_bitmap); @@ -985,6 +996,21 @@ static unsigned get_lash_id(osm_switch_t * p_sw) return ((switch_t *) p_sw->priv)->id; } +int get_next_port(switch_t *sw, int link) +{ + link_t *l = sw->node->links[link]; + int port = l->next_port++; + + /* + * note if not doing mesh analysis + * then num_ports is always 1 + */ + if (l->next_port >= l->num_ports) + l->next_port = 0; + + return l->ports[port]; +} + static void populate_fwd_tbls(lash_t * p_lash) { osm_log_t *p_log = &p_lash->p_osm->log; @@ -1036,9 +1062,7 @@ static void populate_fwd_tbls(lash_t * p_lash) (uint8_t) sw-> routing_table[dst_lash_switch_id].out_link; uint8_t physical_egress_port = - (uint8_t) sw-> - virtual_physical_port_table - [lash_egress_port]; + get_next_port(sw, lash_egress_port); p_sw->lft_buf[lid] = physical_egress_port; OSM_LOG(p_log, OSM_LOG_VERBOSE, From rpearson at systemfabricworks.com Tue Nov 11 14:44:08 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Tue, 11 Nov 2008 16:44:08 -0600 Subject: [ofa-general] mesh analysis patch done. Message-ID: <00ae01c9444f$02702960$07507c20$@com> Forgot to mention that the 10th patch was the last one. Take a look when you get a chance. 
Regards, Bob Pearson From chu11 at llnl.gov Tue Nov 11 15:57:52 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 11 Nov 2008 15:57:52 -0800 Subject: [ofa-general] Re: [opensm patch][1/2] fix qos config parsing bugs In-Reply-To: <20081111191958.GA8894@sashak.voltaire.com> References: <1225404078.1197.533.camel@cardanus.llnl.gov> <20081111191958.GA8894@sashak.voltaire.com> Message-ID: <1226447872.6239.2.camel@cardanus.llnl.gov> Hey Sasha, On Tue, 2008-11-11 at 21:19 +0200, Sasha Khapyorsky wrote: > Hi Al, > > On 15:01 Thu 30 Oct , Al Chu wrote: > > > > I found a bunch of qos config parsing issues, listed below: > > > > 1) > > > > If the user sets the qos default fields (i.e. qos_high_limit, > > qos_vlarb_high, etc.), but does not have the qos_ca, qos_swe, qos_rtr, > > etc. equivalent fields listed (i.e. qos_ca_high_limit, > > qos_sw0_vlarb_high), the values set in the qos default fields are not > > loaded into the CAs, switches, etc. The reason is in qos_build_config() > > we load defaults like this: > > > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > > > > but we always set the fields to something non-NULL. > > > > static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > > { > > opt->max_vls = OSM_DEFAULT_QOS_MAX_VLS; > > opt->high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > > opt->vlarb_high = OSM_DEFAULT_QOS_VLARB_HIGH; > > opt->vlarb_low = OSM_DEFAULT_QOS_VLARB_LOW; > > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > > } > > Yes, we are setting this to the default qos set (if not explicitly > specified by user). So finally we always have a valid set. No? Sorry, I may not have explained it well. Let's say I do this in the config file. qos_vlarb_high FOOBAR # qos_ca_vlarb_high BLAH qos_swe_vlarb_high XYZZY I currently expect qos_ca_vlarb_high to use the value of FOOBAR because I commented out the field. But it uses OSM_DEFAULT_QOS_VLARB_HIGH instead. The reason is that qos_build_config() checks for NULL to use default vs. non-default values.
p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; Under the above situation where I've commented out several fields, opt->vlarb_high is always non-NULL b/c it was set to OSM_DEFAULT_QOS_VLARB_HIGH. Thus OSM_DEFAULT_QOS_VLARB_HIGH is used instead of FOOBAR. > > 2) > > > > In qos_build_config() we load the high_limit like this: > > > > cfg->vl_high_limit = (uint8_t) opt->high_limit; > > > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit > > options to "go back to" the default high_limit. It just assumes that > > whatever is input (or was set by default) is what you should use. > > Right. What is the limitation here? That a user cannot set this to > "no value"? But she/he can just skip it. Similar to the above issue, let's say I want to do: qos_high_limit 8 # qos_ca_high_limit 15 # qos_swe_high_limit 15 I want qos_ca_high_limit and qos_swe_high_limit to use whatever I set in qos_high_limit. But the code doesn't allow for this. > > > 3) > > > > Some fields like qos_vlarb_high are assumed to be correctly set and can > > segfault opensm. > > qos_build_config() assumes that valid parameters are used. And we are > using it this way (I hope :)) (finally it is not library API). I think the issue is the osm_subnet.c code did not properly check all inputs, and subsequently some inputs used in qos_build_config() were bad. I think qos_vlarb_high (null) was something I tried that opensm segfaulted on. > > The attached patch fixes these up. Obviously there are tons of ways to > > do this. I decided to ... > > > > A) only initialize qos_options to the real defaults > > > > B) init all qos_*_options to sentinel values (-1, NULL, etc.) to > > indicate it should use the configured defaults if they aren't set by the > > user. The high_limit was changed from an unsigned to an int b/c 0 is a > > valid high_limit value. > > > > C) verify that the default qos inputs are definitely correct (i.e. can't > > be NULL). Reset to hard-coded defaults if need be.
> > > > D) load the default vs. non-default appropriately in QoS. > > And I see that we have here much more sometimes not-trivial flows and > default values are spread over many places... :( I will admit it's possible that I'm fixing something that shouldn't be fixed in the code but only in the documentation. Currently, the documentation indicates to me the behavior I describe above. Should we instead tell the user they must set each of the qos_ca*, qos_swe*, etc. fields respectively and cannot assume the "default" fields can be used to set those other fields? Perhaps we should just remove those "default" fields? Al > Sasha > > > > > Al > > > > P.S. This patch does not rely on my previous "remove qos_max_vls > > config" patch. I assume we're keeping the max_vls fields in this patch. > > > > -- > > Albert Chu > > chu11 at llnl.gov > > Computer Scientist > > High Performance Systems Division > > Lawrence Livermore National Laboratory > > > From 00a15a1797b79fd5e3298d98742b6da3613fb9c3 Mon Sep 17 00:00:00 2001 > > From: root > > Date: Thu, 30 Oct 2008 09:32:29 -0700 > > Subject: [PATCH] fix qos config parsing bugs > > > > > > Signed-off-by: root > > --- > > opensm/include/opensm/osm_subnet.h | 12 +- > > opensm/opensm/osm_qos.c | 6 +- > > opensm/opensm/osm_subnet.c | 467 ++++++++++++++++++++++-------------- > > 3 files changed, 293 insertions(+), 192 deletions(-) > > > > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > > index 7259587..11063b7 100644 > > --- a/opensm/include/opensm/osm_subnet.h > > +++ b/opensm/include/opensm/osm_subnet.h > > @@ -99,7 +99,7 @@ struct osm_qos_policy; > > */ > > typedef struct osm_qos_options { > > unsigned max_vls; > > - unsigned high_limit; > > + int high_limit; > > char *vlarb_high; > > char *vlarb_low; > > char *sl2vl; > > @@ -108,20 +108,20 @@ typedef struct osm_qos_options { > > * FIELDS > > * > > * max_vls > > -* The number of maximum VLs on the Subnet > > +* The number of maximum VLs on the
Subnet (0 == use default) > > * > > * high_limit > > * The limit of High Priority component of VL Arbitration > > -* table (IBA 7.6.9) > > +* table (IBA 7.6.9) (-1 == use default) > > * > > * vlarb_high > > -* High priority VL Arbitration table template. > > +* High priority VL Arbitration table template. (NULL == use default) > > * > > * vlarb_low > > -* Low priority VL Arbitration table template. > > +* Low priority VL Arbitration table template. (NULL == use default) > > * > > * sl2vl > > -* SL2VL Mapping table (IBA 7.6.6) template. > > +* SL2VL Mapping table (IBA 7.6.6) template. (NULL == use default) > > * > > *********/ > > > > diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c > > index 1679ae0..b451c25 100644 > > --- a/opensm/opensm/osm_qos.c > > +++ b/opensm/opensm/osm_qos.c > > @@ -382,7 +382,11 @@ static void qos_build_config(struct qos_config *cfg, > > memset(cfg, 0, sizeof(*cfg)); > > > > cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; > > - cfg->vl_high_limit = (uint8_t) opt->high_limit; > > + > > + if (opt->high_limit >= 0) > > + cfg->vl_high_limit = (uint8_t) opt->high_limit; > > + else > > + cfg->vl_high_limit = (uint8_t) dflt->high_limit; > > > > p = opt->vlarb_high ? 
opt->vlarb_high : dflt->vlarb_high; > > for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > > index 0422d0f..ab2ff9c 100644 > > --- a/opensm/opensm/osm_subnet.c > > +++ b/opensm/opensm/osm_subnet.c > > @@ -370,6 +370,15 @@ static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > > } > > > > +static void subn_init_qos_options(IN osm_qos_options_t * opt) > > +{ > > + opt->max_vls = 0; > > + opt->high_limit = -1; > > + opt->vlarb_high = NULL; > > + opt->vlarb_low = NULL; > > + opt->sl2vl = NULL; > > +} > > + > > /********************************************************************** > > **********************************************************************/ > > void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > > @@ -458,10 +467,10 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > > p_opt->prefix_routes_file = OSM_DEFAULT_PREFIX_ROUTES_FILE; > > p_opt->consolidate_ipv6_snm_req = FALSE; > > subn_set_default_qos_options(&p_opt->qos_options); > > - subn_set_default_qos_options(&p_opt->qos_ca_options); > > - subn_set_default_qos_options(&p_opt->qos_sw0_options); > > - subn_set_default_qos_options(&p_opt->qos_swe_options); > > - subn_set_default_qos_options(&p_opt->qos_rtr_options); > > + subn_init_qos_options(&p_opt->qos_ca_options); > > + subn_init_qos_options(&p_opt->qos_sw0_options); > > + subn_init_qos_options(&p_opt->qos_swe_options); > > + subn_init_qos_options(&p_opt->qos_rtr_options); > > } > > > > /********************************************************************** > > @@ -497,6 +506,7 @@ opts_unpack_net64(IN char *p_req_key, > > } > > } > > > > + > > /********************************************************************** > > **********************************************************************/ > > static void > > @@ -511,6 +521,20 @@ opts_unpack_uint32(IN char *p_req_key, > > } > > } > > } > 
> +/********************************************************************** > > + **********************************************************************/ > > +static void > > +opts_unpack_int32(IN char *p_req_key, > > + IN char *p_key, IN char *p_val_str, IN int32_t * p_val) > > +{ > > + if (!strcmp(p_req_key, p_key)) { > > + int32_t val = strtol(p_val_str, NULL, 0); > > + if (val != *p_val) { > > + log_config_value(p_key, "%d", val); > > + *p_val = val; > > + } > > + } > > +} > > > > /********************************************************************** > > **********************************************************************/ > > @@ -641,7 +665,7 @@ subn_parse_qos_options(IN const char *prefix, > > snprintf(name, sizeof(name), "%s_max_vls", prefix); > > opts_unpack_uint32(name, p_key, p_val_str, &opt->max_vls); > > snprintf(name, sizeof(name), "%s_high_limit", prefix); > > - opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); > > + opts_unpack_int32(name, p_key, p_val_str, &opt->high_limit); > > snprintf(name, sizeof(name), "%s_vlarb_high", prefix); > > opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); > > snprintf(name, sizeof(name), "%s_vlarb_low", prefix); > > @@ -653,7 +677,9 @@ subn_parse_qos_options(IN const char *prefix, > > static int > > subn_dump_qos_options(FILE * file, > > const char *set_name, > > - const char *prefix, osm_qos_options_t * opt) > > + const char *prefix, > > + osm_qos_options_t * opt, > > + osm_qos_options_t * dflt) > > { > > return fprintf(file, "# %s\n" > > "%s_max_vls %u\n" > > @@ -662,10 +688,11 @@ subn_dump_qos_options(FILE * file, > > "%s_vlarb_low %s\n" > > "%s_sl2vl %s\n", > > set_name, > > - prefix, opt->max_vls, > > - prefix, opt->high_limit, > > - prefix, opt->vlarb_high, > > - prefix, opt->vlarb_low, prefix, opt->sl2vl); > > + prefix, opt->max_vls > 0 ? opt->max_vls : dflt->max_vls, > > + prefix, opt->high_limit >= 0 ? opt->high_limit : dflt->high_limit, > > + prefix, opt->vlarb_high ? 
opt->vlarb_high : dflt->vlarb_high, > > + prefix, opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low, > > + prefix, opt->sl2vl ? opt->sl2vl : dflt->sl2vl); > > } > > > > /********************************************************************** > > @@ -833,169 +860,182 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) > > /********************************************************************** > > **********************************************************************/ > > > > -static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key) > > +static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key, IN unsigned dflt) > > { > > char buff[128]; > > > > - if (*max_vls > 15) { > > + if (!(*max_vls) || *max_vls > 15) { > > sprintf(buff, " Invalid Cached Option:%s=%u:" > > - "Using Default:%u\n", > > - key, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); > > + "Using Default\n", > > + key, *max_vls); > > printf(buff); > > cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > - *max_vls = OSM_DEFAULT_QOS_MAX_VLS; > > + *max_vls = dflt; > > } > > } > > > > -static void subn_verify_high_limit(IN unsigned *high_limit, IN char *key) > > +static void subn_verify_high_limit(IN int *high_limit, IN char *key, IN int dflt) > > { > > char buff[128]; > > > > - if (*high_limit > 255) { > > - sprintf(buff, " Invalid Cached Option:%s=%u:" > > - "Using Default:%u\n", > > - key, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); > > + if (*high_limit < 0 || *high_limit > 255) { > > + sprintf(buff, " Invalid Cached Option:%s=%d:" > > + "Using Default\n", key, *high_limit); > > printf(buff); > > cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > - *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > > + *high_limit = dflt; > > } > > } > > > > -static void subn_verify_vlarb(IN char *vlarb, IN char *key) > > +static void subn_verify_vlarb(IN char **vlarb, IN char *key, IN char *dflt) > > { > > - if (vlarb) { > > - char buff[128]; > > - char *str, *tok, *end, *ptr; > > - int count = 0; > > 
- > > - str = (char *)malloc(strlen(vlarb) + 1); > > - strcpy(str, vlarb); > > - > > - tok = strtok_r(str, ",\n", &ptr); > > - while (tok) { > > - char *vl_str, *weight_str; > > - > > - vl_str = tok; > > - weight_str = strchr(tok, ':'); > > - > > - if (weight_str) { > > - long vl, weight; > > - > > - *weight_str = '\0'; > > - weight_str++; > > - > > - vl = strtol(vl_str, &end, 0); > > - > > - if (*end) { > > - sprintf(buff, > > - " Warning: Cached Option %s:vl=%s improperly formatted\n", > > - key, vl_str); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, > > - buff, NULL, 0); > > - } else if (vl < 0 || vl > 14) { > > - sprintf(buff, > > - " Warning: Cached Option %s:vl=%ld out of range\n", > > - key, vl); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, > > - buff, NULL, 0); > > - } > > - > > - weight = strtol(weight_str, &end, 0); > > - > > - if (*end) { > > - sprintf(buff, > > - " Warning: Cached Option %s:weight=%s improperly formatted\n", > > - key, weight_str); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, > > - buff, NULL, 0); > > - } else if (weight < 0 || weight > 255) { > > - sprintf(buff, > > - " Warning: Cached Option %s:weight=%ld out of range\n", > > - key, weight); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, > > - buff, NULL, 0); > > - } > > - } else { > > - sprintf(buff, > > - " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", > > - key, tok); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > - 0); > > - } > > + char buff[128]; > > + char *str, *tok, *end, *ptr; > > + int count = 0; > > > > - count++; > > - tok = strtok_r(NULL, ",\n", &ptr); > > - } > > + if (*vlarb == NULL) { > > + sprintf(buff, " Invalid Cached Option:%s:" > > + "Using Default\n", key); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + (*vlarb) = dflt; > > + return; > > + } > > > > - if (count > 64) { > > - sprintf(buff, > > - " Warning: Cached 
Option %s: > 64 listed: " > > - "excess vl:weight pairs will be dropped\n", > > - key); > > - printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > - } > > + str = (char *)malloc(strlen(*vlarb) + 1); > > + strcpy(str, *vlarb); > > > > - free(str); > > - } > > -} > > + tok = strtok_r(str, ",\n", &ptr); > > + while (tok) { > > + char *vl_str, *weight_str; > > > > -static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) > > -{ > > - if (sl2vl) { > > - char buff[128]; > > - char *str, *tok, *end, *ptr; > > - int count = 0; > > + vl_str = tok; > > + weight_str = strchr(tok, ':'); > > > > - str = (char *)malloc(strlen(sl2vl) + 1); > > - strcpy(str, sl2vl); > > + if (weight_str) { > > + long vl, weight; > > > > - tok = strtok_r(str, ",\n", &ptr); > > - while (tok) { > > - long vl = strtol(tok, &end, 0); > > + *weight_str = '\0'; > > + weight_str++; > > + > > + vl = strtol(vl_str, &end, 0); > > > > if (*end) { > > sprintf(buff, > > " Warning: Cached Option %s:vl=%s improperly formatted\n", > > - key, tok); > > + key, vl_str); > > printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > - 0); > > - } else if (vl < 0 || vl > 15) { > > + cl_log_event("OpenSM", CL_LOG_INFO, > > + buff, NULL, 0); > > + } else if (vl < 0 || vl > 14) { > > sprintf(buff, > > " Warning: Cached Option %s:vl=%ld out of range\n", > > key, vl); > > printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > - 0); > > + cl_log_event("OpenSM", CL_LOG_INFO, > > + buff, NULL, 0); > > } > > > > - count++; > > - tok = strtok_r(NULL, ",\n", &ptr); > > - } > > + weight = strtol(weight_str, &end, 0); > > > > - if (count < 16) { > > + if (*end) { > > + sprintf(buff, > > + " Warning: Cached Option %s:weight=%s improperly formatted\n", > > + key, weight_str); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, > > + buff, NULL, 0); > > + } else if (weight < 0 || weight > 255) { > > + sprintf(buff, > > + " Warning: Cached Option %s:weight=%ld out of 
range\n", > > + key, weight); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, > > + buff, NULL, 0); > > + } > > + } else { > > sprintf(buff, > > - " Warning: Cached Option %s: < 16 VLs listed\n", > > - key); > > + " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", > > + key, tok); > > printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > + 0); > > } > > - if (count > 16) { > > + > > + count++; > > + tok = strtok_r(NULL, ",\n", &ptr); > > + } > > + > > + if (count > 64) { > > + sprintf(buff, > > + " Warning: Cached Option %s: > 64 listed: " > > + "excess vl:weight pairs will be dropped\n", > > + key); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + } > > + > > + free(str); > > +} > > + > > +static void subn_verify_sl2vl(IN char **sl2vl, IN char *key, IN char *dflt) > > +{ > > + char buff[128]; > > + char *str, *tok, *end, *ptr; > > + int count = 0; > > + > > + if (*sl2vl == NULL) { > > + sprintf(buff, " Invalid Cached Option:%s:" > > + "Using Default\n", key); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + (*sl2vl) = dflt; > > + return; > > + } > > + > > + str = (char *)malloc(strlen(*sl2vl) + 1); > > + strcpy(str, *sl2vl); > > + > > + tok = strtok_r(str, ",\n", &ptr); > > + while (tok) { > > + long vl = strtol(tok, &end, 0); > > + > > + if (*end) { > > sprintf(buff, > > - " Warning: Cached Option %s: > 16 listed: " > > - "excess VLs will be dropped\n", key); > > + " Warning: Cached Option %s:vl=%s improperly formatted\n", > > + key, tok); > > printf(buff); > > - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > + 0); > > + } else if (vl < 0 || vl > 15) { > > + sprintf(buff, > > + " Warning: Cached Option %s:vl=%ld out of range\n", > > + key, vl); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, > > + 0); > 
> } > > > > - free(str); > > + count++; > > + tok = strtok_r(NULL, ",\n", &ptr); > > + } > > + > > + if (count < 16) { > > + sprintf(buff, > > + " Warning: Cached Option %s: < 16 VLs listed\n", > > + key); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > } > > + if (count > 16) { > > + sprintf(buff, > > + " Warning: Cached Option %s: > 16 listed: " > > + "excess VLs will be dropped\n", key); > > + printf(buff); > > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > > + } > > + > > + free(str); > > } > > > > static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > > @@ -1046,61 +1086,113 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > > } > > > > if (p_opts->qos) { > > + /* the default options in qos_options must be correct. > > + * every other one need not be, b/c those will default > > + * back to whatever is in qos_options. > > + */ > > + > > subn_verify_max_vls(&(p_opts->qos_options.max_vls), > > - "qos_max_vls"); > > - subn_verify_max_vls(&(p_opts->qos_ca_options.max_vls), > > - "qos_ca_max_vls"); > > - subn_verify_max_vls(&(p_opts->qos_sw0_options.max_vls), > > - "qos_sw0_max_vls"); > > - subn_verify_max_vls(&(p_opts->qos_swe_options.max_vls), > > - "qos_swe_max_vls"); > > - subn_verify_max_vls(&(p_opts->qos_rtr_options.max_vls), > > - "qos_rtr_max_vls"); > > + "qos_max_vls", > > + OSM_DEFAULT_MAX_OP_VLS); > > + if (p_opts->qos_ca_options.max_vls) > > + subn_verify_max_vls(&(p_opts->qos_ca_options.max_vls), > > + "qos_ca_max_vls", > > + 0); > > + if (p_opts->qos_sw0_options.max_vls) > > + subn_verify_max_vls(&(p_opts->qos_sw0_options.max_vls), > > + "qos_sw0_max_vls", > > + 0); > > + if (p_opts->qos_swe_options.max_vls) > > + subn_verify_max_vls(&(p_opts->qos_swe_options.max_vls), > > + "qos_swe_max_vls", > > + 0); > > + if (p_opts->qos_rtr_options.max_vls) > > + subn_verify_max_vls(&(p_opts->qos_rtr_options.max_vls), > > + "qos_rtr_max_vls", > > + 0); > > > > 
subn_verify_high_limit(&(p_opts->qos_options.high_limit), > > - "qos_high_limit"); > > - subn_verify_high_limit(&(p_opts->qos_ca_options.high_limit), > > - "qos_ca_high_limit"); > > - subn_verify_high_limit(& > > - (p_opts->qos_sw0_options.high_limit), > > - "qos_sw0_high_limit"); > > - subn_verify_high_limit(& > > - (p_opts->qos_swe_options.high_limit), > > - "qos_swe_high_limit"); > > - subn_verify_high_limit(& > > - (p_opts->qos_rtr_options.high_limit), > > - "qos_rtr_high_limit"); > > - > > - subn_verify_vlarb(p_opts->qos_options.vlarb_low, > > - "qos_vlarb_low"); > > - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_low, > > - "qos_ca_vlarb_low"); > > - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_low, > > - "qos_sw0_vlarb_low"); > > - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_low, > > - "qos_swe_vlarb_low"); > > - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_low, > > - "qos_rtr_vlarb_low"); > > - > > - subn_verify_vlarb(p_opts->qos_options.vlarb_high, > > - "qos_vlarb_high"); > > - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_high, > > - "qos_ca_vlarb_high"); > > - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_high, > > - "qos_sw0_vlarb_high"); > > - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_high, > > - "qos_swe_vlarb_high"); > > - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_high, > > - "qos_rtr_vlarb_high"); > > - > > - subn_verify_sl2vl(p_opts->qos_options.sl2vl, "qos_sl2vl"); > > - subn_verify_sl2vl(p_opts->qos_ca_options.sl2vl, "qos_ca_sl2vl"); > > - subn_verify_sl2vl(p_opts->qos_sw0_options.sl2vl, > > - "qos_sw0_sl2vl"); > > - subn_verify_sl2vl(p_opts->qos_swe_options.sl2vl, > > - "qos_swe_sl2vl"); > > - subn_verify_sl2vl(p_opts->qos_rtr_options.sl2vl, > > - "qos_rtr_sl2vl"); > > + "qos_high_limit", > > + OSM_DEFAULT_QOS_HIGH_LIMIT); > > + if (p_opts->qos_ca_options.high_limit >= 0) > > + subn_verify_high_limit(&(p_opts->qos_ca_options.high_limit), > > + "qos_ca_high_limit", > > + -1); > > + if (p_opts->qos_sw0_options.high_limit 
>= 0) > > + subn_verify_high_limit(& > > + (p_opts->qos_sw0_options.high_limit), > > + "qos_sw0_high_limit", > > + -1); > > + if (p_opts->qos_swe_options.high_limit >= 0) > > + subn_verify_high_limit(& > > + (p_opts->qos_swe_options.high_limit), > > + "qos_swe_high_limit", > > + -1); > > + if (p_opts->qos_rtr_options.high_limit >= 0) > > + subn_verify_high_limit(& > > + (p_opts->qos_rtr_options.high_limit), > > + "qos_rtr_high_limit", > > + -1); > > + > > + subn_verify_vlarb(&(p_opts->qos_options.vlarb_low), > > + "qos_vlarb_low", > > + OSM_DEFAULT_QOS_VLARB_LOW); > > + if (p_opts->qos_ca_options.vlarb_low) > > + subn_verify_vlarb(&(p_opts->qos_ca_options.vlarb_low), > > + "qos_ca_vlarb_low", > > + NULL); > > + if (p_opts->qos_sw0_options.vlarb_low) > > + subn_verify_vlarb(&(p_opts->qos_sw0_options.vlarb_low), > > + "qos_sw0_vlarb_low", > > + NULL); > > + if (p_opts->qos_swe_options.vlarb_low) > > + subn_verify_vlarb(&(p_opts->qos_swe_options.vlarb_low), > > + "qos_swe_vlarb_low", > > + NULL); > > + if (p_opts->qos_rtr_options.vlarb_low) > > + subn_verify_vlarb(&(p_opts->qos_rtr_options.vlarb_low), > > + "qos_rtr_vlarb_low", > > + NULL); > > + > > + subn_verify_vlarb(&(p_opts->qos_options.vlarb_high), > > + "qos_vlarb_high", > > + OSM_DEFAULT_QOS_VLARB_HIGH); > > + if (p_opts->qos_ca_options.vlarb_high) > > + subn_verify_vlarb(&(p_opts->qos_ca_options.vlarb_high), > > + "qos_ca_vlarb_high", > > + NULL); > > + if (p_opts->qos_sw0_options.vlarb_high) > > + subn_verify_vlarb(&(p_opts->qos_sw0_options.vlarb_high), > > + "qos_sw0_vlarb_high", > > + NULL); > > + if (p_opts->qos_swe_options.vlarb_high) > > + subn_verify_vlarb(&(p_opts->qos_swe_options.vlarb_high), > > + "qos_swe_vlarb_high", > > + NULL); > > + if (p_opts->qos_rtr_options.vlarb_high) > > + subn_verify_vlarb(&(p_opts->qos_rtr_options.vlarb_high), > > + "qos_rtr_vlarb_high", > > + NULL); > > + > > + subn_verify_sl2vl(&(p_opts->qos_options.sl2vl), > > + "qos_sl2vl", > > + OSM_DEFAULT_QOS_SL2VL); > > + if 
(p_opts->qos_ca_options.sl2vl) > > + subn_verify_sl2vl(&(p_opts->qos_ca_options.sl2vl), > > + "qos_ca_sl2vl", > > + NULL); > > + if (p_opts->qos_sw0_options.sl2vl) > > + subn_verify_sl2vl(&(p_opts->qos_sw0_options.sl2vl), > > + "qos_sw0_sl2vl", > > + NULL); > > + if (p_opts->qos_swe_options.sl2vl) > > + subn_verify_sl2vl(&(p_opts->qos_swe_options.sl2vl), > > + "qos_swe_sl2vl", > > + NULL); > > + if (p_opts->qos_rtr_options.sl2vl) > > + subn_verify_sl2vl(&(p_opts->qos_rtr_options.sl2vl), > > + "qos_rtr_sl2vl", > > + NULL); > > } > > #ifdef ENABLE_OSM_PERF_MGR > > if (p_opts->perfmgr_sweep_time_s < 1) { > > @@ -1714,23 +1806,28 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) > > > > subn_dump_qos_options(opts_file, > > "QoS default options", "qos", > > + &p_opts->qos_options, > > &p_opts->qos_options); > > fprintf(opts_file, "\n"); > > subn_dump_qos_options(opts_file, > > "QoS CA options", "qos_ca", > > - &p_opts->qos_ca_options); > > + &p_opts->qos_ca_options, > > + &p_opts->qos_options); > > fprintf(opts_file, "\n"); > > subn_dump_qos_options(opts_file, > > "QoS Switch Port 0 options", "qos_sw0", > > - &p_opts->qos_sw0_options); > > + &p_opts->qos_sw0_options, > > + &p_opts->qos_options); > > fprintf(opts_file, "\n"); > > subn_dump_qos_options(opts_file, > > "QoS Switch external ports options", "qos_swe", > > - &p_opts->qos_swe_options); > > + &p_opts->qos_swe_options, > > + &p_opts->qos_options); > > fprintf(opts_file, "\n"); > > subn_dump_qos_options(opts_file, > > "QoS Router ports options", "qos_rtr", > > - &p_opts->qos_rtr_options); > > + &p_opts->qos_rtr_options, > > + &p_opts->qos_options); > > fprintf(opts_file, "\n"); > > > > fprintf(opts_file, > > -- > > 1.5.4.5 > > > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From nicolas.morey-chaisemartin at ext.bull.net Tue Nov 11 22:36:06 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas 
Morey Chaisemartin) Date: Wed, 12 Nov 2008 07:36:06 +0100 Subject: [ofa-general] opensm: Routing on non-pure Fat-Tree Message-ID: <491A7956.2000406@ext.bull.net> Hello, I am conducting some tests on routing a non-pure fat-tree network using the fat-tree algorithm of OpenSM. The network I am experimenting on is a 3-level fat tree, with a pruned 3rd layer. By providing the root_guid_file, the algorithm works great! The problem is, we would like to add some service nodes directly on the 3rd-level switches. I have added the cn_guid_file so the network is still recognized as a fat tree. OpenSM once more manages to create the routing for the network. It provides full connectivity, except there are no routes between non-compute nodes. I understand that marking these nodes as non-compute nodes implies they won't talk to each other, but we still need a bit of connectivity between them to exchange a little data (pings and such). A simple min-hop or similar should be enough to generate those routes. It will probably unbalance the number of routes going through the top links, but those additional routes carry virtually no traffic at all, so in practice it shouldn't be a problem. Is there any reason such a behaviour wasn't implemented yet? Should there be one? Regards Nicolas Morey-Chaisemartin From ogerlitz at voltaire.com Tue Nov 11 22:46:04 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 12 Nov 2008 08:46:04 +0200 Subject: [ofa-general] [PATCH] perftest: don't attach the sender QP In-Reply-To: References: Message-ID: <491A7BAC.5030708@voltaire.com> Or Gerlitz wrote: > don't attach the sender QP to the MGID > Oren, Did you have the chance to look into this patch? Or.
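For anyone skimming the quoted patch: the change amounts to one extra condition on when a QP joins the multicast group. Below is a minimal standalone sketch of that condition, not perftest code. The enum, the struct layout and the helper name should_attach_mcast are made up for illustration, and the sketch assumes that servername is set only on the client, which is the sending side in send_bw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the perftest types; only the three fields
 * used by the patched condition are modelled here. */
enum conn_type { RC = 0, UC = 1, UD = 2 };

struct user_parameters {
	int connection_type;    /* RC, UC or UD */
	int use_mcg;            /* non-zero when multicast was requested */
	const char *servername; /* assumed non-NULL only on the client (sender) */
};

/* Mirrors the patched condition: attach only a UD QP, only when multicast
 * is in use, and only on the side started without a server name. */
static bool should_attach_mcast(const struct user_parameters *p)
{
	assert(p != NULL);
	return p->connection_type == UD && p->use_mcg && !p->servername;
}
```

Under these assumptions the receiver (no server name) still attaches its QP to the MGID, while the sender only posts sends to the group and skips the attach.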
> Signed-off-by: Or Gerlitz > > Index: perftest-1.2/send_bw.c > =================================================================== > --- perftest-1.2.orig/send_bw.c > +++ perftest-1.2/send_bw.c > @@ -421,7 +421,7 @@ static struct pingpong_context *pp_init_ > return NULL; > } > > - if ((user_parm->connection_type==UD) && (user_parm->use_mcg)) { > + if ((user_parm->connection_type==UD) && (user_parm->use_mcg) && !user_parm->servername) { > union ibv_gid gid; > uint8_t mcg_gid[16] = MCG_GID; > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From vlad at lists.openfabrics.org Wed Nov 12 03:23:53 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 12 Nov 2008 03:23:53 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081112-0200 daily build status Message-ID: <20081112112353.69757E60D93@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with 
linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Wed Nov 12 07:57:58 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 12 Nov 2008 17:57:58 +0200 Subject: [ofa-general] opensm: Routing on non-pure Fat-Tree In-Reply-To: <491A7956.2000406@ext.bull.net> References: <491A7956.2000406@ext.bull.net> Message-ID: <491AFD06.3010207@dev.mellanox.co.il> Hi Nicolas, Nicolas Morey Chaisemartin wrote: > Hello, > > I am conducting some tests on routing non-pure fat-tree network using > the fat tree algorithm of OpenSM. > The network I am experimenting on is a 3 level fat tree, with a pruned > 3rd layer. > By providing the root_guid_file, the algorithm works great ! > > The problem is, we would like to add some service nodes directly on the > 3rd level switches. > I have added the cn_guid_file so the network is still recognize as a fat > tree. > OpenSM once more manage to create the routing for the network. 
It > provides full connectivity, > except there are no routes between non computes nodes. > I understand that the point of setting these node as not compute node > should intend they won't talk to each other, but we still need a bit of > connectivity between them to exchange few datas (pings and such). > A simple min-hop or such should be enough to generate those routes. > It will probably desequilibrate the number of routes going through the > top links, but those additional link makes virtually no traffic at all, > so in practical it shouldn't be a problem. Fat-tree should create full connectivity as long as there is an up/down route between ports. Do you get connectivity between these nodes with the up/down routing algorithm? Try running it with the same root_guid_file. -- Yevgeny > Is there any reasons such a behaviour wasn't implemented yet? Should > there be one? > > Regards > > Nicolas Morey-Chaisemartin > From nicolas.morey-chaisemartin at ext.bull.net Wed Nov 12 08:01:53 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Wed, 12 Nov 2008 17:01:53 +0100 Subject: [ofa-general] opensm: Routing on non-pure Fat-Tree In-Reply-To: <491AFD06.3010207@dev.mellanox.co.il> References: <491A7956.2000406@ext.bull.net> <491AFD06.3010207@dev.mellanox.co.il> Message-ID: <491AFDF1.2080607@ext.bull.net> Yevgeny Kliteynik wrote: > Hi Nicolas, > > Nicolas Morey Chaisemartin wrote: >> Hello, >> >> I am conducting some tests on routing non-pure fat-tree network using >> the fat tree algorithm of OpenSM. >> The network I am experimenting on is a 3 level fat tree, with a >> pruned 3rd layer. >> By providing the root_guid_file, the algorithm works great !
>> >> The problem is, we would like to add some service nodes directly on >> the 3rd level switches. >> I have added the cn_guid_file so the network is still recognize as a >> fat tree. >> OpenSM once more manage to create the routing for the network. It >> provides full connectivity, >> except there are no routes between non computes nodes. >> I understand that the point of setting these node as not compute node >> should intend they won't talk to each other, but we still need a bit >> of connectivity between them to exchange few datas (pings and such). >> A simple min-hop or such should be enough to generate those routes. >> It will probably desequilibrate the number of routes going through >> the top links, but those additional link makes virtually no traffic >> at all, so in practical it shouldn't be a problem. > > Fat-tree should create full connectivity as long as there is an up/down > route between ports. Do you get connectivity between these nodes with > up/down routing algorithm? > Try running it with the same root_guid_file. > > -- Yevgeny Well, the route would be more down/up compared to the rest of the transfers. (I'm not sure I was clear, but when I talk of the 3rd level, I mean the top level; the 1st level being the switches just above the compute nodes.) I'll try this tomorrow. Thanks, Nicolas > >> Is there any reasons such a behaviour wasn't implemented yet? Should >> there be one?
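>>
>> Regards
>>
>> Nicolas Morey-Chaisemartin
>>
>> _______________________________________________
>> general mailing list
>> general at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
>
>
>

From michael.heinz at qlogic.com Wed Nov 12 08:52:29 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Wed, 12 Nov 2008 10:52:29 -0600
Subject: [ofa-general] fork() failing in mvapich1 and mvapich2, using OFED 1.4
Message-ID:

I'm not sure when this stopped working, but I'm getting a complaint from
our QA people that our fork() test program is failing with mvapich1 and
mvapich2 when tested with OFED 1.4. When I tested with OFED 1.3.1, I got
a similar result:

[root at panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024
Exit code -3 signaled from homer
Abort signaled by rank 0: [panic:0] Got completion with error
IBV_WC_LOC_LEN_ERR, code=1, dest rank=1

Killing remote processes...MPI process terminated unexpectedly
DONE

This is the program that generates the failure:

/* Header names were stripped by the list archive; these are the headers
 * the code below needs. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <mpi.h>

#define MYBUFSIZE (4*1024*1028)
#define MAX_REQ_NUM 100000

char s_buf1[MYBUFSIZE];
char r_buf1[MYBUFSIZE];

MPI_Request request[MAX_REQ_NUM];
MPI_Status my_stat[MAX_REQ_NUM];

int main(int argc,char *argv[])
{
    int myid, numprocs, i;
    int size, loop, page_size;
    char *s_buf, *r_buf;
    double t_start=0.0, t_end=0.0, t=0.0;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if ( argc < 3 ) {
        fprintf(stderr, "Usage: mpi_fork loop msg_size\n");
        MPI_Finalize();
        return 0;
    }
    size=atoi(argv[2]);
    loop = atoi(argv[1]);

    if(size > MYBUFSIZE){
        fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE);
        MPI_Finalize();
        return 0;
    }

    if(loop > MAX_REQ_NUM){
        fprintf(stderr, "Maximum number of iterations is %d\n",MAX_REQ_NUM);
        MPI_Finalize();
        return 0;
    }

    page_size = getpagesize();

    /* round both buffers up to a page boundary */
    s_buf = (char*)(((unsigned long)s_buf1 + (page_size -1))/page_size * page_size);
    r_buf = (char*)(((unsigned long)r_buf1 + (page_size -1))/page_size * page_size);

    assert( (s_buf != NULL) && (r_buf != NULL) );

    /* The loop bound was lost in the archive; "size" is assumed here. */
    for ( i=0; i<size; i++ ){
        s_buf[i]='a';
        r_buf[i]='b';
    }

    /*warmup */
    if (myid == 0)
    {
        for ( i=0; i< loop; i++ ) {
            MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
        }

        MPI_Waitall(loop, request, my_stat);
        MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);

    }else{
        for ( i=0; i< loop; i++ ) {
            MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
        }
        MPI_Waitall(loop, request, my_stat);
        MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
    }
    // fork a child process and make sure it lives beyond parent touching pages
    // if fork is not properly handled in stack, parent would get a copy
    // of its registered/locked pages (such as qp wqes) on 1st access
    // and problems such as Local Length Error would be reported by HCA
    if (fork() == 0) {
        // child exists but doesn't touch anything, parent still owns pages
        sleep(10);
        // exec another program
        execlp("date", "date", NULL);
        // just in case exec fails
        exit(0);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if (myid == 0)
    {
        t_start=MPI_Wtime();
        for ( i=0; i< loop; i++ ) {
            MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
        }

        MPI_Waitall(loop, request, my_stat);
        MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);

        t_end=MPI_Wtime();
        t = t_end - t_start;

    }else{
        for ( i=0; i< loop; i++ ) {
            MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
        }
        MPI_Waitall(loop, request, my_stat);
        MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
    }

    if ( myid == 0 ) {
        double tmp;
        tmp = ((size*1.0)/1.0e6)*loop;
        fprintf(stdout,"%d\t%f\n", size, tmp/t);
    }
    {
        int status;
        int ret;

        /* reap the forked child and check that it exited cleanly */
        ret = wait(&status);
        if (ret == -1 || ! WIFEXITED(status) || WEXITSTATUS(status) != 0)
        {
            fprintf(stdout,"ERROR: child failure: ret=%d, status=0x%x, exit_status=%d\n", ret, status, WEXITSTATUS(status));
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Wed, 12 Nov 2008
Subject: Re: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4
Message-ID:

Hi Mike,

In order to have fork support enabled, you need to set an additional
environment variable. See Section 7.1.2 in the User Guide for more
information:

http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-350007.1.2

Thanks,

Matt

On Wed, 12 Nov 2008, Mike Heinz wrote:

> I'm not sure when this stopped working, but I'm getting a complaint from
> our QA people that our fork() test program is failing with mvapich1 and
> mvapich2 when tested with OFED 1.4. When I tested with OFED 1.3.1, I got
> a similar result:
>
>
> [root at panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024
> Exit code -3 signaled from homer
> Abort signaled by rank 0: [panic:0] Got completion with error
> IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
>
> Killing remote processes...MPI process terminated unexpectedly
> DONE
>
>
> This is the program that generates the failure:
>
> /* Header names were stripped by the list archive; these are the headers
>  * the code below needs. */
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/wait.h>
> #include <mpi.h>
>
>
> #define MYBUFSIZE (4*1024*1028)
> #define MAX_REQ_NUM 100000
>
> char s_buf1[MYBUFSIZE];
> char r_buf1[MYBUFSIZE];
>
>
> MPI_Request request[MAX_REQ_NUM];
> MPI_Status my_stat[MAX_REQ_NUM];
>
> int main(int argc,char *argv[])
> {
> int myid, numprocs, i;
> int size, loop, page_size;
> char *s_buf, *r_buf;
> double t_start=0.0, t_end=0.0, t=0.0;
>
>
> MPI_Init(&argc,&argv);
> MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
> MPI_Comm_rank(MPI_COMM_WORLD,&myid);
>
> if ( argc < 3 ) {
> fprintf(stderr, "Usage: mpi_fork loop msg_size\n");
> MPI_Finalize();
> return 0;
> }
> size=atoi(argv[2]);
> loop = atoi(argv[1]);
>
> if(size > MYBUFSIZE){
> fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE);
> MPI_Finalize();
> return 0;
> }
>
> if(loop > MAX_REQ_NUM){
> fprintf(stderr, "Maximum number of iterations is
> %d\n",MAX_REQ_NUM);
> MPI_Finalize();
> return 0;
> }
>
>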
page_size = getpagesize(); > > s_buf = (char*)(((unsigned long)s_buf1 + (page_size -1))/page_size * > page_size); > r_buf = (char*)(((unsigned long)r_buf1 + (page_size -1))/page_size * > page_size); > > assert( (s_buf != NULL) && (r_buf != NULL) ); > > for ( i=0; i s_buf[i]='a'; > r_buf[i]='b'; > } > > /*warmup */ > if (myid == 0) > { > for ( i=0; i< loop; i++ ) { > MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, > request+i); > } > > MPI_Waitall(loop, request, my_stat); > MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, > &my_stat[0]); > > }else{ > for ( i=0; i< loop; i++ ) { > MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, > request+i); > } > MPI_Waitall(loop, request, my_stat); > MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD); > } > // fork a child process and make sure it lives beyond parent > touching pages > // if fork is not properly handled in stack, parent would get a copy > // of its registered/locked pages (such as qp wqes) on 1st access > // and problems such as Local Length Error would be reported by HCA > if (fork() == 0) { > // child exists but doesn't touch anything, parent still owns > pages > sleep(10); > // exec another program > execlp("date", "date", NULL); > // just in case exec fails > exit(0); > } > > MPI_Barrier(MPI_COMM_WORLD); > > if (myid == 0) > { > t_start=MPI_Wtime(); > for ( i=0; i< loop; i++ ) { > MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, > request+i); > } > > MPI_Waitall(loop, request, my_stat); > MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, > &my_stat[0]); > > t_end=MPI_Wtime(); > t = t_end - t_start; > > }else{ > for ( i=0; i< loop; i++ ) { > MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, > request+i); > } > MPI_Waitall(loop, request, my_stat); > MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD); > } > > if ( myid == 0 ) { > double tmp; > tmp = ((size*1.0)/1.0e6)*loop; > fprintf(stdout,"%d\t%f\n", size, tmp/t); > } > { > int status; > int ret; > > ret = 
wait(&status); > if (ret == -1 || ! WIFEXITED(status) || WEXITSTATUS(status) != > 0) > { > fprintf(stdout,"ERROR: child failure: ret=%d, status=0x%x, > exit_status=%d\n", ret, status, WEXITSTATUS(status)); > } > } > > MPI_Barrier(MPI_COMM_WORLD); > MPI_Finalize(); > return 0; > } > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss at cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From michael.heinz at qlogic.com Wed Nov 12 09:22:22 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed, 12 Nov 2008 11:22:22 -0600 Subject: [ofa-general] RE: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4 In-Reply-To: References: Message-ID: Thanks for the reply, Matt. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Matthew Koop [mailto:koop at cse.ohio-state.edu] Sent: Wednesday, November 12, 2008 12:13 PM To: Mike Heinz Cc: mvapich-discuss at cse.ohio-state.edu; general at lists.openfabrics.org Subject: Re: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4 Hi Mike, In order to have the fork support enabled you need to set an additional ENV. See Section 7.1.2 in the User Guide for more information: http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-350 007.1.2 Thanks, Matt On Wed, 12 Nov 2008, Mike Heinz wrote: > I'm not sure when this stopped working, but I'm getting a complaint > from our QA people that our fork() test program is failing with > mvapich1 and > mvapich2 when tested with OFED 1.4. 
When I tested with OFED 1.3.1, I > got a similar result: > > > [root at panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024 > Exit code -3 signaled from homer Abort signaled by rank 0: [panic:0] > Got completion with error IBV_WC_LOC_LEN_ERR, code=1, dest rank=1 > > Killing remote processes...MPI process terminated unexpectedly DONE > > > This is the program that generates the failure: > > #include > #include > #include > #include > > > #define MYBUFSIZE (4*1024*1028) > #define MAX_REQ_NUM 100000 > > char s_buf1[MYBUFSIZE]; > char r_buf1[MYBUFSIZE]; > > > MPI_Request request[MAX_REQ_NUM]; > MPI_Status my_stat[MAX_REQ_NUM]; > > int main(int argc,char *argv[]) > { > int myid, numprocs, i; > int size, loop, page_size; > char *s_buf, *r_buf; > double t_start=0.0, t_end=0.0, t=0.0; > > > MPI_Init(&argc,&argv); > MPI_Comm_size(MPI_COMM_WORLD,&numprocs); > MPI_Comm_rank(MPI_COMM_WORLD,&myid); > > if ( argc < 3 ) { > fprintf(stderr, "Usage: mpi_fork loop msg_size\n"); > MPI_Finalize(); > return 0; > } > size=atoi(argv[2]); > loop = atoi(argv[1]); > > if(size > MYBUFSIZE){ > fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE); > MPI_Finalize(); > return 0; > } > > if(loop > MAX_REQ_NUM){ > fprintf(stderr, "Maximum number of iterations is > %d\n",MAX_REQ_NUM); > MPI_Finalize(); > return 0; > } > > page_size = getpagesize(); > > s_buf = (char*)(((unsigned long)s_buf1 + (page_size -1))/page_size * > page_size); > r_buf = (char*)(((unsigned long)r_buf1 + (page_size -1))/page_size * > page_size); > > assert( (s_buf != NULL) && (r_buf != NULL) ); > > for ( i=0; i s_buf[i]='a'; > r_buf[i]='b'; > } > > /*warmup */ > if (myid == 0) > { > for ( i=0; i< loop; i++ ) { > MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, > request+i); > } > > MPI_Waitall(loop, request, my_stat); > MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, > &my_stat[0]); > > }else{ > for ( i=0; i< loop; i++ ) { > MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, > request+i); > 
} > MPI_Waitall(loop, request, my_stat); > MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD); > } > // fork a child process and make sure it lives beyond parent > touching pages > // if fork is not properly handled in stack, parent would get a copy > // of its registered/locked pages (such as qp wqes) on 1st access > // and problems such as Local Length Error would be reported by HCA > if (fork() == 0) { > // child exists but doesn't touch anything, parent still owns > pages > sleep(10); > // exec another program > execlp("date", "date", NULL); > // just in case exec fails > exit(0); > } > > MPI_Barrier(MPI_COMM_WORLD); > > if (myid == 0) > { > t_start=MPI_Wtime(); > for ( i=0; i< loop; i++ ) { > MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, > request+i); > } > > MPI_Waitall(loop, request, my_stat); > MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, > &my_stat[0]); > > t_end=MPI_Wtime(); > t = t_end - t_start; > > }else{ > for ( i=0; i< loop; i++ ) { > MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, > request+i); > } > MPI_Waitall(loop, request, my_stat); > MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD); > } > > if ( myid == 0 ) { > double tmp; > tmp = ((size*1.0)/1.0e6)*loop; > fprintf(stdout,"%d\t%f\n", size, tmp/t); > } > { > int status; > int ret; > > ret = wait(&status); > if (ret == -1 || ! 
WIFEXITED(status) || WEXITSTATUS(status) != > 0) > { > fprintf(stdout,"ERROR: child failure: ret=%d, status=0x%x, > exit_status=%d\n", ret, status, WEXITSTATUS(status)); > } > } > > MPI_Barrier(MPI_COMM_WORLD); > MPI_Finalize(); > return 0; > } > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss at cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From rdreier at cisco.com Wed Nov 12 10:20:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 12 Nov 2008 10:20:40 -0800 Subject: [ofa-general] Re: [PATCH 2.6.28] RDMA/cxgb3: deadlock in iw_cxgb3 can cause hang when configuring interface. In-Reply-To: <20081106230642.28808.66765.stgit@dell3.ogc.int> (Steve Wise's message of "Thu, 06 Nov 2008 17:06:42 -0600") References: <20081106230642.28808.66765.stgit@dell3.ogc.int> Message-ID: Looks good, applied. However, I think it's a little yucky to call ethtool ops without rtnl, although it is of course perfectly safe in this case. It might be nicer to introduce a new cxgb3 <-> iw_cxgb3 interface that returns the firmware version, which can also be used to implement the get_drvinfo ethtool op as well. That would let you avoid fw_vers_string_to_u64() as well -- it is a little silly at the moment how cxgb3 converts to a string and then iw_cxgb3 parses that string. But that's all much lower priority than just fixing a deadlock. - R. From sashak at voltaire.com Wed Nov 12 10:54:57 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 12 Nov 2008 20:54:57 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. 
In-Reply-To: <20081110131140.52561f42.weiny2@llnl.gov>
References: <20081104095744.35893d4a.weiny2@llnl.gov> <20081110201333.GM313@sashak.voltaire.com> <20081110131140.52561f42.weiny2@llnl.gov>
Message-ID: <20081112185457.GD27271@sashak.voltaire.com>

Hi Ira,

On 13:11 Mon 10 Nov , Ira Weiny wrote:
> >
> > Actually it can be a valid case. For example when node was first time
> > discovered via port A, when this port was disconnected and the same node
> > was discovered via port B - it is not a new node and node_info (where
> > port number for osm_node_get_any_physp_ptr() is stored) will not be
> > updated.
>
> Ah, good point, I just happened to see it when PortInfo failed.
>
> >
> > Obviously the patch is fine. But probably we need more general fix, for
> > example to redo osm_node_get_any_physp_ptr() so that it will not return
> > invalid ports. Need to review other osm_node_get_any_physp_ptr() usages.
>
> I was wondering if it would return invalid ports ever. It would be easy for it
> to return only valid ports but perhaps that should be another function to
> preserve functionality?

Perhaps. OTOH osm_node_get_any_physp_ptr() is used in very few places. I
think we first need to review all those cases; then we will know better
how to handle this.

Sasha

From chu11 at llnl.gov Wed Nov 12 11:26:56 2008
From: chu11 at llnl.gov (Al Chu)
Date: Wed, 12 Nov 2008 11:26:56 -0800
Subject: [ofa-general] opensm - can/cannot set alternate default pkey?
Message-ID: <1226518016.7156.15.camel@cardanus.llnl.gov>

Before I run off and write a patch I shouldn't, I thought I'd ask.

In 10.9.1.2 of the spec, it states, "The P_Key value of 0xFFFF shall
represent the default partition key."

(I couldn't find the glossary in the spec about what "shall" means, but
I assume it means "must" or "required", as in RFCs.)

Does this mean that a P_Key of 0xFFFF must be in the P_Key_Table?

Currently, it seems that in opensm, no matter how you write your
partition.conf file, 0xFFFF will always be in the P_Key_Table. This is
because opensm inserts it in its internal list by default, and nothing
(as far as I can find) can remove it from that internal list.

This seems wrong to me, but I'm getting confused by the wording.

Thanks,
Al

--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

From kliteyn at dev.mellanox.co.il Wed Nov 12 12:00:10 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 12 Nov 2008 22:00:10 +0200
Subject: [ofa-general] opensm: Routing on non-pure Fat-Tree
In-Reply-To: <491AFDF1.2080607@ext.bull.net>
References: <491A7956.2000406@ext.bull.net> <491AFD06.3010207@dev.mellanox.co.il> <491AFDF1.2080607@ext.bull.net>
Message-ID: <491B35CA.8020904@dev.mellanox.co.il>

Nicolas Morey Chaisemartin wrote:
> Yevgeny Kliteynik wrote:
>> Hi Nicolas,
>>
>> Nicolas Morey Chaisemartin wrote:
>>> Hello,
>>>
>>> I am conducting some tests on routing non-pure fat-tree network using
>>> the fat tree algorithm of OpenSM.
>>> The network I am experimenting on is a 3 level fat tree, with a
>>> pruned 3rd layer.
>>> By providing the root_guid_file, the algorithm works great !
>>>
>>> The problem is, we would like to add some service nodes directly on
>>> the 3rd level switches.
>>> I have added the cn_guid_file so the network is still recognize as a
>>> fat tree.
>>> OpenSM once more manage to create the routing for the network. It
>>> provides full connectivity,
>>> except there are no routes between non computes nodes.
>>> I understand that the point of setting these node as not compute node
>>> should intend they won't talk to each other, but we still need a bit
>>> of connectivity between them to exchange few datas (pings and such).
>>> A simple min-hop or such should be enough to generate those routes.
>>> It will probably desequilibrate the number of routes going through
>>> the top links, but those additional link makes virtually no traffic
>>> at all, so in practical it shouldn't be a problem.
>>
>> Fat-tree should create full connectivity as long as there is an up/down
>> route between ports. Do you get connectivity between these nodes with
>> up/down routing algorithm?
>> Try running it with the same root_guid_file.
>>
>> -- Yevgeny
>
> Well the route would be more down/up compared to the rest of the transfer.
> (Im not sure I was clear, but when i talk of 3rd level, I mean top
> level. 1st level begin the switches just above the compute nodes)

Oh, OK. I was thinking the opposite.
So you connect these non-CNs to spine switches.

> I'll try this tomorrow

No need :)
Fat-tree is a variation of up/down routing. As such, down/up routes
are not allowed. You won't have connectivity between these nodes
with either fat-tree or up/down routing.

>>> Is there any reasons such a behavior wasn't implemented yet?

The point of allowing only up/down routes is to prevent credit loops
in the fabric.

>>> Should there be one?

I guess it is possible, but these down/up routes will create credit
loops, so any traffic between these "special" nodes is potentially
bad for the fabric.

Note that there's already a "connect roots" option in the up/down
routing which violates the up/down rule, but this is only between
switches, so I believe that the only traffic that uses these routes
is management traffic.
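
To make the up/down rule above concrete, here is a minimal sketch in C. It is not OpenSM code: the hop_dir enum and is_updown_legal() are hypothetical names used only for illustration, and the sketch assumes each switch-to-switch hop has already been classified as going up (toward the roots) or down (away from them).

```c
#include <stddef.h>

/* Hypothetical hop directions; illustrative names, not from OpenSM. */
enum hop_dir {
    HOP_UP,   /* hop toward the root (spine) switches */
    HOP_DOWN  /* hop away from the roots, toward the leaf switches */
};

/*
 * Return 1 if the path obeys the up/down rule: zero or more UP hops
 * followed by zero or more DOWN hops. Once a path has gone down it
 * may not go up again; a down/up turn makes the route illegal.
 */
int is_updown_legal(const enum hop_dir *path, size_t nhops)
{
    int went_down = 0;
    size_t i;

    for (i = 0; i < nhops; i++) {
        if (path[i] == HOP_DOWN)
            went_down = 1;
        else if (went_down)
            return 0; /* UP after DOWN: a down/up turn */
    }
    return 1;
}
```

Under these assumptions, a route between two compute nodes such as {HOP_UP, HOP_UP, HOP_DOWN, HOP_DOWN} passes the check, while a route between two nodes attached to different top-level switches has to go {HOP_DOWN, HOP_UP} and is rejected, which matches the missing connectivity described in this thread.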
-- Yevgeny >>> Regards >>> >>> Nicolas Morey-Chaisemartin >>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general >>> >> >> >> > > From akepner at sgi.com Wed Nov 12 14:18:46 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 12 Nov 2008 14:18:46 -0800 Subject: [ofa-general] opensm: bad multicast forwarding table entries Message-ID: <20081112221846.GE25248@sgi.com> Here's a description of a problem we're seeing where multicast forwarding tables are apparently getting set up incorrectly. I'd appreciate any debug help from the opensm experts out there. On large clusters (>1000 nodes or so) we often see hundreds of errors from 'ibdiagnet -r' like the following (this is the simplest example I could find): -I- Multicast Group:0xC069 has:2 switches and:2 HCAs -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069 -E- Disconnected HCA:r4i2n10/U1 These have invariably been multicast groups associated with IPv6 solicited node multicast addresses, e.g., in this case 'saquery -m' shows only a single member, "r5lead": MCMemberRecord member dump: MGID....................0xff12601bffff0000 : 0x00000001ff26d289 Mlid....................0xC069 PortGid.................0xfe80000000000000 : 0x0002c9020026d289 ScopeState..............0x1 ProxyJoin...............0x0 NodeDescription.........r5lead HCA-1 ibdiagnet shows that "r5lead" is connected to the switch with lid 1609, port 24: Switch 24 "S-0800690000002db4" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1609 lmc 0 [24] "H-0002c9020026d288"[1](2c9020026d289) # "r5lead HCA-1" lid 1576 4xDDR and the multicast forwarding table (from 'dump_mfts.sh') is consistent: Multicast mlids [0xc000-0xc3ff] of switch Lid 1609 guid 0x0800690000002db4 (MT47396 Infiniscale-III Mellanox Technologies): 
0 1 2 Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 MLid .... 0xc069 x So far, so good. But we also have r4i2n10, connected to the switch with lid 1533 port 7: switchguid=0x800690000002e50(800690000002e50) Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 ...... [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR with this mft entry: Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): 0 1 2 Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 MLid ..... 0xc069 x Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a mft entry for the multicast group with MGID ff12601bffff::1ff26d289? Anyone else seen similar? -- Arthur From hal.rosenstock at gmail.com Wed Nov 12 14:46:18 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 12 Nov 2008 17:46:18 -0500 Subject: [ofa-general] opensm - can/cannot set alternate default pkey? In-Reply-To: <1226518016.7156.15.camel@cardanus.llnl.gov> References: <1226518016.7156.15.camel@cardanus.llnl.gov> Message-ID: Hi Al, On Wed, Nov 12, 2008 at 2:26 PM, Al Chu wrote: > Before I run off and write a patch I shouldn't, I thought I'd ask. I don't think there's a need (see below). > In 10.9.1.2 of the spec, it states, "The P_Key value of 0xFFFF shall > represent the default partition key." Default in this sense is referring to the default partition (and it is not changeable in the same sense other defaults are). All end ports _must_ be a member of the default partition either as a full or limited member. This is needed for SA communication. See p.882 Table 185 P_KeyTable (initialization) for one citation on this. There are others in the spec. > (I couldn't find the glossary in the spec about what "shall" means, but > I assume it means "must" or "required" like RFCs.) Yes. > Does this mean that a P_Key of 0xFFFF must be in the P_Key_Table? 
Either 0xffff or 0x7fff must be in the P_KeyTable of every end port. > Currently, it seems that in opensm, no matter how you write your > partition.conf file, 0xFFFF will always be the P_Key_Table. This is > because opensm inserts this in it's internal list by default, and > nothing (as far as I can find) can remove it/get rid of it out of that > internal list. That's being a full member of the default partition. You should be able to change this to be a limited member of the default partition too. -- Hal > This seems wrong to me, but I'm getting confused on the wording. > Thanks, > Al > > -- > Albert Chu > chu11 at llnl.gov > Computer Scientist > High Performance Systems Division > Lawrence Livermore National Laboratory > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Wed Nov 12 14:46:57 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 12 Nov 2008 17:46:57 -0500 Subject: ***SPAM*** Re: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: <20081112221846.GE25248@sgi.com> References: <20081112221846.GE25248@sgi.com> Message-ID: On Wed, Nov 12, 2008 at 5:18 PM, wrote: > > Here's a description of a problem we're seeing where multicast > forwarding tables are apparently getting set up incorrectly. I'd > appreciate any debug help from the opensm experts out there. 
> > On large clusters (>1000 nodes or so) we often see hundreds of errors > from 'ibdiagnet -r' like the following (this is the simplest example > I could find): > > -I- Multicast Group:0xC069 has:2 switches and:2 HCAs > -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069 > -E- Disconnected HCA:r4i2n10/U1 > > These have invariably been multicast groups associated with IPv6 > solicited node multicast addresses, e.g., in this case 'saquery -m' > shows only a single member, "r5lead": > > MCMemberRecord member dump: > MGID....................0xff12601bffff0000 : 0x00000001ff26d289 > Mlid....................0xC069 > PortGid.................0xfe80000000000000 : 0x0002c9020026d289 > ScopeState..............0x1 > ProxyJoin...............0x0 > NodeDescription.........r5lead HCA-1 > > ibdiagnet shows that "r5lead" is connected to the switch with lid > 1609, port 24: > > Switch 24 "S-0800690000002db4" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1609 lmc 0 > [24] "H-0002c9020026d288"[1](2c9020026d289) # "r5lead HCA-1" lid 1576 4xDDR > > and the multicast forwarding table (from 'dump_mfts.sh') is consistent: > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1609 guid 0x0800690000002db4 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > .... > 0xc069 x > > > So far, so good. But we also have r4i2n10, connected to the switch with > lid 1533 port 7: > > switchguid=0x800690000002e50(800690000002e50) > Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 > ...... > [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR > > with this mft entry: > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > ..... 
> 0xc069 x > > Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a > mft entry for the multicast group with MGID ff12601bffff::1ff26d289? Are you using the consolidate IPv6 SNM (solicited node multicast) option in OpenSM ? -- Hal > Anyone else seen similar? > > -- > Arthur > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From chu11 at llnl.gov Wed Nov 12 14:59:53 2008 From: chu11 at llnl.gov (Al Chu) Date: Wed, 12 Nov 2008 14:59:53 -0800 Subject: [ofa-general] opensm - can/cannot set alternate default pkey? In-Reply-To: References: <1226518016.7156.15.camel@cardanus.llnl.gov> Message-ID: <1226530793.7156.25.camel@cardanus.llnl.gov> Hey Hal, Now its making more sense to me. Thanks for clearing it up. Al On Wed, 2008-11-12 at 17:46 -0500, Hal Rosenstock wrote: > Hi Al, > > On Wed, Nov 12, 2008 at 2:26 PM, Al Chu wrote: > > Before I run off and write a patch I shouldn't, I thought I'd ask. > > I don't think there's a need (see below). > > > In 10.9.1.2 of the spec, it states, "The P_Key value of 0xFFFF shall > > represent the default partition key." > > Default in this sense is referring to the default partition (and it is > not changeable in the same sense other defaults are). > > All end ports _must_ be a member of the default partition either as a > full or limited member. This is needed for SA communication. See p.882 > Table 185 P_KeyTable (initialization) for one citation on this. There > are others in the spec. > > > (I couldn't find the glossary in the spec about what "shall" means, but > > I assume it means "must" or "required" like RFCs.) > > Yes. > > > Does this mean that a P_Key of 0xFFFF must be in the P_Key_Table? > > Either 0xffff or 0x7fff must be in the P_KeyTable of every end port. 
> > > Currently, it seems that in opensm, no matter how you write your > > partition.conf file, 0xFFFF will always be the P_Key_Table. This is > > because opensm inserts this in it's internal list by default, and > > nothing (as far as I can find) can remove it/get rid of it out of that > > internal list. > > That's being a full member of the default partition. You should be > able to change this to be a limited member of the default partition > too. > > -- Hal > > > This seems wrong to me, but I'm getting confused on the wording. > > > Thanks, > > Al > > > > -- > > Albert Chu > > chu11 at llnl.gov > > Computer Scientist > > High Performance Systems Division > > Lawrence Livermore National Laboratory > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general > > > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From akepner at sgi.com Wed Nov 12 15:00:13 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 12 Nov 2008 15:00:13 -0800 Subject: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: References: <20081112221846.GE25248@sgi.com> Message-ID: <20081112230013.GF25248@sgi.com> On Wed, Nov 12, 2008 at 05:46:57PM -0500, Hal Rosenstock wrote: > ... > Are you using the consolidate IPv6 SNM (solicited node multicast) > option in OpenSM ? > No, we're generally using OFED 1.3-1.3.1 vintage code, which doesn't have that option. (In fact, this is the first I've heard of it.) 
-- Arthur From hal.rosenstock at gmail.com Wed Nov 12 15:14:15 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 12 Nov 2008 18:14:15 -0500 Subject: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: <20081112230013.GF25248@sgi.com> References: <20081112221846.GE25248@sgi.com> <20081112230013.GF25248@sgi.com> Message-ID: On Wed, Nov 12, 2008 at 6:00 PM, wrote: > On Wed, Nov 12, 2008 at 05:46:57PM -0500, Hal Rosenstock wrote: >> ... >> Are you using the consolidate IPv6 SNM (solicited node multicast) >> option in OpenSM ? >> > > No, we're generally using OFED 1.3-1.3.1 vintage code, which > doesn't have that option. (In fact, this is the first I've > heard of it.) OK; that at least level sets this. I'm not sure about what's changed in this area but I'll respond some more to the original post. -- Hal > Arthur > > From hal.rosenstock at gmail.com Wed Nov 12 15:27:29 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 12 Nov 2008 18:27:29 -0500 Subject: ***SPAM*** Re: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: <20081112221846.GE25248@sgi.com> References: <20081112221846.GE25248@sgi.com> Message-ID: On Wed, Nov 12, 2008 at 5:18 PM, wrote: > > Here's a description of a problem we're seeing where multicast > forwarding tables are apparently getting set up incorrectly. I'd > appreciate any debug help from the opensm experts out there. > > On large clusters (>1000 nodes or so) we often see hundreds of errors > from 'ibdiagnet -r' like the following (this is the simplest example > I could find): > > -I- Multicast Group:0xC069 has:2 switches and:2 HCAs > -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069 > -E- Disconnected HCA:r4i2n10/U1 Is it really an error to have a multicast group like this ? I agree it's not needed to route if there's only 1 member port. Can you describe the scenario under which this occurs ? 
Are things steady state or are there changes going on in the subnet ? Any errors in the opensm log ? > These have invariably been multicast groups associated with IPv6 > solicited node multicast addresses, e.g., in this case 'saquery -m' > shows only a single member, "r5lead": > > MCMemberRecord member dump: > MGID....................0xff12601bffff0000 : 0x00000001ff26d289 > Mlid....................0xC069 > PortGid.................0xfe80000000000000 : 0x0002c9020026d289 > ScopeState..............0x1 > ProxyJoin...............0x0 > NodeDescription.........r5lead HCA-1 > > ibdiagnet shows that "r5lead" is connected to the switch with lid > 1609, port 24: > > Switch 24 "S-0800690000002db4" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1609 lmc 0 > [24] "H-0002c9020026d288"[1](2c9020026d289) # "r5lead HCA-1" lid 1576 4xDDR > > and the multicast forwarding table (from 'dump_mfts.sh') is consistent: > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1609 guid 0x0800690000002db4 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > .... > 0xc069 x > > > So far, so good. But we also have r4i2n10, connected to the switch with > lid 1533 port 7: > > switchguid=0x800690000002e50(800690000002e50) > Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 > ...... > [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR > > with this mft entry: > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > ..... > 0xc069 x > > Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a > mft entry for the multicast group with MGID ff12601bffff::1ff26d289? The MFT entry is based on an MLID and not the MGID. What does saquery -g show ? 
Does it show one or more than one MGID with an MLID of 0xc069 ? Also, does saquery -m 0xc069 show one member ? I don't think OpenSM does this but if the multicast groups are disjoint, the same MLID could be used for two different groups (MGIDs) in different parts of the subnet. Sasha is probably best to comment on what has changed in this area. Is it possible to try this with the latest OpenSM to see if this has been fixed ? -- Hal > Anyone else seen similar? > > -- > Arthur > > From akepner at sgi.com Wed Nov 12 15:54:31 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 12 Nov 2008 15:54:31 -0800 Subject: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: References: <20081112221846.GE25248@sgi.com> Message-ID: <20081112235431.GG25248@sgi.com> On Wed, Nov 12, 2008 at 06:27:29PM -0500, Hal Rosenstock wrote: Thanks for having a look at this, Hal. > On Wed, Nov 12, 2008 at 5:18 PM, wrote: > > ..... > > -I- Multicast Group:0xC069 has:2 switches and:2 HCAs > > -E- Disconnected switch:S0800690000002e51/U1 in group:0xC069 > > -E- Disconnected HCA:r4i2n10/U1 > > Is it really an error to have a multicast group like this ? Well, 'ibdiagnet -r' reports it as an error. > ... I agree > it's not needed to route if there's only 1 member port. > > Can you describe the scenario under which this occurs ? Are things > steady state or are there changes going on in the subnet ? Any errors > in the opensm log ? As far as I know, this is steady state behavior. I'll check about opensm logging any errors. > ..... > > So far, so good. 
But we also have r4i2n10, connected to the switch with > > lid 1533 port 7: > > > > switchguid=0x800690000002e50(800690000002e50) > > Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 > > ...... > > [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR > > > > with this mft entry: > > > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): > > 0 1 2 > > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > > MLid > > ..... > > 0xc069 x > > > > Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a > > mft entry for the multicast group with MGID ff12601bffff::1ff26d289? > > The MFT entry is based on an MLID and not the MGID. What does saquery > -g show ? Does it show one or more than one MGID with an MLID of > 0xc069 ? Will also try to get this information. > Also, does saquery -m 0xc069 show one member ? Yes, only one member. > > I don't think OpenSM does this but if the multicast groups are > disjoint, the same MLID could be used for two different groups (MGIDs) > in different parts of the subnet. > Oh, that'd be confusing. > Sasha is probably best to comment on what has changed in this area. Is > it possible to try this with the latest OpenSM to see if this has been > fixed ? > I doubt that this alone would be important enough to get the customer to try upgrading opensm, but I can let them know it's an option - especially if there's good reason to think it'd help. 
-- Arthur From sashak at voltaire.com Wed Nov 12 16:03:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:03:36 +0200 Subject: [ofa-general] [PATCH] opensm/osm_subnet.c: consolidate logging code In-Reply-To: <20081111202648.GB8894@sashak.voltaire.com> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> <1226353273.13603.39.camel@cardanus.llnl.gov> <20081111202648.GB8894@sashak.voltaire.com> Message-ID: <20081113000336.GE27271@sashak.voltaire.com> Consolidate code like: char buff[128]; sprintf(buff, fmt, ...); printf("%s", buff); cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); into single log_report() function. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_subnet.c | 169 ++++++++++++++++---------------------------- 1 files changed, 60 insertions(+), 109 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 71ba7f5..666c93c 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -468,6 +468,17 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) /********************************************************************** **********************************************************************/ +static void log_report(const char *fmt, ...) +{ + char buf[128]; + va_list args; + va_start(args, fmt); + vsnprintf(buf, sizeof(buf), fmt, args); + va_end(args); + printf(buf); + cl_log_event("OpenSM", CL_LOG_INFO, buf, NULL, 0); +} + static void log_config_value(char *name, const char *fmt, ...) 
{ char buf[128]; @@ -839,28 +850,20 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key) { - char buff[128]; - if (*max_vls > 15) { - sprintf(buff, " Invalid Cached Option:%s=%u:" - "Using Default:%u\n", - key, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option:%s=%u:" + "Using Default:%u\n", + key, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); *max_vls = OSM_DEFAULT_QOS_MAX_VLS; } } static void subn_verify_high_limit(IN unsigned *high_limit, IN char *key) { - char buff[128]; - if (*high_limit > 255) { - sprintf(buff, " Invalid Cached Option:%s=%u:" - "Using Default:%u\n", - key, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option:%s=%u:" + "Using Default:%u\n", + key, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; } } @@ -868,7 +871,6 @@ static void subn_verify_high_limit(IN unsigned *high_limit, IN char *key) static void subn_verify_vlarb(IN char *vlarb, IN char *key) { if (vlarb) { - char buff[128]; char *str, *tok, *end, *ptr; int count = 0; @@ -890,60 +892,39 @@ static void subn_verify_vlarb(IN char *vlarb, IN char *key) vl = strtol(vl_str, &end, 0); - if (*end) { - sprintf(buff, + if (*end) + log_report( " Warning: Cached Option %s:vl=%s improperly formatted\n", key, vl_str); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, - buff, NULL, 0); - } else if (vl < 0 || vl > 14) { - sprintf(buff, + else if (vl < 0 || vl > 14) + log_report( " Warning: Cached Option %s:vl=%ld out of range\n", key, vl); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, - buff, NULL, 0); - } weight = strtol(weight_str, &end, 0); - if (*end) { - sprintf(buff, + if (*end) + log_report( " Warning: Cached Option %s:weight=%s improperly formatted\n", key, weight_str); - printf(buff); - 
cl_log_event("OpenSM", CL_LOG_INFO, - buff, NULL, 0); - } else if (weight < 0 || weight > 255) { - sprintf(buff, + else if (weight < 0 || weight > 255) + log_report( " Warning: Cached Option %s:weight=%ld out of range\n", key, weight); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, - buff, NULL, 0); - } - } else { - sprintf(buff, + } else + log_report( " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", key, tok); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, - 0); - } count++; tok = strtok_r(NULL, ",\n", &ptr); } - if (count > 64) { - sprintf(buff, + if (count > 64) + log_report( " Warning: Cached Option %s: > 64 listed: " "excess vl:weight pairs will be dropped\n", key); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); - } free(str); } @@ -952,7 +933,6 @@ static void subn_verify_vlarb(IN char *vlarb, IN char *key) static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) { if (sl2vl) { - char buff[128]; char *str, *tok, *end, *ptr; int count = 0; @@ -963,40 +943,26 @@ static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) while (tok) { long vl = strtol(tok, &end, 0); - if (*end) { - sprintf(buff, + if (*end) + log_report( " Warning: Cached Option %s:vl=%s improperly formatted\n", key, tok); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, - 0); - } else if (vl < 0 || vl > 15) { - sprintf(buff, + else if (vl < 0 || vl > 15) + log_report( " Warning: Cached Option %s:vl=%ld out of range\n", key, vl); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, - 0); - } count++; tok = strtok_r(NULL, ",\n", &ptr); } - if (count < 16) { - sprintf(buff, - " Warning: Cached Option %s: < 16 VLs listed\n", - key); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); - } - if (count > 16) { - sprintf(buff, - " Warning: Cached Option %s: > 16 listed: " - "excess VLs will be dropped\n", key); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); - } + 
if (count < 16) + log_report(" Warning: Cached Option %s: < 16 VLs " + "listed\n", key); + + if (count > 16) + log_report(" Warning: Cached Option %s: > 16 listed: " + "excess VLs will be dropped\n", key); free(str); } @@ -1004,33 +970,24 @@ static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) { - char buff[128]; - if (p_opts->lmc > 7) { - sprintf(buff, " Invalid Cached Option Value:lmc = %u:" - "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:lmc = %u:" + "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC); p_opts->lmc = OSM_DEFAULT_LMC; } if (15 < p_opts->sm_priority) { - sprintf(buff, " Invalid Cached Option Value:sm_priority = %u:" - "Using Default:%u\n", - p_opts->sm_priority, OSM_DEFAULT_SM_PRIORITY); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:sm_priority = %u:" + "Using Default:%u\n", + p_opts->sm_priority, OSM_DEFAULT_SM_PRIORITY); p_opts->sm_priority = OSM_DEFAULT_SM_PRIORITY; } if ((15 < p_opts->force_link_speed) || (p_opts->force_link_speed > 7 && p_opts->force_link_speed < 15)) { - sprintf(buff, - " Invalid Cached Option Value:force_link_speed = %u:" - "Using Default:%u\n", p_opts->force_link_speed, - IB_PORT_LINK_SPEED_ENABLED_MASK); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:force_link_speed = %u:" + "Using Default:%u\n", p_opts->force_link_speed, + IB_PORT_LINK_SPEED_ENABLED_MASK); p_opts->force_link_speed = IB_PORT_LINK_SPEED_ENABLED_MASK; } @@ -1041,11 +998,9 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) && strcmp(p_opts->console, OSM_REMOTE_CONSOLE) #endif ) { - sprintf(buff, " Invalid Cached Option Value:console = %s" - ", Using Default:%s\n", - p_opts->console, OSM_DEFAULT_CONSOLE); - 
printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:console = %s" + ", Using Default:%s\n", + p_opts->console, OSM_DEFAULT_CONSOLE); p_opts->console = OSM_DEFAULT_CONSOLE; } @@ -1108,22 +1063,18 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) } #ifdef ENABLE_OSM_PERF_MGR if (p_opts->perfmgr_sweep_time_s < 1) { - sprintf(buff, - " Invalid Cached Option Value:perfmgr_sweep_time_s = %u" - "Using Default:%u\n", p_opts->perfmgr_sweep_time_s, - OSM_PERFMGR_DEFAULT_SWEEP_TIME_S); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:perfmgr_sweep_time_s " + "= %u Using Default:%u\n", + p_opts->perfmgr_sweep_time_s, + OSM_PERFMGR_DEFAULT_SWEEP_TIME_S); p_opts->perfmgr_sweep_time_s = OSM_PERFMGR_DEFAULT_SWEEP_TIME_S; } if (p_opts->perfmgr_max_outstanding_queries < 1) { - sprintf(buff, - " Invalid Cached Option Value:perfmgr_max_outstanding_queries = %u" - "Using Default:%u\n", - p_opts->perfmgr_max_outstanding_queries, - OSM_PERFMGR_DEFAULT_MAX_OUTSTANDING_QUERIES); - printf(buff); - cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + log_report(" Invalid Cached Option Value:" + "perfmgr_max_outstanding_queries = %u" + " Using Default:%u\n", + p_opts->perfmgr_max_outstanding_queries, + OSM_PERFMGR_DEFAULT_MAX_OUTSTANDING_QUERIES); p_opts->perfmgr_max_outstanding_queries = OSM_PERFMGR_DEFAULT_MAX_OUTSTANDING_QUERIES; } -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 16:04:10 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:04:10 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_subnet.c: consolidate logging code In-Reply-To: <20081113000336.GE27271@sashak.voltaire.com> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> <1226353273.13603.39.camel@cardanus.llnl.gov> 
<20081111202648.GB8894@sashak.voltaire.com> <20081113000336.GE27271@sashak.voltaire.com> Message-ID: <20081113000410.GF27271@sashak.voltaire.com> From c7fd1c7668acc5f5c1819f23b35a0baad0c09045 Mon Sep 17 00:00:00 2001 From: Sasha Khapyorsky Date: Thu, 13 Nov 2008 01:20:07 +0200 Subject: [PATCH] opensm/osm_subnet.c: use strdup() function Instead of malloc() and strcpy() use strdup() function. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_subnet.c | 9 +++------ 1 files changed, 3 insertions(+), 6 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 666c93c..cd8c8e5 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -611,8 +611,7 @@ opts_unpack_charp(IN char *p_req_key, Ignore the possible memory leak here; the pointer may be to a static default. */ - *p_val = (char *)malloc(strlen(p_val_str) + 1); - strcpy(*p_val, p_val_str); + *p_val = strdup(p_val_str); } } } @@ -874,8 +873,7 @@ static void subn_verify_vlarb(IN char *vlarb, IN char *key) char *str, *tok, *end, *ptr; int count = 0; - str = (char *)malloc(strlen(vlarb) + 1); - strcpy(str, vlarb); + str = strdup(vlarb); tok = strtok_r(str, ",\n", &ptr); while (tok) { @@ -936,8 +934,7 @@ static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) char *str, *tok, *end, *ptr; int count = 0; - str = (char *)malloc(strlen(sl2vl) + 1); - strcpy(str, sl2vl); + str = strdup(sl2vl); tok = strtok_r(str, ",\n", &ptr); while (tok) { -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 16:05:28 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:05:28 +0200 Subject: [ofa-general] [PATCH] opensm/osm_subnet.c: consolidate qos parameters verification code In-Reply-To: <20081113000336.GE27271@sashak.voltaire.com> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> <1226353273.13603.39.camel@cardanus.llnl.gov> 
<20081111202648.GB8894@sashak.voltaire.com> <20081113000336.GE27271@sashak.voltaire.com> Message-ID: <20081113000528.GG27271@sashak.voltaire.com> Consolidate qos config parameters verification code. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_subnet.c | 150 +++++++++++++++++--------------------------- 1 files changed, 58 insertions(+), 92 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index cd8c8e5..006d14e 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -847,27 +847,28 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) /********************************************************************** **********************************************************************/ -static void subn_verify_max_vls(IN unsigned *max_vls, IN char *key) +static void subn_verify_max_vls(unsigned *max_vls, const char *prefix) { if (*max_vls > 15) { - log_report(" Invalid Cached Option:%s=%u:" + log_report(" Invalid Cached Option:%s_max_vls=%u:" "Using Default:%u\n", - key, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); + prefix, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); *max_vls = OSM_DEFAULT_QOS_MAX_VLS; } } -static void subn_verify_high_limit(IN unsigned *high_limit, IN char *key) +static void subn_verify_high_limit(unsigned *high_limit, const char *prefix) { if (*high_limit > 255) { - log_report(" Invalid Cached Option:%s=%u:" + log_report(" Invalid Cached Option:%s_high_limit=%u:" "Using Default:%u\n", - key, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); + prefix, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; } } -static void subn_verify_vlarb(IN char *vlarb, IN char *key) +static void subn_verify_vlarb(char *vlarb, const char *prefix, + const char *suffix) { if (vlarb) { char *str, *tok, *end, *ptr; @@ -891,44 +892,48 @@ static void subn_verify_vlarb(IN char *vlarb, IN char *key) vl = strtol(vl_str, &end, 0); if (*end) - log_report( - " Warning: Cached Option %s:vl=%s improperly 
formatted\n", - key, vl_str); + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl=%s " + "improperly formatted\n", + prefix, suffix, vl_str); else if (vl < 0 || vl > 14) - log_report( - " Warning: Cached Option %s:vl=%ld out of range\n", - key, vl); + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl=%ld out " + "of range\n", + prefix, suffix, vl); weight = strtol(weight_str, &end, 0); if (*end) - log_report( - " Warning: Cached Option %s:weight=%s improperly formatted\n", - key, weight_str); + log_report(" Warning: Cached Option " + "%s_vlarb_%s:weight=%s " + "improperly formatted\n", + prefix, suffix, weight_str); else if (weight < 0 || weight > 255) - log_report( - " Warning: Cached Option %s:weight=%ld out of range\n", - key, weight); + log_report(" Warning: Cached Option " + "%s_vlarb_%s:weight=%ld " + "out of range\n", + prefix, suffix, weight); } else - log_report( - " Warning: Cached Option %s:vl:weight=%s improperly formatted\n", - key, tok); + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl:weight=%s " + "improperly formatted\n", + prefix, suffix, tok); count++; tok = strtok_r(NULL, ",\n", &ptr); } if (count > 64) - log_report( - " Warning: Cached Option %s: > 64 listed: " - "excess vl:weight pairs will be dropped\n", - key); + log_report(" Warning: Cached Option %s_vlarb_%s: " + "> 64 listed: excess vl:weight pairs " + "will be dropped\n", prefix, suffix); free(str); } } -static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) +static void subn_verify_sl2vl(char *sl2vl, const char *prefix) { if (sl2vl) { char *str, *tok, *end, *ptr; @@ -941,30 +946,40 @@ static void subn_verify_sl2vl(IN char *sl2vl, IN char *key) long vl = strtol(tok, &end, 0); if (*end) - log_report( - " Warning: Cached Option %s:vl=%s improperly formatted\n", - key, tok); + log_report(" Warning: Cached Option %s_sl2vl:" + "vl=%s improperly formatted\n", + prefix, tok); else if (vl < 0 || vl > 15) - log_report( - " Warning: Cached Option %s:vl=%ld out of 
range\n", - key, vl); + log_report(" Warning: Cached Option %s_sl2vl:" + "vl=%ld out of range\n", + prefix, vl); count++; tok = strtok_r(NULL, ",\n", &ptr); } if (count < 16) - log_report(" Warning: Cached Option %s: < 16 VLs " - "listed\n", key); + log_report(" Warning: Cached Option %s_sl2vl: < 16 VLs " + "listed\n", prefix); if (count > 16) - log_report(" Warning: Cached Option %s: > 16 listed: " - "excess VLs will be dropped\n", key); + log_report(" Warning: Cached Option %s_sl2vl: " + "> 16 listed: excess VLs will be dropped\n", + prefix); free(str); } } +static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix) +{ + subn_verify_max_vls(&set->max_vls, prefix); + subn_verify_high_limit(&set->high_limit, prefix); + subn_verify_vlarb(set->vlarb_low, prefix, "low"); + subn_verify_vlarb(set->vlarb_high, prefix, "high"); + subn_verify_sl2vl(set->sl2vl, prefix); +} + static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) { if (p_opts->lmc > 7) { @@ -1002,62 +1017,13 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) } if (p_opts->qos) { - subn_verify_max_vls(&(p_opts->qos_options.max_vls), - "qos_max_vls"); - subn_verify_max_vls(&(p_opts->qos_ca_options.max_vls), - "qos_ca_max_vls"); - subn_verify_max_vls(&(p_opts->qos_sw0_options.max_vls), - "qos_sw0_max_vls"); - subn_verify_max_vls(&(p_opts->qos_swe_options.max_vls), - "qos_swe_max_vls"); - subn_verify_max_vls(&(p_opts->qos_rtr_options.max_vls), - "qos_rtr_max_vls"); - - subn_verify_high_limit(&(p_opts->qos_options.high_limit), - "qos_high_limit"); - subn_verify_high_limit(&(p_opts->qos_ca_options.high_limit), - "qos_ca_high_limit"); - subn_verify_high_limit(& - (p_opts->qos_sw0_options.high_limit), - "qos_sw0_high_limit"); - subn_verify_high_limit(& - (p_opts->qos_swe_options.high_limit), - "qos_swe_high_limit"); - subn_verify_high_limit(& - (p_opts->qos_rtr_options.high_limit), - "qos_rtr_high_limit"); - - subn_verify_vlarb(p_opts->qos_options.vlarb_low, - 
"qos_vlarb_low"); - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_low, - "qos_ca_vlarb_low"); - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_low, - "qos_sw0_vlarb_low"); - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_low, - "qos_swe_vlarb_low"); - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_low, - "qos_rtr_vlarb_low"); - - subn_verify_vlarb(p_opts->qos_options.vlarb_high, - "qos_vlarb_high"); - subn_verify_vlarb(p_opts->qos_ca_options.vlarb_high, - "qos_ca_vlarb_high"); - subn_verify_vlarb(p_opts->qos_sw0_options.vlarb_high, - "qos_sw0_vlarb_high"); - subn_verify_vlarb(p_opts->qos_swe_options.vlarb_high, - "qos_swe_vlarb_high"); - subn_verify_vlarb(p_opts->qos_rtr_options.vlarb_high, - "qos_rtr_vlarb_high"); - - subn_verify_sl2vl(p_opts->qos_options.sl2vl, "qos_sl2vl"); - subn_verify_sl2vl(p_opts->qos_ca_options.sl2vl, "qos_ca_sl2vl"); - subn_verify_sl2vl(p_opts->qos_sw0_options.sl2vl, - "qos_sw0_sl2vl"); - subn_verify_sl2vl(p_opts->qos_swe_options.sl2vl, - "qos_swe_sl2vl"); - subn_verify_sl2vl(p_opts->qos_rtr_options.sl2vl, - "qos_rtr_sl2vl"); + subn_verify_qos_set(&p_opts->qos_options, "qos"); + subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca"); + subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0"); + subn_verify_qos_set(&p_opts->qos_swe_options, "qos_swe"); + subn_verify_qos_set(&p_opts->qos_rtr_options, "qos_rtr"); } + #ifdef ENABLE_OSM_PERF_MGR if (p_opts->perfmgr_sweep_time_s < 1) { log_report(" Invalid Cached Option Value:perfmgr_sweep_time_s " -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 16:19:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:19:44 +0200 Subject: [ofa-general] [PATCH] opensm/osm_subnet.c: move osm_subn_rescan_conf_files() function In-Reply-To: <20081113000528.GG27271@sashak.voltaire.com> References: <1225404081.1197.534.camel@cardanus.llnl.gov> <20081110210233.GE3467@sashak.voltaire.com> <1226351730.13603.27.camel@cardanus.llnl.gov> 
<1226353273.13603.39.camel@cardanus.llnl.gov> <20081111202648.GB8894@sashak.voltaire.com> <20081113000336.GE27271@sashak.voltaire.com> <20081113000528.GG27271@sashak.voltaire.com> Message-ID: <20081113001944.GH27271@sashak.voltaire.com> Move osm_subn_rescan_conf_files() function. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_subnet.c | 116 +++++++++++++++++++++----------------------- 1 files changed, 56 insertions(+), 60 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 006d14e..8569043 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -71,8 +71,6 @@ static const char null_str[] = "(null)"; -static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts); - /********************************************************************** **********************************************************************/ void osm_subn_construct(IN osm_subn_t * const p_subn) @@ -788,64 +786,6 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) /********************************************************************** **********************************************************************/ -int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) -{ - FILE *opts_file; - char line[1024]; - char *p_key, *p_val, *p_last; - - if (!p_subn->opt.config_file) - return 0; - - opts_file = fopen(p_subn->opt.config_file, "r"); - if (!opts_file) { - if (errno == ENOENT) - return 1; - OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, - "cannot open file \'%s\': %s\n", - p_subn->opt.config_file, strerror(errno)); - return -1; - } - - while (fgets(line, 1023, opts_file) != NULL) { - /* get the first token */ - p_key = strtok_r(line, " \t\n", &p_last); - if (p_key) { - p_val = strtok_r(NULL, " \t\n", &p_last); - - subn_parse_qos_options("qos", - p_key, p_val, - &p_subn->opt.qos_options); - - subn_parse_qos_options("qos_ca", - p_key, p_val, - &p_subn->opt.qos_ca_options); - - subn_parse_qos_options("qos_sw0", - p_key, p_val, - 
&p_subn->opt.qos_sw0_options); - - subn_parse_qos_options("qos_swe", - p_key, p_val, - &p_subn->opt.qos_swe_options); - - subn_parse_qos_options("qos_rtr", - p_key, p_val, - &p_subn->opt.qos_rtr_options); - - } - } - fclose(opts_file); - - subn_verify_conf_file(&p_subn->opt); - - osm_parse_prefix_routes_file(p_subn); - - return 0; -} - -/********************************************************************** - **********************************************************************/ static void subn_verify_max_vls(unsigned *max_vls, const char *prefix) { @@ -1308,6 +1248,62 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) return 0; } +int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) +{ + FILE *opts_file; + char line[1024]; + char *p_key, *p_val, *p_last; + + if (!p_subn->opt.config_file) + return 0; + + opts_file = fopen(p_subn->opt.config_file, "r"); + if (!opts_file) { + if (errno == ENOENT) + return 1; + OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, + "cannot open file \'%s\': %s\n", + p_subn->opt.config_file, strerror(errno)); + return -1; + } + + while (fgets(line, 1023, opts_file) != NULL) { + /* get the first token */ + p_key = strtok_r(line, " \t\n", &p_last); + if (p_key) { + p_val = strtok_r(NULL, " \t\n", &p_last); + + subn_parse_qos_options("qos", + p_key, p_val, + &p_subn->opt.qos_options); + + subn_parse_qos_options("qos_ca", + p_key, p_val, + &p_subn->opt.qos_ca_options); + + subn_parse_qos_options("qos_sw0", + p_key, p_val, + &p_subn->opt.qos_sw0_options); + + subn_parse_qos_options("qos_swe", + p_key, p_val, + &p_subn->opt.qos_swe_options); + + subn_parse_qos_options("qos_rtr", + p_key, p_val, + &p_subn->opt.qos_rtr_options); + + } + } + fclose(opts_file); + + subn_verify_conf_file(&p_subn->opt); + + osm_parse_prefix_routes_file(p_subn); + + return 0; +} + /********************************************************************** **********************************************************************/ int 
osm_subn_output_conf(FILE *out, IN osm_subn_opt_t *const p_opts) -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 16:24:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:24:03 +0200 Subject: [ofa-general] Re: [opensm patch][1/2] fix qos config parsing bugs In-Reply-To: <1226447872.6239.2.camel@cardanus.llnl.gov> References: <1225404078.1197.533.camel@cardanus.llnl.gov> <20081111191958.GA8894@sashak.voltaire.com> <1226447872.6239.2.camel@cardanus.llnl.gov> Message-ID: <20081113002403.GI27271@sashak.voltaire.com> Hi Al, On 15:57 Tue 11 Nov , Al Chu wrote: > > Sorry, I may have not explained it well. Let's say I do this in the > config file. > > qos_vlarb_high FOOBAR > # qos_ca_vlarb_high BLAH > qos_swe_vlarb_high XYZZY > > I currently expect qos_ca_vlarb_high to use the value of FOOBAR because > I commented out the field. But it uses OSM_DEFAULT_QOS_HIGH_LIMIT > instead. The reason is because qos_build_config() checks for NULL to > use default vs. non-default values. > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > > Under the above situation where I've commented out several fields, > opt->vlarb_high is always non-NULL b/c it was set to > OSM_DEFAULT_QOS_HIGH_LIMIT. Thus OSM_DEFAULT_QOS_HIGH_LIMIT is used > instead of FOOBAR. > > > > 2) > > > > > > In qos_build_config() we load the high_limit like this: > > > > > > cfg->vl_high_limit = (uint8_t) opt->high_limit; > > > > > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit > > > options to "go back to" the default high_limit. It just assumes that > > > whatever is input (or was set by default) is what you should use. > > > > Right. What is a limitation here? That a user cannot set this to > > "no value"? But she/he can just skip it. 
> > Similar to the above issue, let's say I want to do: > > qos_high_limit 8 > # qos_ca_high_limit 15 > # qos_swe_high_limit 15 > > I want qos_ca_high_limit and qos_swe_high_limit to use whatever I set in > qos_high_limit. But the code doesn't allow for this. > > > > > > 3) > > > > > > Some fields like qos_vlarb_high are assumed to be correctly set and can > > > segfault opensm. > > > > qos_build_config() assumes that valid parameters are used. And we are > > using it this way (I hope :)) (finally it is not library API). > > I think the issue is the osm_subnet.c code did not properly check all > inputs, and subsequently some inputs used in qos_build_config() were > bad. I think > > qos_vlarb_high (null) > > was something I tried that opensm seg-faulted on. Ok. I see now. Probably it will be simpler just to generate valid qos parameter sets right after parser (in verification time)? Like in your modified (and rebased against recent patches) patch below? Sasha From a973a8a1ea6c805cf07965d86731ae04510266ce Mon Sep 17 00:00:00 2001 From: Al Chu Date: Mon, 10 Nov 2008 13:41:04 -0800 Subject: [PATCH] fix qos config parsing bugs Signed-off-by: Albert Chu Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 12 +- opensm/opensm/osm_qos.c | 6 +- opensm/opensm/osm_subnet.c | 298 ++++++++++++++++++++--------------- 3 files changed, 181 insertions(+), 135 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index a16cbce..2bcd232 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -100,7 +100,7 @@ struct osm_qos_policy; */ typedef struct osm_qos_options { unsigned max_vls; - unsigned high_limit; + int high_limit; char *vlarb_high; char *vlarb_low; char *sl2vl; @@ -109,20 +109,20 @@ typedef struct osm_qos_options { * FIELDS * * max_vls -* The number of maximum VLs on the Subnet +* The number of maximum VLs on the Subnet (0 == use default) * * high_limit * The limit of 
High Priority component of VL Arbitration -* table (IBA 7.6.9) +* table (IBA 7.6.9) (-1 == use default) * * vlarb_high -* High priority VL Arbitration table template. +* High priority VL Arbitration table template. (NULL == use default) * * vlarb_low -* Low priority VL Arbitration table template. +* Low priority VL Arbitration table template. (NULL == use default) * * sl2vl -* SL2VL Mapping table (IBA 7.6.6) template. +* SL2VL Mapping table (IBA 7.6.6) template. (NULL == use default) * *********/ diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c index 1679ae0..b451c25 100644 --- a/opensm/opensm/osm_qos.c +++ b/opensm/opensm/osm_qos.c @@ -382,7 +382,11 @@ static void qos_build_config(struct qos_config *cfg, memset(cfg, 0, sizeof(*cfg)); cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; - cfg->vl_high_limit = (uint8_t) opt->high_limit; + + if (opt->high_limit >= 0) + cfg->vl_high_limit = (uint8_t) opt->high_limit; + else + cfg->vl_high_limit = (uint8_t) dflt->high_limit; p = opt->vlarb_high ? 
opt->vlarb_high : dflt->vlarb_high; for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 8569043..1c9777e 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -370,6 +370,15 @@ static void subn_set_default_qos_options(IN osm_qos_options_t * opt) opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; } +static void subn_init_qos_options(IN osm_qos_options_t * opt) +{ + opt->max_vls = 0; + opt->high_limit = -1; + opt->vlarb_high = NULL; + opt->vlarb_low = NULL; + opt->sl2vl = NULL; +} + /********************************************************************** **********************************************************************/ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) @@ -457,11 +466,11 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) p_opt->no_clients_rereg = FALSE; p_opt->prefix_routes_file = OSM_DEFAULT_PREFIX_ROUTES_FILE; p_opt->consolidate_ipv6_snm_req = FALSE; - subn_set_default_qos_options(&p_opt->qos_options); - subn_set_default_qos_options(&p_opt->qos_ca_options); - subn_set_default_qos_options(&p_opt->qos_sw0_options); - subn_set_default_qos_options(&p_opt->qos_swe_options); - subn_set_default_qos_options(&p_opt->qos_rtr_options); + subn_init_qos_options(&p_opt->qos_options); + subn_init_qos_options(&p_opt->qos_ca_options); + subn_init_qos_options(&p_opt->qos_sw0_options); + subn_init_qos_options(&p_opt->qos_swe_options); + subn_init_qos_options(&p_opt->qos_rtr_options); } /********************************************************************** @@ -526,6 +535,21 @@ opts_unpack_uint32(IN char *p_req_key, /********************************************************************** **********************************************************************/ static void +opts_unpack_int32(IN char *p_req_key, + IN char *p_key, IN char *p_val_str, IN int32_t * p_val) +{ + if (!strcmp(p_req_key, p_key)) { + int32_t val = strtol(p_val_str, NULL, 0); + 
if (val != *p_val) { + log_config_value(p_key, "%d", val); + *p_val = val; + } + } +} + +/********************************************************************** + **********************************************************************/ +static void opts_unpack_uint16(IN char *p_req_key, IN char *p_key, IN char *p_val_str, IN uint16_t * p_val) { @@ -651,7 +675,7 @@ subn_parse_qos_options(IN const char *prefix, snprintf(name, sizeof(name), "%s_max_vls", prefix); opts_unpack_uint32(name, p_key, p_val_str, &opt->max_vls); snprintf(name, sizeof(name), "%s_high_limit", prefix); - opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); + opts_unpack_int32(name, p_key, p_val_str, &opt->high_limit); snprintf(name, sizeof(name), "%s_vlarb_high", prefix); opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); snprintf(name, sizeof(name), "%s_vlarb_low", prefix); @@ -786,138 +810,142 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) /********************************************************************** **********************************************************************/ - -static void subn_verify_max_vls(unsigned *max_vls, const char *prefix) +static void subn_verify_max_vls(unsigned *max_vls, const char *prefix, unsigned dflt) { - if (*max_vls > 15) { - log_report(" Invalid Cached Option:%s_max_vls=%u:" - "Using Default:%u\n", - prefix, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); - *max_vls = OSM_DEFAULT_QOS_MAX_VLS; + if (!(*max_vls) || *max_vls > 15) { + log_report(" Invalid Cached Option: %s_max_vls=%u: " + "Using Default = %u\n", prefix, *max_vls, dflt); + *max_vls = dflt; } } -static void subn_verify_high_limit(unsigned *high_limit, const char *prefix) +static void subn_verify_high_limit(int *high_limit, const char *prefix, int dflt) { - if (*high_limit > 255) { - log_report(" Invalid Cached Option:%s_high_limit=%u:" - "Using Default:%u\n", - prefix, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); - *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; + if (*high_limit 
< 0 || *high_limit > 255) { + log_report(" Invalid Cached Option: %s_high_limit=%d: " + "Using Default: %d\n", prefix, *high_limit, dflt); + *high_limit = dflt; } } -static void subn_verify_vlarb(char *vlarb, const char *prefix, - const char *suffix) +static void subn_verify_vlarb(char **vlarb, const char *prefix, + const char *suffix, char *dflt) { - if (vlarb) { - char *str, *tok, *end, *ptr; - int count = 0; - - str = strdup(vlarb); - - tok = strtok_r(str, ",\n", &ptr); - while (tok) { - char *vl_str, *weight_str; - - vl_str = tok; - weight_str = strchr(tok, ':'); - - if (weight_str) { - long vl, weight; - - *weight_str = '\0'; - weight_str++; - - vl = strtol(vl_str, &end, 0); - - if (*end) - log_report(" Warning: Cached Option " - "%s_vlarb_%s:vl=%s " - "improperly formatted\n", - prefix, suffix, vl_str); - else if (vl < 0 || vl > 14) - log_report(" Warning: Cached Option " - "%s_vlarb_%s:vl=%ld out " - "of range\n", - prefix, suffix, vl); - - weight = strtol(weight_str, &end, 0); - - if (*end) - log_report(" Warning: Cached Option " - "%s_vlarb_%s:weight=%s " - "improperly formatted\n", - prefix, suffix, weight_str); - else if (weight < 0 || weight > 255) - log_report(" Warning: Cached Option " - "%s_vlarb_%s:weight=%ld " - "out of range\n", - prefix, suffix, weight); - } else - log_report(" Warning: Cached Option " - "%s_vlarb_%s:vl:weight=%s " - "improperly formatted\n", - prefix, suffix, tok); + char *str, *tok, *end, *ptr; + int count = 0; + + if (*vlarb == NULL) { + log_report(" Invalid Cached Option: %s_vlarb_%s: " + "Using Default\n", prefix, suffix); + *vlarb = dflt; + return; + } - count++; - tok = strtok_r(NULL, ",\n", &ptr); - } + str = strdup(*vlarb); + + tok = strtok_r(str, ",\n", &ptr); + while (tok) { + char *vl_str, *weight_str; - if (count > 64) - log_report(" Warning: Cached Option %s_vlarb_%s: " - "> 64 listed: excess vl:weight pairs " - "will be dropped\n", prefix, suffix); + vl_str = tok; + weight_str = strchr(tok, ':'); - free(str); + if 
(weight_str) { + long vl, weight; + + *weight_str = '\0'; + weight_str++; + + vl = strtol(vl_str, &end, 0); + + if (*end) + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl=%s" + " improperly formatted\n", + prefix, suffix, vl_str); + else if (vl < 0 || vl > 14) + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl=%ld out of range\n", + prefix, suffix, vl); + + weight = strtol(weight_str, &end, 0); + + if (*end) + log_report(" Warning: Cached Option " + "%s_vlarb_%s:weight=%s " + "improperly formatted\n", + prefix, suffix, weight_str); + else if (weight < 0 || weight > 255) + log_report(" Warning: Cached Option " + "%s_vlarb_%s:weight=%ld " + "out of range\n", + prefix, suffix, weight); + } else + log_report(" Warning: Cached Option " + "%s_vlarb_%s:vl:weight=%s " + "improperly formatted\n", + prefix, suffix, tok); + + count++; + tok = strtok_r(NULL, ",\n", &ptr); } + + if (count > 64) + log_report(" Warning: Cached Option %s_vlarb_%s: > 64 listed:" + " excess vl:weight pairs will be dropped\n", + prefix, suffix); + + free(str); } -static void subn_verify_sl2vl(char *sl2vl, const char *prefix) +static void subn_verify_sl2vl(char **sl2vl, const char *prefix, char *dflt) { - if (sl2vl) { - char *str, *tok, *end, *ptr; - int count = 0; + char *str, *tok, *end, *ptr; + int count = 0; + + if (*sl2vl == NULL) { + log_report(" Invalid Cached Option: %s_sl2vl: Using Default\n", + prefix); + *sl2vl = dflt; + return; + } - str = strdup(sl2vl); + str = strdup(*sl2vl); - tok = strtok_r(str, ",\n", &ptr); - while (tok) { - long vl = strtol(tok, &end, 0); + tok = strtok_r(str, ",\n", &ptr); + while (tok) { + long vl = strtol(tok, &end, 0); - if (*end) - log_report(" Warning: Cached Option %s_sl2vl:" - "vl=%s improperly formatted\n", - prefix, tok); - else if (vl < 0 || vl > 15) - log_report(" Warning: Cached Option %s_sl2vl:" - "vl=%ld out of range\n", - prefix, vl); - - count++; - tok = strtok_r(NULL, ",\n", &ptr); - } + if (*end) + log_report(" Warning: Cached 
Option %s_sl2vl:vl=%s " + "improperly formatted\n", prefix, tok); + else if (vl < 0 || vl > 15) + log_report(" Warning: Cached Option %s_sl2vl:vl=%ld " + "out of range\n", prefix, vl); - if (count < 16) - log_report(" Warning: Cached Option %s_sl2vl: < 16 VLs " - "listed\n", prefix); + count++; + tok = strtok_r(NULL, ",\n", &ptr); + } - if (count > 16) - log_report(" Warning: Cached Option %s_sl2vl: " - "> 16 listed: excess VLs will be dropped\n", - prefix); + if (count < 16) + log_report(" Warning: Cached Option %s_sl2vl: < 16 VLs " + "listed\n", prefix); - free(str); - } + if (count > 16) + log_report(" Warning: Cached Option %s_sl2vl: > 16 listed: " + "excess VLs will be dropped\n", prefix); + + free(str); } -static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix) +static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix, + osm_qos_options_t *dflt) { - subn_verify_max_vls(&set->max_vls, prefix); - subn_verify_high_limit(&set->high_limit, prefix); - subn_verify_vlarb(set->vlarb_low, prefix, "low"); - subn_verify_vlarb(set->vlarb_high, prefix, "high"); - subn_verify_sl2vl(set->sl2vl, prefix); + subn_verify_max_vls(&set->max_vls, prefix, dflt->max_vls); + subn_verify_high_limit(&set->high_limit, prefix, dflt->high_limit); + subn_verify_vlarb(&set->vlarb_low, prefix, "low", dflt->vlarb_low); + subn_verify_vlarb(&set->vlarb_high, prefix, "high", dflt->vlarb_high); + subn_verify_sl2vl(&set->sl2vl, prefix, dflt->sl2vl); } static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) @@ -957,11 +985,24 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) } if (p_opts->qos) { - subn_verify_qos_set(&p_opts->qos_options, "qos"); - subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca"); - subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0"); - subn_verify_qos_set(&p_opts->qos_swe_options, "qos_swe"); - subn_verify_qos_set(&p_opts->qos_rtr_options, "qos_rtr"); + osm_qos_options_t dflt; + + /* the default 
options in qos_options must be correct. + * every other one need not be, b/c those will default + * back to whatever is in qos_options. + */ + + subn_set_default_qos_options(&dflt); + + subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt); + subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca", + &p_opts->qos_options); + subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0", + &p_opts->qos_options); + subn_verify_qos_set(&p_opts->qos_swe_options, "qos_swe", + &p_opts->qos_options); + subn_verify_qos_set(&p_opts->qos_rtr_options, "qos_rtr", + &p_opts->qos_options); } #ifdef ENABLE_OSM_PERF_MGR @@ -1267,30 +1308,31 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) return -1; } + subn_init_qos_options(&p_subn->opt.qos_options); + subn_init_qos_options(&p_subn->opt.qos_ca_options); + subn_init_qos_options(&p_subn->opt.qos_sw0_options); + subn_init_qos_options(&p_subn->opt.qos_swe_options); + subn_init_qos_options(&p_subn->opt.qos_rtr_options); + while (fgets(line, 1023, opts_file) != NULL) { /* get the first token */ p_key = strtok_r(line, " \t\n", &p_last); if (p_key) { p_val = strtok_r(NULL, " \t\n", &p_last); - subn_parse_qos_options("qos", - p_key, p_val, + subn_parse_qos_options("qos", p_key, p_val, &p_subn->opt.qos_options); - subn_parse_qos_options("qos_ca", - p_key, p_val, + subn_parse_qos_options("qos_ca", p_key, p_val, &p_subn->opt.qos_ca_options); - subn_parse_qos_options("qos_sw0", - p_key, p_val, + subn_parse_qos_options("qos_sw0", p_key, p_val, &p_subn->opt.qos_sw0_options); - subn_parse_qos_options("qos_swe", - p_key, p_val, + subn_parse_qos_options("qos_swe", p_key, p_val, &p_subn->opt.qos_swe_options); - subn_parse_qos_options("qos_rtr", - p_key, p_val, + subn_parse_qos_options("qos_rtr", p_key, p_val, &p_subn->opt.qos_rtr_options); } -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 16:39:12 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 02:39:12 +0200 Subject: [ofa-general] opensm: bad 
multicast forwarding table entries In-Reply-To: <20081112235431.GG25248@sgi.com> References: <20081112221846.GE25248@sgi.com> <20081112235431.GG25248@sgi.com> Message-ID: <20081113003912.GJ27271@sashak.voltaire.com> On 15:54 Wed 12 Nov , akepner at sgi.com wrote: > > ..... > > > So far, so good. But we also have r4i2n10, connected to the switch with > > > lid 1533 port 7: > > > > > > switchguid=0x800690000002e50(800690000002e50) > > > Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 > > > ...... > > > [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR > > > > > > with this mft entry: > > > > > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): > > > 0 1 2 > > > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > > > MLid > > > ..... > > > 0xc069 x > > > > > > Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a > > > mft entry for the multicast group with MGID ff12601bffff::1ff26d289? Any chance that port "r4i2n10" joins MGID ff12601bffff::1ff26d289 as non-member? You can run OpenSM with -V flag and track all joins. Sasha From hal.rosenstock at gmail.com Wed Nov 12 18:08:09 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 12 Nov 2008 21:08:09 -0500 Subject: ***SPAM*** Re: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: <20081113003912.GJ27271@sashak.voltaire.com> References: <20081112221846.GE25248@sgi.com> <20081112235431.GG25248@sgi.com> <20081113003912.GJ27271@sashak.voltaire.com> Message-ID: On Wed, Nov 12, 2008 at 7:39 PM, Sasha Khapyorsky wrote: > On 15:54 Wed 12 Nov , akepner at sgi.com wrote: >> > ..... >> > > So far, so good. 
But we also have r4i2n10, connected to the switch with >> > > lid 1533 port 7: >> > > >> > > switchguid=0x800690000002e50(800690000002e50) >> > > Switch 24 "S-0800690000002e50" # "MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 1533 lmc 0 >> > > ...... >> > > [7] "H-003048c2438a0000"[1](3048c2438a0001) # "r4i2n10 HCA-1" lid 771 4xDDR >> > > >> > > with this mft entry: >> > > >> > > Multicast mlids [0xc000-0xc3ff] of switch Lid 1533 guid 0x0800690000002e50 (MT47396 Infiniscale-III Mellanox Technologies): >> > > 0 1 2 >> > > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 >> > > MLid >> > > ..... >> > > 0xc069 x >> > > >> > > Any idea why "r4i2n10", with PortGid fe80::3048c2438a0001 would have a >> > > mft entry for the multicast group with MGID ff12601bffff::1ff26d289? > > Any chance that port "r4i2n10" joins MGID ff12601bffff::1ff26d289 as > non-member? Wouldn't saquery -m show this member too ? Arthur said there was only 1 member indicated. -- Hal > You can run OpenSM with -V flag and track all joins. > > Sasha > From sashak at voltaire.com Wed Nov 12 18:14:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 04:14:03 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa_mcmember_record: return a real port JoinState on update Message-ID: <20081113021403.GK27271@sashak.voltaire.com> When port JoinState is updated by MCMember leave request response should have a real (new) JoinState. This fix addresses bug#1373. 
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_sa_mcmember_record.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 878d21e..4ca5896 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -1095,12 +1095,10 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, goto Exit; } - mcmember_rec.scope_state = p_mcm_port->scope_state; /* remove port or update join state */ removed = osm_mgrp_remove_port(sa->p_subn, sa->p_log, p_mgrp, p_mcm_port, p_recvd_mcmember_rec->scope_state&0x0F); - if (removed) - mcmember_rec.scope_state = p_mcm_port->scope_state; + mcmember_rec.scope_state = p_mcm_port->scope_state; CL_PLOCK_RELEASE(sa->p_lock); -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 12 18:34:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 04:34:51 +0200 Subject: [ofa-general] opensm: bad multicast forwarding table entries In-Reply-To: References: <20081112221846.GE25248@sgi.com> <20081112235431.GG25248@sgi.com> <20081113003912.GJ27271@sashak.voltaire.com> Message-ID: <20081113023451.GL27271@sashak.voltaire.com> On 21:08 Wed 12 Nov , Hal Rosenstock wrote: > > > > Any chance that port "r4i2n10" joins MGID ff12601bffff::1ff26d289 as > > non-member? > > Wouldn't saquery -m show this member too ? Arthur said there was only > 1 member indicated. Yes, I think you are right and it should. Need to check although. Sasha From ogerlitz at voltaire.com Wed Nov 12 23:20:57 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Nov 2008 09:20:57 +0200 (IST) Subject: [ofa-general] rate assignment for path queries Message-ID: Hi Yevgeny, If opensm doesn't have a match on any qos-assignment rule (eg when there's no qos-config file), when coming to serve sa path query, my understanding is that the "qos related fields" of the partition would be used. 
For example, I have set the following partition config file which assigns to the 0x8001 partition, and run without any qos file. Default=0x7fff,ipoib : ALL=full; RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; RED=0x8002, ipoib, sl=2, rate=3, defmember=full : ALL=full; When a path query is issued, Indeed sl=1 is returned but I see that a rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). Have I done anything wrong? is it a known issue? what does it means when the SM prints "min rate = 6" Or. Nov 13 02:12:49 219374 [42803940] 0x08 -> PathRecord dump: service_id..............0x0000000000000000 dgid....................0xfe80000000000000 : 0x0002c90300026be7 sgid....................0xfe80000000000000 : 0x0002c90300026be3 dlid....................0x0 slid....................0x0 hop_flow_raw............0x0 tclass..................0x0 num_path_revers.........0x1 pkey....................0x8001 qos_class...............0x0 sl......................0x0 mtu.....................0x3 rate....................0x0 pkt_life................0x0 preference..............0x0 resv2...................0x0 resv3...................0x0 Nov 13 02:12:49 219386 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: [ Nov 13 02:12:49 219390 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: ] Nov 13 02:12:49 219394 [42803940] 0x08 -> osm_pr_rcv_process: Unicast destination requested Nov 13 02:12:49 219398 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: [ Nov 13 02:12:49 219403 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: ] Nov 13 02:12:49 219407 [42803940] 0x10 -> __osm_pr_rcv_process_pair: [ Nov 13 02:12:49 219411 [42803940] 0x10 -> __osm_pr_rcv_get_port_pair_paths: [ Nov 13 02:12:49 219415 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src port 0x0002c90300026be3, Dst port 0x0002c90300026be7 Nov 13 02:12:49 219420 [42803940] 0x10 -> osm_port_share_pkey: [ Nov 13 02:12:49 219424 [42803940] 0x10 -> osm_port_share_pkey: ] Nov 13 02:12:49 219428 [42803940] 0x10 -> 
osm_port_share_pkey: [ Nov 13 02:12:49 219432 [42803940] 0x10 -> osm_port_share_pkey: ] Nov 13 02:12:49 219436 [42803940] 0x10 -> osm_port_share_pkey: [ Nov 13 02:12:49 219440 [42803940] 0x10 -> osm_port_share_pkey: ] Nov 13 02:12:49 219444 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src LIDs [0x7-0x7], Dest LIDs [0x8-0x8] Nov 13 02:12:49 219449 [42803940] 0x10 -> __osm_pr_rcv_get_lid_pair_path: [ Nov 13 02:12:49 219453 [42803940] 0x08 -> __osm_pr_rcv_get_lid_pair_path: Src LID 0x7, Dest LID 0x8 Nov 13 02:12:49 219458 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ Nov 13 02:12:49 219464 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 Nov 13 02:12:49 219471 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 Nov 13 02:12:49 219476 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: ] Nov 13 02:12:49 219480 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ Nov 13 02:12:49 219484 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 Nov 13 02:12:49 219489 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 From kliteyn at dev.mellanox.co.il Thu Nov 13 01:02:54 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 13 Nov 2008 11:02:54 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_sa_mcmember_record: return a real port JoinState on update In-Reply-To: <20081113021403.GK27271@sashak.voltaire.com> References: <20081113021403.GK27271@sashak.voltaire.com> Message-ID: <491BED3E.3060104@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > When port JoinState is updated by MCMember leave request response should > have a real (new) JoinState. This fix addresses bug#1373. 
> > Signed-off-by: Sasha Khapyorsky > --- > opensm/opensm/osm_sa_mcmember_record.c | 4 +--- > 1 files changed, 1 insertions(+), 3 deletions(-) > > diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c > index 878d21e..4ca5896 100644 > --- a/opensm/opensm/osm_sa_mcmember_record.c > +++ b/opensm/opensm/osm_sa_mcmember_record.c > @@ -1095,12 +1095,10 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, > goto Exit; > } > > - mcmember_rec.scope_state = p_mcm_port->scope_state; > /* remove port or update join state */ > removed = osm_mgrp_remove_port(sa->p_subn, sa->p_log, p_mgrp, p_mcm_port, > p_recvd_mcmember_rec->scope_state&0x0F); > - if (removed) > - mcmember_rec.scope_state = p_mcm_port->scope_state; > + mcmember_rec.scope_state = p_mcm_port->scope_state; I did the exact same fix last night :) -- Yevgeny > CL_PLOCK_RELEASE(sa->p_lock); > From dorfman.eli at gmail.com Thu Nov 13 01:18:07 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Thu, 13 Nov 2008 11:18:07 +0200 Subject: ***SPAM*** [ofa-general] [PATCH] opensm/osm_sa_path_record.c print port guids in error message Message-ID: <491BF0CF.4060306@gmail.com> print port guids in error message when there is no shared pkey between the ports. 
Signed-off-by: Eli Dorfman --- opensm/opensm/osm_sa_path_record.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c index fc425d5..b100384 100644 --- a/opensm/opensm/osm_sa_path_record.c +++ b/opensm/opensm/osm_sa_path_record.c @@ -596,7 +596,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, pkey = p_pr->pkey; if (!osm_physp_share_this_pkey(p_src_physp, p_dest_physp, pkey)) { OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1A: " - "Ports do not share specified PKey 0x%04x\n", + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 + " do not share specified PKey 0x%04x\n", + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)), cl_ntoh16(pkey)); status = IB_NOT_FOUND; goto Exit; @@ -618,7 +621,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, p_src_physp, p_dest_physp); if (!pkey) { OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1E: " - "Ports do not share PKeys defined by QoS level\n"); + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 + " do not share PKeys defined by QoS level\n", + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp))); status = IB_NOT_FOUND; goto Exit; } @@ -630,7 +636,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, pkey = osm_physp_find_common_pkey(p_src_physp, p_dest_physp); if (!pkey) { OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1B: " - "Ports do not have any shared PKeys\n"); + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 + " do not have any shared PKeys\n", + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp))); status = IB_NOT_FOUND; goto Exit; } -- 1.5.5 From ogerlitz at voltaire.com Thu Nov 13 02:37:55 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 13 Nov 2008 12:37:55 +0200 (IST) Subject: [ofa-general] using qos_X vs qos_ca_X / qos_swe_X directives Message-ID: Hi Yevgeny, I noted that when I use qos_X directives in 
the opensm config file, they are not applied by the SM on the fabric, but rather the "default values (hard-coded in OpenSM initialization)". When I use qos_ca_X and qos_swe_X directives, they are applied on the fabric. I have checked this with both 3.1.11 or 3.2.2 (in their ofed 1.3.1 / ofed 1.4 form). Or. E.g try qos_max_vls 4 qos_high_limit 255 qos_vlarb_high 0:128,1:64,2:32 qos_vlarb_low 0:1,1:2,2:3,3:4 qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 vs qos_ca_max_vls 4 qos_ca_high_limit 255 qos_ca_vlarb_high 0:128,1:64,2:32 qos_ca_vlarb_low 0:1,1:2,2:3,3:4 qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 From vlad at lists.openfabrics.org Thu Nov 13 03:18:45 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 13 Nov 2008 03:18:45 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081113-0200 daily build status Message-ID: <20081113111845.C6FFEE60323@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default 
Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Thu Nov 13 04:23:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 14:23:26 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa_path_record.c print port guids in error message In-Reply-To: <491BF0CF.4060306@gmail.com> References: <491BF0CF.4060306@gmail.com> Message-ID: <20081113122326.GT27271@sashak.voltaire.com> On 11:18 Thu 13 Nov , Eli Dorfman wrote: > print port guids in error message when there is no shared pkey between the ports. > > Signed-off-by: Eli Dorfman Applied. Thanks. Sasha From sashak at voltaire.com Thu Nov 13 05:17:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 15:17:03 +0200 Subject: [ofa-general] Re: rate assignment for path queries In-Reply-To: References: Message-ID: <20081113131703.GV27271@sashak.voltaire.com> Hi Or, On 09:20 Thu 13 Nov , Or Gerlitz wrote: > > If opensm doesn't have a match on any qos-assignment rule (eg when there's > no qos-config file), when coming to serve sa path query, my understanding > is that the "qos related fields" of the partition would be used. 
> > For example, I have set the following partition config file which assigns > to the 0x8001 partition, and run without any qos file. > > Default=0x7fff,ipoib : ALL=full; > RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; > RED=0x8002, ipoib, sl=2, rate=3, defmember=full : ALL=full; > > When a path query is issued, Indeed sl=1 is returned but I see that a > rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). For my best knowledge rate=2 in partition config file will be related to corresponded IPoIB multicast group for this partition, and not to PathRecord. In PathRecord you get maximum available rate on the requested path. > Have I done anything wrong? is it a known issue? what does it means > when the SM prints "min rate = 6" Here "min rate" means minimal common rate on the path. Sasha > > Or. > > > Nov 13 02:12:49 219374 [42803940] 0x08 -> PathRecord dump: > service_id..............0x0000000000000000 > dgid....................0xfe80000000000000 : 0x0002c90300026be7 > sgid....................0xfe80000000000000 : 0x0002c90300026be3 > dlid....................0x0 > slid....................0x0 > hop_flow_raw............0x0 > tclass..................0x0 > num_path_revers.........0x1 > pkey....................0x8001 > qos_class...............0x0 > sl......................0x0 > mtu.....................0x3 > rate....................0x0 > pkt_life................0x0 > preference..............0x0 > resv2...................0x0 > resv3...................0x0 > Nov 13 02:12:49 219386 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: [ > Nov 13 02:12:49 219390 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: ] > Nov 13 02:12:49 219394 [42803940] 0x08 -> osm_pr_rcv_process: Unicast destination requested > Nov 13 02:12:49 219398 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: [ > Nov 13 02:12:49 219403 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: ] > Nov 13 02:12:49 219407 [42803940] 0x10 -> __osm_pr_rcv_process_pair: [ > Nov 13 02:12:49 219411 
[42803940] 0x10 -> __osm_pr_rcv_get_port_pair_paths: [ > Nov 13 02:12:49 219415 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src port 0x0002c90300026be3, Dst port 0x0002c90300026be7 > Nov 13 02:12:49 219420 [42803940] 0x10 -> osm_port_share_pkey: [ > Nov 13 02:12:49 219424 [42803940] 0x10 -> osm_port_share_pkey: ] > Nov 13 02:12:49 219428 [42803940] 0x10 -> osm_port_share_pkey: [ > Nov 13 02:12:49 219432 [42803940] 0x10 -> osm_port_share_pkey: ] > Nov 13 02:12:49 219436 [42803940] 0x10 -> osm_port_share_pkey: [ > Nov 13 02:12:49 219440 [42803940] 0x10 -> osm_port_share_pkey: ] > Nov 13 02:12:49 219444 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src LIDs [0x7-0x7], Dest LIDs [0x8-0x8] > Nov 13 02:12:49 219449 [42803940] 0x10 -> __osm_pr_rcv_get_lid_pair_path: [ > Nov 13 02:12:49 219453 [42803940] 0x08 -> __osm_pr_rcv_get_lid_pair_path: Src LID 0x7, Dest LID 0x8 > Nov 13 02:12:49 219458 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ > Nov 13 02:12:49 219464 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 > Nov 13 02:12:49 219471 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 > Nov 13 02:12:49 219476 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: ] > Nov 13 02:12:49 219480 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ > Nov 13 02:12:49 219484 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 > Nov 13 02:12:49 219489 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 From sashak at voltaire.com Thu Nov 13 05:22:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 13 Nov 2008 15:22:36 +0200 Subject: [ofa-general] using qos_X vs qos_ca_X / qos_swe_X directives In-Reply-To: References: Message-ID: <20081113132236.GW27271@sashak.voltaire.com> On 12:37 Thu 13 Nov , Or Gerlitz wrote: > Hi Yevgeny, > > I noted that when I use qos_X 
directives in the opensm config file, they are not > applied by the SM on the fabric, but rather the "default values (hard-coded in > OpenSM initialization)". When I use qos_ca_X and qos_swe_X directives, they > are applied on the fabric. I have checked this with both 3.1.11 or 3.2.2 > (in their ofed 1.3.1 / ofed 1.4 form). Yes, this is a "feature" (bug). We are discussing this right now in the thread http://lists.openfabrics.org/pipermail/general/2008-November/055394.html Sasha > > Or. > > E.g > > try > > qos_max_vls 4 > qos_high_limit 255 > qos_vlarb_high 0:128,1:64,2:32 > qos_vlarb_low 0:1,1:2,2:3,3:4 > qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > > vs > > qos_ca_max_vls 4 > qos_ca_high_limit 255 > qos_ca_vlarb_high 0:128,1:64,2:32 > qos_ca_vlarb_low 0:1,1:2,2:3,3:4 > qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hal.rosenstock at gmail.com Thu Nov 13 05:38:57 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 13 Nov 2008 08:38:57 -0500 Subject: [ofa-general] Re: [PATCH] opensm/osm_sa_mcmember_record: return a real port JoinState on update In-Reply-To: <20081113021403.GK27271@sashak.voltaire.com> References: <20081113021403.GK27271@sashak.voltaire.com> Message-ID: On Wed, Nov 12, 2008 at 9:14 PM, Sasha Khapyorsky wrote: > > When port JoinState is updated by MCMember leave request response should > have a real (new) JoinState. This fix addresses bug#1373. 
> > Signed-off-by: Sasha Khapyorsky > --- > opensm/opensm/osm_sa_mcmember_record.c | 4 +--- > 1 files changed, 1 insertions(+), 3 deletions(-) > > diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c > index 878d21e..4ca5896 100644 > --- a/opensm/opensm/osm_sa_mcmember_record.c > +++ b/opensm/opensm/osm_sa_mcmember_record.c > @@ -1095,12 +1095,10 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, > goto Exit; > } > > - mcmember_rec.scope_state = p_mcm_port->scope_state; > /* remove port or update join state */ > removed = osm_mgrp_remove_port(sa->p_subn, sa->p_log, p_mgrp, p_mcm_port, > p_recvd_mcmember_rec->scope_state&0x0F); > - if (removed) > - mcmember_rec.scope_state = p_mcm_port->scope_state; > + mcmember_rec.scope_state = p_mcm_port->scope_state; In looking at this, this is really only compliant if done for trusted requests (and there are other trust issues with SA MCMemberRecord). This issue clearly predates the patch. -- Hal > > CL_PLOCK_RELEASE(sa->p_lock); > > -- > 1.6.0.3.517.g759a > > From hal.rosenstock at gmail.com Thu Nov 13 05:39:51 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 13 Nov 2008 08:39:51 -0500 Subject: ***SPAM*** [ofa-general] [PATCH] opensm/osm_sa_path_record.c print port guids in error message In-Reply-To: <491BF0CF.4060306@gmail.com> References: <491BF0CF.4060306@gmail.com> Message-ID: On Thu, Nov 13, 2008 at 4:18 AM, Eli Dorfman wrote: > print port guids in error message when there is no shared pkey between the ports. 
> > Signed-off-by: Eli Dorfman > --- > opensm/opensm/osm_sa_path_record.c | 15 ++++++++++++--- > 1 files changed, 12 insertions(+), 3 deletions(-) > > diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c > index fc425d5..b100384 100644 > --- a/opensm/opensm/osm_sa_path_record.c > +++ b/opensm/opensm/osm_sa_path_record.c > @@ -596,7 +596,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, > pkey = p_pr->pkey; > if (!osm_physp_share_this_pkey(p_src_physp, p_dest_physp, pkey)) { > OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1A: " > - "Ports do not share specified PKey 0x%04x\n", > + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 > + " do not share specified PKey 0x%04x\n", > + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), > + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)), > cl_ntoh16(pkey)); > status = IB_NOT_FOUND; > goto Exit; > @@ -618,7 +621,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, > p_src_physp, p_dest_physp); > if (!pkey) { > OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1E: " > - "Ports do not share PKeys defined by QoS level\n"); > + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 > + " do not share PKeys defined by QoS level\n", > + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), > + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp))); > status = IB_NOT_FOUND; > goto Exit; > } > @@ -630,7 +636,10 @@ __osm_pr_rcv_get_path_parms(IN osm_sa_t * sa, > pkey = osm_physp_find_common_pkey(p_src_physp, p_dest_physp); > if (!pkey) { > OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F1B: " > - "Ports do not have any shared PKeys\n"); > + "Ports 0x%016" PRIx64 " 0x%016" PRIx64 > + " do not have any shared PKeys\n", > + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)), > + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp))); > status = IB_NOT_FOUND; > goto Exit; > } A nit but IMO these messages would best be consistent with the ones which are similar in osm_sa_multipath_record.c -- Hal > 1.5.5 > > _______________________________________________ > general 
mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From ogerlitz at voltaire.com Thu Nov 13 05:43:26 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 13 Nov 2008 15:43:26 +0200
Subject: [ofa-general] Re: rate assignment for path queries
In-Reply-To: <20081113131703.GV27271@sashak.voltaire.com>
References: <20081113131703.GV27271@sashak.voltaire.com>
Message-ID: <491C2EFE.4060900@voltaire.com>

Sasha Khapyorsky wrote:
>> RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full;
>>
>> When a path query is issued, indeed sl=1 is returned, but I see that
>> rate=6 (20 Gb/s) is returned where I configured rate=2 (2.5 Gb/s).

> To the best of my knowledge, rate=2 in the partition config file applies to the corresponding IPoIB multicast group for this partition, and not to PathRecord. In PathRecord you get the maximum available rate on the requested path.

I understand your comment about the relation to multicast join and not
path queries. However, currently, when there's no rule in the qos-config
file (or no file) that matches the path query, the SM does provide the SL
assigned to the partition (specified in the query) through the pkey file,
but it doesn't do so for the Rate. So you say that for QoS assignment one
should use the qos-policy file, let it be.

Or.
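
For readers following the rate values in this thread, here is a minimal sketch of how the numeric rate encodings map to link speeds. It assumes the standard InfiniBand Rate field encodings (2 = 2.5 Gb/s and 6 = 20 Gb/s, as quoted above); the function name ib_rate_enc_to_mbps is hypothetical and is not part of OpenSM.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenSM function): translate the Rate
 * encoding used in PathRecord / MCMemberRecord and in the partition
 * config "rate=" field into Mb/s. Returns 0 for encodings that are
 * not in the 2..10 range covered by this table.
 */
unsigned ib_rate_enc_to_mbps(unsigned rate)
{
	static const unsigned mbps[] = {
		[2] = 2500,	/* 2.5 Gb/s */
		[3] = 10000,	/* 10 Gb/s */
		[4] = 30000,	/* 30 Gb/s */
		[5] = 5000,	/* 5 Gb/s */
		[6] = 20000,	/* 20 Gb/s */
		[7] = 40000,	/* 40 Gb/s */
		[8] = 60000,	/* 60 Gb/s */
		[9] = 80000,	/* 80 Gb/s */
		[10] = 120000,	/* 120 Gb/s */
	};

	/* Entries 0 and 1 are zero-initialized, so they also return 0. */
	if (rate >= sizeof(mbps) / sizeof(mbps[0]))
		return 0;
	return mbps[rate];
}
```

With this table, the configured rate=2 corresponds to 2500 Mb/s and the returned rate=6 to 20000 Mb/s, which is the mismatch described above.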
From kliteyn at dev.mellanox.co.il Thu Nov 13 06:23:25 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 13 Nov 2008 16:23:25 +0200 Subject: [ofa-general] [PATCH] osmtest/osmt_multicast.c: some refinements to the multicast flow Message-ID: <491C385D.9090909@dev.mellanox.co.il> Hi Sasha, Here are some osmtest refinements (multicast flow) that I did while debugging the recent two multicast bugs in opensm: some comments fixes, creating a group that was removed because last full member left, and adding one query to check that invalid delete request really fails. Signed-off-by: Yevgeny Kliteynik --- opensm/osmtest/osmt_multicast.c | 64 ++++++++++++++++++++++++++++++++++---- 1 files changed, 57 insertions(+), 7 deletions(-) diff --git a/opensm/osmtest/osmt_multicast.c b/opensm/osmtest/osmt_multicast.c index a397142..57a8772 100644 --- a/opensm/osmtest/osmt_multicast.c +++ b/opensm/osmtest/osmt_multicast.c @@ -1813,7 +1813,7 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) /* Lets try another valid join scope state */ OSM_LOG(&p_osmt->log, OSM_LOG_INFO, - "Checking new MGID creation with valid join state (o15.0.1.9)...\n"); + "Checking new MGID creation with valid join state (o15.0.2.3)...\n"); mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[12] = 0xFB; @@ -1853,7 +1853,7 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) IB_MCR_COMPMASK_MGID | IB_MCR_COMPMASK_PORT_GID | IB_MCR_COMPMASK_JOIN_STATE; - status = osmt_send_mcast_request(p_osmt, 0x1, /* User Defined query */ + status = osmt_send_mcast_request(p_osmt, 0x1, /* SubnAdmSet */ &mc_req_rec, comp_mask, &res_sa_mad); if (status != IB_SUCCESS) { OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02CC: " @@ -1862,6 +1862,16 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) goto Exit; } + p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); + if ((p_mc_res->scope_state & 0x7) != 0x7) { + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02D0: " + "Validating 
JoinState update failed. " + "Expected 0x27 got 0x%02X\n", + p_mc_res->scope_state); + status = IB_ERROR; + goto Exit; + } + /* o15.0.1.11: */ /* - Try to join into a MGID that exists with JoinState=SendOnlyMember - */ /* see that it updates JoinState. What is the routing change? */ @@ -1869,12 +1879,24 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) "Checking Retry of existing MGID - See JoinState update (o15.0.1.11)...\n"); mc_req_rec.mgid = good_mgid; - mc_req_rec.scope_state = 0x22; /* link-local scope, send only member */ + /* first, make sure that the group exists */ + mc_req_rec.scope_state = 0x21; status = osmt_send_mcast_request(p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad); if (status != IB_SUCCESS) { OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02CD: " + "Failed to create/join as full member - got %s/%s\n", + ib_get_err_str(status), + ib_get_mad_status_str((ib_mad_t *) (&res_sa_mad))); + goto Exit; + } + + mc_req_rec.scope_state = 0x22; /* link-local scope, non-member */ + status = osmt_send_mcast_request(p_osmt, 1, + &mc_req_rec, comp_mask, &res_sa_mad); + if (status != IB_SUCCESS) { + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02D1: " "Failed to update existing MGID - got %s/%s\n", ib_get_err_str(status), ib_get_mad_status_str((ib_mad_t *) (&res_sa_mad))); @@ -1899,15 +1921,33 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) mc_req_rec.rate = IB_LINK_WIDTH_ACTIVE_1X | IB_PATH_SELECTOR_GREATER_THAN << 6; mc_req_rec.mgid = good_mgid; - /* link-local scope, non member (so we should not be able to delete) */ - /* but the FullMember bit should be gone */ + OSM_LOG(&p_osmt->log, OSM_LOG_INFO, "Checking Partially delete JoinState (o15.0.1.14)...\n"); - mc_req_rec.scope_state = 0x22; + + /* link-local scope, both non-member bits, + so we should not be able to delete) */ + mc_req_rec.scope_state = 0x26; + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, EXPECTING_ERRORS_START "\n"); status = osmt_send_mcast_request(p_osmt, 0, 
&mc_req_rec, comp_mask, &res_sa_mad); - if ((status != IB_SUCCESS) || (p_mc_res->scope_state != 0x21)) { + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, EXPECTING_ERRORS_END "\n"); + + if (status != IB_REMOTE_ERROR) { OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02CF: " + "Expected to fail partially update JoinState, " + "but got %s\n", + ib_get_err_str(status)); + status = IB_ERROR; + goto Exit; + } + + /* link-local scope, NonMember bit, the FullMember bit should stay */ + mc_req_rec.scope_state = 0x22; + status = osmt_send_mcast_request(p_osmt, 0, + &mc_req_rec, comp_mask, &res_sa_mad); + if (status != IB_SUCCESS) { + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02D3: " "Failed to partially update JoinState : %s/%s\n", ib_get_err_str(status), ib_get_mad_status_str((ib_mad_t *) (&res_sa_mad))); @@ -1915,6 +1955,16 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) goto Exit; } + p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); + if (p_mc_res->scope_state != 0x21) { + OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02D4: " + "Failed to partially update JoinState : " + "JoinState = 0x%02X, expected 0x%02X\n", + p_mc_res->scope_state, 0x21); + status = IB_ERROR; + goto Exit; + } + /* So far successfully delete state - Now change it */ mc_req_rec.mgid = good_mgid; mc_req_rec.scope_state = 0x24; /* link-local scope, send only member */ -- 1.5.1.4 From tziporet at dev.mellanox.co.il Thu Nov 13 07:04:16 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 13 Nov 2008 17:04:16 +0200 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: <490DD27C.4070109@pobox.com> References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com> <490DD27C.4070109@pobox.com> Message-ID: <491C41F0.3080304@mellanox.co.il> Jeff Garzik wrote: > Roland Dreier wrote: >> In general I think I have a bigger chance of merging more mlx4_core >> stuff through my tree, so it will probably be smoother in terms of >> 
conflicts etc. if I carry this patch.
>
>
> Fine by me...
>

What is the status of this?
I know it's in mlx4_core, but it is mainly needed for mlnx_en and has minimal impact on the IB side.
I think Roland is on new-baby vacation, so what is the resolution?

Thanks
Tziporet

From dorfman.eli at gmail.com Thu Nov 13 07:51:22 2008
From: dorfman.eli at gmail.com (Eli Dorfman)
Date: Thu, 13 Nov 2008 17:51:22 +0200
Subject: ***SPAM*** [ofa-general] [PATCH] opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block.
Message-ID: <491C4CFA.8000006@gmail.com>

wrong max mcast lid cause the sm to set invalid MFT block.
when mcmember tries to set mcast lid beyond mcast capability (e.g. 0xc400),
the sm accepts this and tries to set invalid block.

Signed-off-by: Eli Dorfman
---
 opensm/opensm/osm_mcast_tbl.c          |    6 +++---
 opensm/opensm/osm_sa_mcmember_record.c |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/opensm/opensm/osm_mcast_tbl.c b/opensm/opensm/osm_mcast_tbl.c
index 92fbb63..17fb69c 100644
--- a/opensm/opensm/osm_mcast_tbl.c
+++ b/opensm/opensm/osm_mcast_tbl.c
@@ -81,7 +81,7 @@ osm_mcast_tbl_init(IN osm_mcast_tbl_t * const p_tbl,
                                                IB_MCAST_BLOCK_SIZE) /
                                               IB_MCAST_BLOCK_SIZE) - 1);
 
-        p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity);
+        p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity - 1);
 
         /*
            The number of bytes needed in the mask table is:
@@ -216,7 +216,7 @@ osm_mcast_tbl_set_block(IN osm_mcast_tbl_t * const p_tbl,
 
         mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE);
 
-        if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho)
+        if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho)
                 return (IB_INVALID_PARAMETER);
 
         for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++)
@@ -274,7 +274,7 @@ osm_mcast_tbl_get_block(IN osm_mcast_tbl_t * const p_tbl,
 
         mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE);
 
-        if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho)
+        if (mlid_start_ho +
IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho)
                 return (IB_INVALID_PARAMETER);
 
         for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++)
diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c
index 5dd286a..6007b06 100644
--- a/opensm/opensm/osm_sa_mcmember_record.c
+++ b/opensm/opensm/osm_sa_mcmember_record.c
@@ -846,7 +846,7 @@ osm_mcmr_rcv_create_new_mgrp(IN osm_sa_t * sa,
         mlid = __get_new_mlid(sa, mcm_rec.mlid);
         if (mlid == 0) {
                 OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B19: "
-                        "__get_new_mlid failed\n");
+                        "__get_new_mlid failed request mlid 0x%04x\n", mcm_rec.mlid);
                 status = IB_SA_MAD_STATUS_NO_RESOURCES;
                 goto Exit;
         }
-- 
1.5.5

From jackm at mellanox.co.il Thu Nov 13 08:06:32 2008
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Thu, 13 Nov 2008 18:06:32 +0200
Subject: [ofa-general] Re: [PATCH] mlx4/profile.c: fix warning res_name defined but not used
In-Reply-To:
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EADF195AA@mtlexch01.mtl.com>

This looks fine to me.

- Jack

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of
> Roland Dreier
> Sent: Tuesday, November 04, 2008 9:17 PM
> To: Alexander Beregalov
> Cc: general at lists.openfabrics.org
> Subject: [ofa-general] Re: [PATCH] mlx4/profile.c: fix
> warning res_name defined but not used
>
> Thanks. What if we fix this like the following instead --
> change mlx4_dbg so it always looks to the compiler like it
> uses all its parameters? This generates the same code for
> me, and looks cleaner in that it actually reduces the amount
> of #ifdef'ed stuff.
> ---
>  drivers/net/mlx4/mlx4.h |    9 +++------
>  1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> index fa431fa..56a2e21 100644
> --- a/drivers/net/mlx4/mlx4.h
> +++ b/drivers/net/mlx4/mlx4.h
> @@ -87,6 +87,9 @@ enum {
>
>  #ifdef CONFIG_MLX4_DEBUG
>  extern int mlx4_debug_level;
> +#else /* CONFIG_MLX4_DEBUG */
> +#define mlx4_debug_level (0)
> +#endif /* CONFIG_MLX4_DEBUG */
>
>  #define mlx4_dbg(mdev, format, arg...) \
>          do { \
> @@ -94,12 +97,6 @@ extern int mlx4_debug_level;
>                          dev_printk(KERN_DEBUG, &mdev->pdev->dev, format, ## arg); \
>          } while (0)
>
> -#else /* CONFIG_MLX4_DEBUG */
> -
> -#define mlx4_dbg(mdev, format, arg...) do { (void) mdev; } while (0)
> -
> -#endif /* CONFIG_MLX4_DEBUG */
> -
>  #define mlx4_err(mdev, format, arg...) \
>          dev_err(&mdev->pdev->dev, format, ## arg)
>  #define mlx4_info(mdev, format, arg...) \
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From sashak at voltaire.com Thu Nov 13 08:41:18 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 13 Nov 2008 18:41:18 +0200
Subject: [ofa-general] Re: [PATCH] osmtest/osmt_multicast.c: some refinements to the multicast flow
In-Reply-To: <491C385D.9090909@dev.mellanox.co.il>
References: <491C385D.9090909@dev.mellanox.co.il>
Message-ID: <20081113164118.GY27271@sashak.voltaire.com>

On 16:23 Thu 13 Nov , Yevgeny Kliteynik wrote:
> Hi Sasha,
>
> Here are some osmtest refinements (multicast flow) that
> I did while debugging the recent two multicast bugs in
> opensm: some comments fixes, creating a group that was
> removed because last full member left, and adding one
> query to check that invalid delete request really fails.
>
> Signed-off-by: Yevgeny Kliteynik

Applied. Thanks.
Sasha

From chu11 at llnl.gov Thu Nov 13 09:20:02 2008
From: chu11 at llnl.gov (Al Chu)
Date: Thu, 13 Nov 2008 09:20:02 -0800
Subject: [ofa-general] Re: [opensm patch][1/2] fix qos config parsing bugs
In-Reply-To: <20081113002403.GI27271@sashak.voltaire.com>
References: <1225404078.1197.533.camel@cardanus.llnl.gov> <20081111191958.GA8894@sashak.voltaire.com> <1226447872.6239.2.camel@cardanus.llnl.gov> <20081113002403.GI27271@sashak.voltaire.com>
Message-ID: <1226596802.7156.41.camel@cardanus.llnl.gov>

Hey Sasha,

On Thu, 2008-11-13 at 02:24 +0200, Sasha Khapyorsky wrote:
> Hi Al,
>
> On 15:57 Tue 11 Nov , Al Chu wrote:
> >
> > Sorry, I may have not explained it well. Let's say I do this in the
> > config file.
> >
> > qos_vlarb_high FOOBAR
> > # qos_ca_vlarb_high BLAH
> > qos_swe_vlarb_high XYZZY
> >
> > I currently expect qos_ca_vlarb_high to use the value of FOOBAR because
> > I commented out the field. But it uses OSM_DEFAULT_QOS_HIGH_LIMIT
> > instead. The reason is because qos_build_config() checks for NULL to
> > use default vs. non-default values.
> >
> > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high;
> >
> > Under the above situation where I've commented out several fields,
> > opt->vlarb_high is always non-NULL b/c it was set to
> > OSM_DEFAULT_QOS_HIGH_LIMIT. Thus OSM_DEFAULT_QOS_HIGH_LIMIT is used
> > instead of FOOBAR.
> >
> > > > 2)
> > > >
> > > > In qos_build_config() we load the high_limit like this:
> > > >
> > > > cfg->vl_high_limit = (uint8_t) opt->high_limit;
> > > >
> > > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit
> > > > options to "go back to" the default high_limit. It just assumes that
> > > > whatever is input (or was set by default) is what you should use.
> > >
> > > Right. What is a limitation here? That a user cannot set this to
> > > "no value"? But she/he can just skip it.
> >
> > Similar to the above issue, let's say I want to do:
> >
> > qos_high_limit 8
> > # qos_ca_high_limit 15
> > # qos_swe_high_limit 15
> >
> > I want qos_ca_high_limit and qos_swe_high_limit to use whatever I set in
> > qos_high_limit. But the code doesn't allow for this.
> >
> > >
> > > > 3)
> > > >
> > > > Some fields like qos_vlarb_high are assumed to be correctly set and can
> > > > segfault opensm.
> > >
> > > qos_build_config() assumes that valid parameters are used. And we are
> > > using it this way (I hope :)) (after all, it is not a library API).
> >
> > I think the issue is the osm_subnet.c code did not properly check all
> > inputs, and subsequently some inputs used in qos_build_config() were
> > bad. I think
> >
> > qos_vlarb_high (null)
> >
> > was something I tried that opensm seg-faulted on.
>
> Ok. I see now.
>
> Probably it will be simpler just to generate valid qos parameter sets
> right after the parser (at verification time)?

Ahh, I see what you did. It's much cleaner this way.

> Like in your modified (and
> rebased against recent patches) patch below?

Patch looks good to me.
Thanks, Al > > Sasha > > > >From a973a8a1ea6c805cf07965d86731ae04510266ce Mon Sep 17 00:00:00 2001 > From: Al Chu > Date: Mon, 10 Nov 2008 13:41:04 -0800 > Subject: [PATCH] fix qos config parsing bugs > > Signed-off-by: Albert Chu > Signed-off-by: Sasha Khapyorsky > --- > opensm/include/opensm/osm_subnet.h | 12 +- > opensm/opensm/osm_qos.c | 6 +- > opensm/opensm/osm_subnet.c | 298 ++++++++++++++++++++--------------- > 3 files changed, 181 insertions(+), 135 deletions(-) > > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index a16cbce..2bcd232 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -100,7 +100,7 @@ struct osm_qos_policy; > */ > typedef struct osm_qos_options { > unsigned max_vls; > - unsigned high_limit; > + int high_limit; > char *vlarb_high; > char *vlarb_low; > char *sl2vl; > @@ -109,20 +109,20 @@ typedef struct osm_qos_options { > * FIELDS > * > * max_vls > -* The number of maximum VLs on the Subnet > +* The number of maximum VLs on the Subnet (0 == use default) > * > * high_limit > * The limit of High Priority component of VL Arbitration > -* table (IBA 7.6.9) > +* table (IBA 7.6.9) (-1 == use default) > * > * vlarb_high > -* High priority VL Arbitration table template. > +* High priority VL Arbitration table template. (NULL == use default) > * > * vlarb_low > -* Low priority VL Arbitration table template. > +* Low priority VL Arbitration table template. (NULL == use default) > * > * sl2vl > -* SL2VL Mapping table (IBA 7.6.6) template. > +* SL2VL Mapping table (IBA 7.6.6) template. (NULL == use default) > * > *********/ > > diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c > index 1679ae0..b451c25 100644 > --- a/opensm/opensm/osm_qos.c > +++ b/opensm/opensm/osm_qos.c > @@ -382,7 +382,11 @@ static void qos_build_config(struct qos_config *cfg, > memset(cfg, 0, sizeof(*cfg)); > > cfg->max_vls = opt->max_vls > 0 ? 
opt->max_vls : dflt->max_vls; > - cfg->vl_high_limit = (uint8_t) opt->high_limit; > + > + if (opt->high_limit >= 0) > + cfg->vl_high_limit = (uint8_t) opt->high_limit; > + else > + cfg->vl_high_limit = (uint8_t) dflt->high_limit; > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 8569043..1c9777e 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -370,6 +370,15 @@ static void subn_set_default_qos_options(IN osm_qos_options_t * opt) > opt->sl2vl = OSM_DEFAULT_QOS_SL2VL; > } > > +static void subn_init_qos_options(IN osm_qos_options_t * opt) > +{ > + opt->max_vls = 0; > + opt->high_limit = -1; > + opt->vlarb_high = NULL; > + opt->vlarb_low = NULL; > + opt->sl2vl = NULL; > +} > + > /********************************************************************** > **********************************************************************/ > void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > @@ -457,11 +466,11 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) > p_opt->no_clients_rereg = FALSE; > p_opt->prefix_routes_file = OSM_DEFAULT_PREFIX_ROUTES_FILE; > p_opt->consolidate_ipv6_snm_req = FALSE; > - subn_set_default_qos_options(&p_opt->qos_options); > - subn_set_default_qos_options(&p_opt->qos_ca_options); > - subn_set_default_qos_options(&p_opt->qos_sw0_options); > - subn_set_default_qos_options(&p_opt->qos_swe_options); > - subn_set_default_qos_options(&p_opt->qos_rtr_options); > + subn_init_qos_options(&p_opt->qos_options); > + subn_init_qos_options(&p_opt->qos_ca_options); > + subn_init_qos_options(&p_opt->qos_sw0_options); > + subn_init_qos_options(&p_opt->qos_swe_options); > + subn_init_qos_options(&p_opt->qos_rtr_options); > } > > /********************************************************************** > @@ -526,6 +535,21 @@ opts_unpack_uint32(IN char *p_req_key, > 
/********************************************************************** > **********************************************************************/ > static void > +opts_unpack_int32(IN char *p_req_key, > + IN char *p_key, IN char *p_val_str, IN int32_t * p_val) > +{ > + if (!strcmp(p_req_key, p_key)) { > + int32_t val = strtol(p_val_str, NULL, 0); > + if (val != *p_val) { > + log_config_value(p_key, "%d", val); > + *p_val = val; > + } > + } > +} > + > +/********************************************************************** > + **********************************************************************/ > +static void > opts_unpack_uint16(IN char *p_req_key, > IN char *p_key, IN char *p_val_str, IN uint16_t * p_val) > { > @@ -651,7 +675,7 @@ subn_parse_qos_options(IN const char *prefix, > snprintf(name, sizeof(name), "%s_max_vls", prefix); > opts_unpack_uint32(name, p_key, p_val_str, &opt->max_vls); > snprintf(name, sizeof(name), "%s_high_limit", prefix); > - opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); > + opts_unpack_int32(name, p_key, p_val_str, &opt->high_limit); > snprintf(name, sizeof(name), "%s_vlarb_high", prefix); > opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); > snprintf(name, sizeof(name), "%s_vlarb_low", prefix); > @@ -786,138 +810,142 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) > > /********************************************************************** > **********************************************************************/ > - > -static void subn_verify_max_vls(unsigned *max_vls, const char *prefix) > +static void subn_verify_max_vls(unsigned *max_vls, const char *prefix, unsigned dflt) > { > - if (*max_vls > 15) { > - log_report(" Invalid Cached Option:%s_max_vls=%u:" > - "Using Default:%u\n", > - prefix, *max_vls, OSM_DEFAULT_QOS_MAX_VLS); > - *max_vls = OSM_DEFAULT_QOS_MAX_VLS; > + if (!(*max_vls) || *max_vls > 15) { > + log_report(" Invalid Cached Option: %s_max_vls=%u: " > + "Using Default = %u\n", 
prefix, *max_vls, dflt); > + *max_vls = dflt; > } > } > > -static void subn_verify_high_limit(unsigned *high_limit, const char *prefix) > +static void subn_verify_high_limit(int *high_limit, const char *prefix, int dflt) > { > - if (*high_limit > 255) { > - log_report(" Invalid Cached Option:%s_high_limit=%u:" > - "Using Default:%u\n", > - prefix, *high_limit, OSM_DEFAULT_QOS_HIGH_LIMIT); > - *high_limit = OSM_DEFAULT_QOS_HIGH_LIMIT; > + if (*high_limit < 0 || *high_limit > 255) { > + log_report(" Invalid Cached Option: %s_high_limit=%d: " > + "Using Default: %d\n", prefix, *high_limit, dflt); > + *high_limit = dflt; > } > } > > -static void subn_verify_vlarb(char *vlarb, const char *prefix, > - const char *suffix) > +static void subn_verify_vlarb(char **vlarb, const char *prefix, > + const char *suffix, char *dflt) > { > - if (vlarb) { > - char *str, *tok, *end, *ptr; > - int count = 0; > - > - str = strdup(vlarb); > - > - tok = strtok_r(str, ",\n", &ptr); > - while (tok) { > - char *vl_str, *weight_str; > - > - vl_str = tok; > - weight_str = strchr(tok, ':'); > - > - if (weight_str) { > - long vl, weight; > - > - *weight_str = '\0'; > - weight_str++; > - > - vl = strtol(vl_str, &end, 0); > - > - if (*end) > - log_report(" Warning: Cached Option " > - "%s_vlarb_%s:vl=%s " > - "improperly formatted\n", > - prefix, suffix, vl_str); > - else if (vl < 0 || vl > 14) > - log_report(" Warning: Cached Option " > - "%s_vlarb_%s:vl=%ld out " > - "of range\n", > - prefix, suffix, vl); > - > - weight = strtol(weight_str, &end, 0); > - > - if (*end) > - log_report(" Warning: Cached Option " > - "%s_vlarb_%s:weight=%s " > - "improperly formatted\n", > - prefix, suffix, weight_str); > - else if (weight < 0 || weight > 255) > - log_report(" Warning: Cached Option " > - "%s_vlarb_%s:weight=%ld " > - "out of range\n", > - prefix, suffix, weight); > - } else > - log_report(" Warning: Cached Option " > - "%s_vlarb_%s:vl:weight=%s " > - "improperly formatted\n", > - prefix, suffix, 
tok); > + char *str, *tok, *end, *ptr; > + int count = 0; > + > + if (*vlarb == NULL) { > + log_report(" Invalid Cached Option: %s_vlarb_%s: " > + "Using Default\n", prefix, suffix); > + *vlarb = dflt; > + return; > + } > > - count++; > - tok = strtok_r(NULL, ",\n", &ptr); > - } > + str = strdup(*vlarb); > + > + tok = strtok_r(str, ",\n", &ptr); > + while (tok) { > + char *vl_str, *weight_str; > > - if (count > 64) > - log_report(" Warning: Cached Option %s_vlarb_%s: " > - "> 64 listed: excess vl:weight pairs " > - "will be dropped\n", prefix, suffix); > + vl_str = tok; > + weight_str = strchr(tok, ':'); > > - free(str); > + if (weight_str) { > + long vl, weight; > + > + *weight_str = '\0'; > + weight_str++; > + > + vl = strtol(vl_str, &end, 0); > + > + if (*end) > + log_report(" Warning: Cached Option " > + "%s_vlarb_%s:vl=%s" > + " improperly formatted\n", > + prefix, suffix, vl_str); > + else if (vl < 0 || vl > 14) > + log_report(" Warning: Cached Option " > + "%s_vlarb_%s:vl=%ld out of range\n", > + prefix, suffix, vl); > + > + weight = strtol(weight_str, &end, 0); > + > + if (*end) > + log_report(" Warning: Cached Option " > + "%s_vlarb_%s:weight=%s " > + "improperly formatted\n", > + prefix, suffix, weight_str); > + else if (weight < 0 || weight > 255) > + log_report(" Warning: Cached Option " > + "%s_vlarb_%s:weight=%ld " > + "out of range\n", > + prefix, suffix, weight); > + } else > + log_report(" Warning: Cached Option " > + "%s_vlarb_%s:vl:weight=%s " > + "improperly formatted\n", > + prefix, suffix, tok); > + > + count++; > + tok = strtok_r(NULL, ",\n", &ptr); > } > + > + if (count > 64) > + log_report(" Warning: Cached Option %s_vlarb_%s: > 64 listed:" > + " excess vl:weight pairs will be dropped\n", > + prefix, suffix); > + > + free(str); > } > > -static void subn_verify_sl2vl(char *sl2vl, const char *prefix) > +static void subn_verify_sl2vl(char **sl2vl, const char *prefix, char *dflt) > { > - if (sl2vl) { > - char *str, *tok, *end, *ptr; > - int 
count = 0; > + char *str, *tok, *end, *ptr; > + int count = 0; > + > + if (*sl2vl == NULL) { > + log_report(" Invalid Cached Option: %s_sl2vl: Using Default\n", > + prefix); > + *sl2vl = dflt; > + return; > + } > > - str = strdup(sl2vl); > + str = strdup(*sl2vl); > > - tok = strtok_r(str, ",\n", &ptr); > - while (tok) { > - long vl = strtol(tok, &end, 0); > + tok = strtok_r(str, ",\n", &ptr); > + while (tok) { > + long vl = strtol(tok, &end, 0); > > - if (*end) > - log_report(" Warning: Cached Option %s_sl2vl:" > - "vl=%s improperly formatted\n", > - prefix, tok); > - else if (vl < 0 || vl > 15) > - log_report(" Warning: Cached Option %s_sl2vl:" > - "vl=%ld out of range\n", > - prefix, vl); > - > - count++; > - tok = strtok_r(NULL, ",\n", &ptr); > - } > + if (*end) > + log_report(" Warning: Cached Option %s_sl2vl:vl=%s " > + "improperly formatted\n", prefix, tok); > + else if (vl < 0 || vl > 15) > + log_report(" Warning: Cached Option %s_sl2vl:vl=%ld " > + "out of range\n", prefix, vl); > > - if (count < 16) > - log_report(" Warning: Cached Option %s_sl2vl: < 16 VLs " > - "listed\n", prefix); > + count++; > + tok = strtok_r(NULL, ",\n", &ptr); > + } > > - if (count > 16) > - log_report(" Warning: Cached Option %s_sl2vl: " > - "> 16 listed: excess VLs will be dropped\n", > - prefix); > + if (count < 16) > + log_report(" Warning: Cached Option %s_sl2vl: < 16 VLs " > + "listed\n", prefix); > > - free(str); > - } > + if (count > 16) > + log_report(" Warning: Cached Option %s_sl2vl: > 16 listed: " > + "excess VLs will be dropped\n", prefix); > + > + free(str); > } > > -static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix) > +static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix, > + osm_qos_options_t *dflt) > { > - subn_verify_max_vls(&set->max_vls, prefix); > - subn_verify_high_limit(&set->high_limit, prefix); > - subn_verify_vlarb(set->vlarb_low, prefix, "low"); > - subn_verify_vlarb(set->vlarb_high, prefix, "high"); > - 
subn_verify_sl2vl(set->sl2vl, prefix); > + subn_verify_max_vls(&set->max_vls, prefix, dflt->max_vls); > + subn_verify_high_limit(&set->high_limit, prefix, dflt->high_limit); > + subn_verify_vlarb(&set->vlarb_low, prefix, "low", dflt->vlarb_low); > + subn_verify_vlarb(&set->vlarb_high, prefix, "high", dflt->vlarb_high); > + subn_verify_sl2vl(&set->sl2vl, prefix, dflt->sl2vl); > } > > static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > @@ -957,11 +985,24 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > } > > if (p_opts->qos) { > - subn_verify_qos_set(&p_opts->qos_options, "qos"); > - subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca"); > - subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0"); > - subn_verify_qos_set(&p_opts->qos_swe_options, "qos_swe"); > - subn_verify_qos_set(&p_opts->qos_rtr_options, "qos_rtr"); > + osm_qos_options_t dflt; > + > + /* the default options in qos_options must be correct. > + * every other one need not be, b/c those will default > + * back to whatever is in qos_options. 
> + */ > + > + subn_set_default_qos_options(&dflt); > + > + subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt); > + subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca", > + &p_opts->qos_options); > + subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0", > + &p_opts->qos_options); > + subn_verify_qos_set(&p_opts->qos_swe_options, "qos_swe", > + &p_opts->qos_options); > + subn_verify_qos_set(&p_opts->qos_rtr_options, "qos_rtr", > + &p_opts->qos_options); > } > > #ifdef ENABLE_OSM_PERF_MGR > @@ -1267,30 +1308,31 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) > return -1; > } > > + subn_init_qos_options(&p_subn->opt.qos_options); > + subn_init_qos_options(&p_subn->opt.qos_ca_options); > + subn_init_qos_options(&p_subn->opt.qos_sw0_options); > + subn_init_qos_options(&p_subn->opt.qos_swe_options); > + subn_init_qos_options(&p_subn->opt.qos_rtr_options); > + > while (fgets(line, 1023, opts_file) != NULL) { > /* get the first token */ > p_key = strtok_r(line, " \t\n", &p_last); > if (p_key) { > p_val = strtok_r(NULL, " \t\n", &p_last); > > - subn_parse_qos_options("qos", > - p_key, p_val, > + subn_parse_qos_options("qos", p_key, p_val, > &p_subn->opt.qos_options); > > - subn_parse_qos_options("qos_ca", > - p_key, p_val, > + subn_parse_qos_options("qos_ca", p_key, p_val, > &p_subn->opt.qos_ca_options); > > - subn_parse_qos_options("qos_sw0", > - p_key, p_val, > + subn_parse_qos_options("qos_sw0", p_key, p_val, > &p_subn->opt.qos_sw0_options); > > - subn_parse_qos_options("qos_swe", > - p_key, p_val, > + subn_parse_qos_options("qos_swe", p_key, p_val, > &p_subn->opt.qos_swe_options); > > - subn_parse_qos_options("qos_rtr", > - p_key, p_val, > + subn_parse_qos_options("qos_rtr", p_key, p_val, > &p_subn->opt.qos_rtr_options); > > } -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Thu Nov 13 09:47:23 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 13 
Nov 2008 09:47:23 -0800 Subject: [ofa-general] [ipoib][patch] handle pkey input to create_child and delete_child consistently Message-ID: <1226598443.7156.52.camel@cardanus.llnl.gov> I noticed that the pkey is handled differently between ipoib's create_child and delete_child functions. So a user can create a interface with a pkey, but not delete it with the same pkey. Sort of makes it confusing for the average person. # sys/class/net/ib0 > echo 0x6fff > create_child # /sys/class/net/ib0 > echo 0x6fff > delete_child -bash: echo: write error: No such file or directory # /sys/class/net/ib0 > echo 0xefff > delete_child # /sys/class/net/ib0 > The attached patch simply bitwise-ORs the full membership bit into the delete_child function for consistency. A check for a valid full- membership bit on the create_child function would be fine as well, but IMO this is the lesser confusing option (and is backwards compatible to any scripts people have already written). Al -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-handle-pkey-in-create_child-and-delete_child-consist.patch Type: text/x-patch Size: 922 bytes Desc: not available URL: From hal.rosenstock at gmail.com Thu Nov 13 09:56:46 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 13 Nov 2008 12:56:46 -0500 Subject: ***SPAM*** [ofa-general] [PATCH] opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block. In-Reply-To: <491C4CFA.8000006@gmail.com> References: <491C4CFA.8000006@gmail.com> Message-ID: Hi Eli, On Thu, Nov 13, 2008 at 10:51 AM, Eli Dorfman wrote: > wrong max mcast lid cause the sm to set invalid MFT block. > when mcmember tries to set mcast lid beyond mcast capability (e.g. 0xc400), > the sm accepts this and tries to set invalid block. Good find (and nice test case). 
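The off-by-one in the quoted report can be illustrated with a small sketch. It assumes the multicast LID range starts at 0xC000 and that the switch reports a capacity of 1024 entries, which is consistent with the 0xc400 example above; the function names are hypothetical and are not the OpenSM ones.

```c
#include <stdint.h>

/* First multicast LID in host order (assumed 0xC000, the start of the
 * InfiniBand multicast LID range). */
#define MCAST_START_HO 0xC000u

/* Pre-patch arithmetic: yields one past the last MLID the table holds. */
static inline uint16_t max_mlid_before(uint16_t capacity)
{
	return (uint16_t)(MCAST_START_HO + capacity);
}

/* Post-patch arithmetic: the last MLID the table can actually hold. */
static inline uint16_t max_mlid_after(uint16_t capacity)
{
	return (uint16_t)(MCAST_START_HO + capacity - 1);
}

/* Range check of a requested MLID against a given maximum. */
static inline int mlid_in_range(uint16_t mlid, uint16_t max_mlid)
{
	return mlid >= MCAST_START_HO && mlid <= max_mlid;
}
```

Under these assumptions the old maximum is 0xC400, so a request for 0xC400 passes the range check, while the corrected maximum of 0xC3FF rejects it.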
Do the switch SMA's reject those invalid sets ? I'm hoping that's the case. See below for minor question on the patch. > Signed-off-by: Eli Dorfman > > --- > opensm/opensm/osm_mcast_tbl.c | 6 +++--- > opensm/opensm/osm_sa_mcmember_record.c | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/opensm/opensm/osm_mcast_tbl.c b/opensm/opensm/osm_mcast_tbl.c > index 92fbb63..17fb69c 100644 > --- a/opensm/opensm/osm_mcast_tbl.c > +++ b/opensm/opensm/osm_mcast_tbl.c > @@ -81,7 +81,7 @@ osm_mcast_tbl_init(IN osm_mcast_tbl_t * const p_tbl, > IB_MCAST_BLOCK_SIZE) / > IB_MCAST_BLOCK_SIZE) - 1); > > - p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity); > + p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity - 1); > > /* > The number of bytes needed in the mask table is: > @@ -216,7 +216,7 @@ osm_mcast_tbl_set_block(IN osm_mcast_tbl_t * const p_tbl, > > mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); > > - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) > + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) > return (IB_INVALID_PARAMETER); > > for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) > @@ -274,7 +274,7 @@ osm_mcast_tbl_get_block(IN osm_mcast_tbl_t * const p_tbl, > > mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); > > - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) > + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) > return (IB_INVALID_PARAMETER); > > for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) > diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c > index 5dd286a..6007b06 100644 > --- a/opensm/opensm/osm_sa_mcmember_record.c > +++ b/opensm/opensm/osm_sa_mcmember_record.c > @@ -846,7 +846,7 @@ osm_mcmr_rcv_create_new_mgrp(IN osm_sa_t * sa, > mlid = __get_new_mlid(sa, mcm_rec.mlid); > if (mlid == 0) { > OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B19: " > - "__get_new_mlid failed\n"); > + "__get_new_mlid 
failed request mlid 0x%04x\n", mcm_rec.mlid); ^^^^^^^^^^^^^^^^ Should this be cl_ntoh16(mcm_rec.mlid) ? -- Hal > status = IB_SA_MAD_STATUS_NO_RESOURCES; > goto Exit; > } > -- > 1.5.5 > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Thu Nov 13 07:22:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 13 Nov 2008 07:22:44 -0800 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com> <490DD27C.4070109@pobox.com> <491C41F0.3080304@mellanox.co.il> Message-ID: > What is the status of this? > I know its in mlx_core but mainly needed for mlnx_en and has minimal > impact on the IB side > I think Roland is at new baby vacation so what is the resolution? This is 2.6.29 material, and I should be able to get to it in the next few weeks. - R. From kliteyn at dev.mellanox.co.il Thu Nov 13 14:22:35 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Fri, 14 Nov 2008 00:22:35 +0200 Subject: [ofa-general] [PATCH] opensm/osm_lid_mgr.c: ignore and overwrite guid2lid (windows) Message-ID: <491CA8AB.1010801@dev.mellanox.co.il> Hi Sasha, When Windows is crashing with BSOD, it might corrupt files that were previously opened for writing, even if the files are closed. As a result, we might see corrupted guid2lid file, and OpenSM will exit on such error. This patch makes SM ignore (and later overwrite) corrupted guid2lid files. The patch has already been accepted into ofw. I'm posting it to openib too, so that when some day WinSM will be synchronized with OpenSM, this fix won't be lost. 
Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_lid_mgr.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c index 0c536a8..c135d4a 100644 --- a/opensm/opensm/osm_lid_mgr.c +++ b/opensm/opensm/osm_lid_mgr.c @@ -261,6 +261,12 @@ osm_lid_mgr_init(IN osm_lid_mgr_t * const p_mgr, IN osm_sm_t *sm) /* we use the stored guid to lid table if not forced to reassign */ if (!p_mgr->p_subn->opt.reassign_lids) { if (osm_db_restore(p_mgr->p_g2l)) { +#ifndef __WIN__ + /* + * When Windows is BSODing, it might corrupt files that + * were previously opened for writing, even if the files + * are closed, so we might see corrupted guid2lid file. + */ if (p_mgr->p_subn->opt.exit_on_fatal) { osm_log(p_mgr->p_log, OSM_LOG_SYS, "FATAL: Error restoring Guid-to-Lid " @@ -268,6 +274,7 @@ osm_lid_mgr_init(IN osm_lid_mgr_t * const p_mgr, IN osm_sm_t *sm) status = IB_ERROR; goto Exit; } else +#endif OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 0317: Error restoring Guid-to-Lid " "persistent database\n"); -- 1.5.1.4 From vlad at lists.openfabrics.org Fri Nov 14 03:24:46 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 14 Nov 2008 03:24:46 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081114-0200 daily build status Message-ID: <20081114112447.24D79E60DCB@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on 
x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From michael.heinz at qlogic.com Fri Nov 14 08:27:33 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Fri, 14 Nov 2008 10:27:33 -0600 Subject: [ofa-general] ib_ucm does not start correctly on redhat 4 boxes. Message-ID: On my Suse machines, the ib_ucm module loads normally and creates its /dev/infiniband/ucm0 file correctly - but on the redhat boxes, the device file is never created, even though the module loads. Does anyone know of a fix? I manually created the file with mknod and that worked; so obviously the module loaded correctly, it's just the device file that's not getting initialized. 
--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From jsquyres at cisco.com Fri Nov 14 08:29:29 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 14 Nov 2008 11:29:29 -0500
Subject: [ofa-general] ib_ucm does not start correctly on redhat 4 boxes.
In-Reply-To:
References:
Message-ID: <897AF2B5-724C-46C8-AB5E-F8559D5B4162@cisco.com>

I filed a ticket about this long ago. Still hasn't been fixed:

https://bugs.openfabrics.org/show_bug.cgi?id=963

On Nov 14, 2008, at 11:27 AM, Mike Heinz wrote:

> On my Suse machines, the ib_ucm module loads normally and creates
> its /dev/infiniband/ucm0 file correctly - but on the redhat boxes,
> the device file is never created, even though the module loads.
>
> Does anyone know of a fix? I manually created the file with mknod
> and that worked; so obviously the module loaded correctly, it's just
> the device file that's not getting initialized.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania

--
Jeff Squyres
Cisco Systems

From michael.heinz at qlogic.com Fri Nov 14 08:35:57 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Fri, 14 Nov 2008 10:35:57 -0600
Subject: [ofa-general] ib_ucm does not start correctly on redhat 4 boxes.
In-Reply-To: <897AF2B5-724C-46C8-AB5E-F8559D5B4162@cisco.com>
References: <897AF2B5-724C-46C8-AB5E-F8559D5B4162@cisco.com>
Message-ID:

It's odd because a quick look at the code doesn't show anything tremendously weird. I wonder if it's a bug in RHEL....
--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Jeff Squyres [mailto:jsquyres at cisco.com]
Sent: Friday, November 14, 2008 11:29 AM
To: Mike Heinz
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] ib_ucm does not start correctly on redhat 4 boxes.

I filed a ticket about this long ago. Still hasn't been fixed:

https://bugs.openfabrics.org/show_bug.cgi?id=963

On Nov 14, 2008, at 11:27 AM, Mike Heinz wrote:

> On my Suse machines, the ib_ucm module loads normally and creates its
> /dev/infiniband/ucm0 file correctly - but on the redhat boxes, the
> device file is never created, even though the module loads.
>
> Does anyone know of a fix? I manually created the file with mknod and
> that worked; so obviously the module loaded correctly, it's just the
> device file that's not getting initialized.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania

--
Jeff Squyres
Cisco Systems

From tziporet at mellanox.co.il Fri Nov 14 09:08:24 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Fri, 14 Nov 2008 19:08:24 +0200
Subject: [ofa-general] OFED 1.4 bugs status and OFED meetings
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0FE73D@mtlexch01.mtl.com>

Hi,

This is the bugs status

Bug owners - please update bugs status (I think I saw some commits so maybe some of them are already fixed) and see if they are really critical for the release

1323  blo  stefan.roscher at de.ibm.com  IB/ehca: possibility of kernel panic under certain circu...
1242  cri  yannick.cote at qlogic.com    kernel panic while running mpi2007 against ofed1.4 -- ib_...
1289  maj  amirv at mellanox.co.il       Ib and ipoib doesnt respond while running multiple tests ...
1349  maj  amirv at mellanox.co.il       Kernel panic on sdp
1379  maj  vu at mellanox.com            Cannot unload ib_srpt module on SRP target
1377  maj  vu at mellanox.com            Deadlock occurred during HA test
1380  maj  vu at mellanox.com            Cannot unload ib_srpt module on SRP target
1279  min  amirv at mellanox.co.il       ltp_sdp connect "already connected successful" very slow
1331  min  amirv at mellanox.co.il       SDP connect to 0.0.0.0 doesn't work

I don't think we need a meeting on Monday (I personally will not be able to attend)

If we only have bugs in SDP and SRP we should go ahead and build RC5 on Monday

Reminder to all - please send release notes

Tziporet

From jsquyres at cisco.com Fri Nov 14 09:15:35 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 14 Nov 2008 12:15:35 -0500
Subject: [ofa-general] Re: [ewg] OFED 1.4 bugs status and OFED meetings
In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD0FE73D@mtlexch01.mtl.com>
References: <5D49E7A8952DC44FB38C38FA0D758EAD0FE73D@mtlexch01.mtl.com>
Message-ID: <3EFE4684-EAA2-4A49-B0F5-927962D52A12@cisco.com>

On Nov 14, 2008, at 12:08 PM, Tziporet Koren wrote:

> I don't think we need a meeting on Monday (I personally will not be
> able to attend)
>

Ok. Unless, I hear differently by COB today (US Eastern time), I'll cancel the phone bridge for Monday.

--
Jeff Squyres
Cisco Systems

From akepner at sgi.com Fri Nov 14 11:43:17 2008
From: akepner at sgi.com (akepner at sgi.com)
Date: Fri, 14 Nov 2008 11:43:17 -0800
Subject: [ofa-general] opensm: bad multicast forwarding table entries
In-Reply-To:
References: <20081112221846.GE25248@sgi.com>
Message-ID: <20081114194317.GM25248@sgi.com>

FWIW, I asked for the additional data that Hal requested. But this time there are no occurrences of "Disconnected switch|HCA" errors from 'ibdiagnet -r'.
The entire cluster was recently rebooted (probably the IB switches, too), opensm restarted, etc. So that seems to have cleared things up, at least for now. But this is something that we've seen on quite a few occasions, so we'll keep looking for it, and grab what debug info we can when it crops up again. -- Arthur From hal.rosenstock at gmail.com Fri Nov 14 13:35:15 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 14 Nov 2008 16:35:15 -0500 Subject: [ofa-general] Re: rate assignment for path queries In-Reply-To: <20081113131703.GV27271@sashak.voltaire.com> References: <20081113131703.GV27271@sashak.voltaire.com> Message-ID: On Thu, Nov 13, 2008 at 8:17 AM, Sasha Khapyorsky wrote: > Hi Or, > > On 09:20 Thu 13 Nov , Or Gerlitz wrote: >> >> If opensm doesn't have a match on any qos-assignment rule (eg when there's >> no qos-config file), when coming to serve sa path query, my understanding >> is that the "qos related fields" of the partition would be used. >> >> For example, I have set the following partition config file which assigns >> to the 0x8001 partition, and run without any qos file. >> >> Default=0x7fff,ipoib : ALL=full; >> RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; >> RED=0x8002, ipoib, sl=2, rate=3, defmember=full : ALL=full; >> >> When a path query is issued, Indeed sl=1 is returned but I see that a >> rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). > > For my best knowledge rate=2 in partition config file will be related to > corresponded IPoIB multicast group for this partition, and not to > PathRecord. There is a form of PR query that supports returning information on MGIDs when used as a DGID. > In PathRecord you get maximum available rate on the > requested path. Here you are talking about current OpenSM implementation. -- Hal >> Have I done anything wrong? is it a known issue? what does it means >> when the SM prints "min rate = 6" > > Here "min rate" means minimal common rate on the path. 
> > Sasha > >> >> Or. >> >> >> Nov 13 02:12:49 219374 [42803940] 0x08 -> PathRecord dump: >> service_id..............0x0000000000000000 >> dgid....................0xfe80000000000000 : 0x0002c90300026be7 >> sgid....................0xfe80000000000000 : 0x0002c90300026be3 >> dlid....................0x0 >> slid....................0x0 >> hop_flow_raw............0x0 >> tclass..................0x0 >> num_path_revers.........0x1 >> pkey....................0x8001 >> qos_class...............0x0 >> sl......................0x0 >> mtu.....................0x3 >> rate....................0x0 >> pkt_life................0x0 >> preference..............0x0 >> resv2...................0x0 >> resv3...................0x0 >> Nov 13 02:12:49 219386 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: [ >> Nov 13 02:12:49 219390 [42803940] 0x10 -> __osm_pr_rcv_check_mcast_dest: ] >> Nov 13 02:12:49 219394 [42803940] 0x08 -> osm_pr_rcv_process: Unicast destination requested >> Nov 13 02:12:49 219398 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: [ >> Nov 13 02:12:49 219403 [42803940] 0x10 -> __osm_pr_rcv_get_end_points: ] >> Nov 13 02:12:49 219407 [42803940] 0x10 -> __osm_pr_rcv_process_pair: [ >> Nov 13 02:12:49 219411 [42803940] 0x10 -> __osm_pr_rcv_get_port_pair_paths: [ >> Nov 13 02:12:49 219415 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src port 0x0002c90300026be3, Dst port 0x0002c90300026be7 >> Nov 13 02:12:49 219420 [42803940] 0x10 -> osm_port_share_pkey: [ >> Nov 13 02:12:49 219424 [42803940] 0x10 -> osm_port_share_pkey: ] >> Nov 13 02:12:49 219428 [42803940] 0x10 -> osm_port_share_pkey: [ >> Nov 13 02:12:49 219432 [42803940] 0x10 -> osm_port_share_pkey: ] >> Nov 13 02:12:49 219436 [42803940] 0x10 -> osm_port_share_pkey: [ >> Nov 13 02:12:49 219440 [42803940] 0x10 -> osm_port_share_pkey: ] >> Nov 13 02:12:49 219444 [42803940] 0x08 -> __osm_pr_rcv_get_port_pair_paths: Src LIDs [0x7-0x7], Dest LIDs [0x8-0x8] >> Nov 13 02:12:49 219449 [42803940] 0x10 -> 
__osm_pr_rcv_get_lid_pair_path: [ >> Nov 13 02:12:49 219453 [42803940] 0x08 -> __osm_pr_rcv_get_lid_pair_path: Src LID 0x7, Dest LID 0x8 >> Nov 13 02:12:49 219458 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ >> Nov 13 02:12:49 219464 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 >> Nov 13 02:12:49 219471 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 >> Nov 13 02:12:49 219476 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: ] >> Nov 13 02:12:49 219480 [42803940] 0x10 -> __osm_pr_rcv_get_path_parms: [ >> Nov 13 02:12:49 219484 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path min MTU = 4, min rate = 6 >> Nov 13 02:12:49 219489 [42803940] 0x08 -> __osm_pr_rcv_get_path_parms: Path params: mtu = 4, rate = 6, packet lifetime = 18, pkey = 0x8001, sl = 1 > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Fri Nov 14 13:39:45 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 14 Nov 2008 16:39:45 -0500 Subject: [ofa-general] Re: rate assignment for path queries In-Reply-To: <491C2EFE.4060900@voltaire.com> References: <20081113131703.GV27271@sashak.voltaire.com> <491C2EFE.4060900@voltaire.com> Message-ID: On Thu, Nov 13, 2008 at 8:43 AM, Or Gerlitz wrote: > Sasha Khapyorsky wrote: >>> >>> RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; >>> >>> When a path query is issued, Indeed sl=1 is returned but I see that a >>> rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). >> >> For my best knowledge rate=2 in partition config file will be related to >> corresponded IPoIB multicast group for this partition, and not to >> PathRecord. 
In PathRecord you get maximum available rate on the requested >> path. > > I understand your comment about the relation to multicast join and not path > queries. However, currently, where there's no rule in the qos-config file > (or no file) that matches the path query, the SM does provide the SL > assigned to the partition (specified in the query) through the pkey file but > it doesn't do so for the Rate. So you say that for QoS = > assignment one should use the qos-policy file, let it be. I think Sasha is not saying "should use qos-policy file". You're asking about the pre QoS annex Qos implementation in OpenSM and I think this could be viewed as an omission (bug/feature). I think it could easily be changed in SA PR/MPR support. -- Hal > Or. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From weiny2 at llnl.gov Fri Nov 14 14:28:48 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 14 Nov 2008 14:28:48 -0800 Subject: [ofa-general] Re: [PATCH V2] Add check for previous versions of plugins. In-Reply-To: <20081109174733.GA30265@sashak.voltaire.com> References: <20081104095812.2ff5920c.weiny2@llnl.gov> <20081109174733.GA30265@sashak.voltaire.com> Message-ID: <20081114142848.75c64c94.weiny2@llnl.gov> I believe this will work. I incorporated your patch but I made this explicit so it will hopefully be clear what is going on. Ira >From 061822466a157bb425600ee0b63cc80ff038d615 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Mon, 3 Nov 2008 15:50:15 -0800 Subject: [PATCH] Add check for previous versions of plugins. If old interface plugins are available to OpenSM they will cause a crash. Check for this old version and error out gracefully. 
Signed-off-by: Ira Weiny --- opensm/include/opensm/osm_event_plugin.h | 1 + opensm/opensm/osm_event_plugin.c | 11 +++++++++++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h index b2deeba..0922c65 100644 --- a/opensm/include/opensm/osm_event_plugin.h +++ b/opensm/include/opensm/osm_event_plugin.h @@ -148,6 +148,7 @@ typedef struct osm_epi_trap_event { * The version should be set to OSM_EVENT_PLUGIN_INTERFACE_VER */ #define OSM_EVENT_PLUGIN_IMPL_NAME "osm_event_plugin" +#define OSM_ORIG_EVENT_PLUGIN_INTERFACE_VER 1 #define OSM_EVENT_PLUGIN_INTERFACE_VER 2 typedef struct osm_event_plugin { const char *osm_version; diff --git a/opensm/opensm/osm_event_plugin.c b/opensm/opensm/osm_event_plugin.c index c6999f5..b0dc549 100644 --- a/opensm/opensm/osm_event_plugin.c +++ b/opensm/opensm/osm_event_plugin.c @@ -66,6 +66,7 @@ osm_epi_plugin_t *osm_epi_construct(osm_opensm_t *osm, char *plugin_name) { char lib_name[OSM_PATH_MAX]; + struct old_if { unsigned ver; } *old_impl; osm_epi_plugin_t *rc = NULL; if (!plugin_name || !*plugin_name) @@ -96,6 +97,16 @@ osm_epi_plugin_t *osm_epi_construct(osm_opensm_t *osm, char *plugin_name) goto Exit; } + /* check for old interface */ + old_impl = (struct old_if *) rc->impl; + if (old_impl->ver == OSM_ORIG_EVENT_PLUGIN_INTERFACE_VER) { + OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin: " + "\'%s\' contains a depricated interface version %d\n" + " Please recompile with the new interface.\n", + plugin_name, old_impl->ver); + goto Exit; + } + /* Check the version to make sure this module will work with us */ if (strcmp(rc->impl->osm_version, osm->osm_version)) { OSM_LOG(&osm->log, OSM_LOG_ERROR, "Error loading plugin" -- 1.5.4.5 From weiny2 at llnl.gov Fri Nov 14 14:54:06 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 14 Nov 2008 14:54:06 -0800 Subject: [ofa-general] [PATCH] Fix max parameter passed to umad_get_cas_names 
Message-ID: <20081114145406.57dff1a7.weiny2@llnl.gov> >From a9149f4e38081d206d0be0af2194f4e09f944f21 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Fri, 14 Nov 2008 11:36:01 -0800 Subject: [PATCH] Fix max parameter passed to umad_get_cas_names Signed-off-by: Ira Weiny --- infiniband-diags/src/ibstat.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/infiniband-diags/src/ibstat.c b/infiniband-diags/src/ibstat.c index 6be1302..e2775ca 100644 --- a/infiniband-diags/src/ibstat.c +++ b/infiniband-diags/src/ibstat.c @@ -65,6 +65,8 @@ static int debug; +#define MAX_DEVICES 20 + char *argv0 = "ibstat"; static char *node_type_str[] = { @@ -201,7 +203,7 @@ usage(void) int main(int argc, char *argv[]) { - char names[20][UMAD_CA_NAME_LEN]; + char names[MAX_DEVICES][UMAD_CA_NAME_LEN]; int dev_port = -1; int list_only = 0, short_format = 0, list_ports = 0; int n, i; @@ -254,7 +256,7 @@ main(int argc, char *argv[]) if (umad_init() < 0) IBPANIC("can't init UMAD library"); - if ((n = umad_get_cas_names((void *)names, UMAD_CA_NAME_LEN)) < 0) + if ((n = umad_get_cas_names((void *)names, MAX_DEVICES)) < 0) IBPANIC("can't list IB device names"); if (argc) { -- 1.5.4.5 From arlin.r.davis at intel.com Fri Nov 14 17:02:51 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 14 Nov 2008 17:02:51 -0800 Subject: [ofa-general] [PATCH] uDAPL release notes updated for OFED 1.4 Message-ID: <000a01c946bd$e295c840$4797070a@amr.corp.intel.com> uDAPL_release_notes.txt updated for OFED 1.4 Signed-off-by: Arlin Davis Tziporet, please pull into OFED 1.4. Thanks! 
diff --git a/uDAPL_release_notes.txt b/uDAPL_release_notes.txt index 23b3d8b..33bbf0e 100644 --- a/uDAPL_release_notes.txt +++ b/uDAPL_release_notes.txt @@ -1,15 +1,70 @@ Release Notes for - OFED 1.3.1 DAPL Release - June 2008 + OFED 1.4 DAPL Release + November 2008 - OFED 1.3.1 RELEASE NOTES + OFED 1.4 RELEASE NOTES This release of the DAPL reference implementation - is timed to coincide with OFED release 1.3.1 of the - Open Fabrics (www.openfabrics.org) software stack. + is timed to coincide with OFED release 1.3.1 of the + Open Fabrics (www.openfabrics.org) software stack. + + NEW SINCE OFED 1.3.1 + + OFED 1.4 includes new versions compat-dapl-1.2.12-1, dapl-2.0.15-1 + + Summary of changes since OFED 1.3.1 release: + + * New Features (scalability improvements - socket cm and UD support) + + 1. The new socket CM provider, introduced in 1.2.8 and 2.0.11 packages, + assumes homogeneous cluster and will setup the QP's based on local + HCA port attributes and exchanges QP information via socket's using + the hostname of each node. IPoIB and rdma_cm are NOT required for + this provider. QP attributes can be adjusted via the following + environment parameters: + + DAPL_ACK_TIMER (default=16 5 bits, 4.096us*2^ack_timer. 16 =268ms) + DAPL_ACK_RETRY (default=7 3 bits, 7 * 268ms = 1.8 seconds) + DAPL_RNR_TIMER (default=12 5 bits, 12 = 64ms, 28 = 163ms, 31 = 491ms) + DAPL_RNR_RETRY (default=7 3 bits, 7 = infinite) + DAPL_IB_MTU (default=1024, limited to active MTU max) + + The new socket cm entries in /etc/dat.conf provide a link to the actual + HCA device and port. Example v1 and v2 entries for a Mellanox connectx + device, port 1: + - OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" "" + - ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" "" + + 2. 
New v2 definitions for IB unreliable datagram extension + (only supported in v2 scm provider, libdaploscm.so.2) + - Extended EP dat_service_type, with DAT_IB_SERVICE_TYPE_UD + - Add IB extension call dat_ib_post_send_ud(). + - Add address handle definition for UD calls. + - Add IB event definitions to provide remote AH via connect + and connect requests + - See dtestx (-d) source for example usage model + + * Bug Fixes + + v1,v2 - allow override of /etc/dat.conf via syscondir option + v1,v2 - fix dapltest transaction test to avoid cleanup before rdma complete + v1 - add ipath, ehca socket cm provider entries for v1.2, sync with v2.0 + v1,v2 - iWarp, 1 iov on rdma_reads, reduce iov's in dtest, add dat.conf entry + v1,v2 - add $(DESTDIR) on install/uninstall hooks + v2 - add new options to dtestx for UD testing + v2 - IB UD fixes in common code/socket cm provider to allow multiple EP support + v1,v2 - iWarp, 1 iov on rdma_reads, reduce iov's in dtest, add dat.conf entry + v1,v2 - add $(DESTDIR) on install/uninstall hooks + v2 - add new options to dtestx for UD testing + v2 - IB UD fixes in common code/socket cm provider to allow multiple EP support + v2 - fix dtest and dtestx build warnings + v1,v2 - socket cm fixes, added DAPL_IB_MTU, + changed default QP timers, include NULL definition. + v1,v2 - Fix compiler warnings: dat, dapl, dtest, and dapltest + + NEW SINCE OFED 1.3 - NEW SINCE OFED 1.3 OFED 1.3.1 includes new versions of uDAPL v1 (1.2.7-1) and v2 (2.0.9-1) Summary of changes since OFED 1.3 release: @@ -23,7 +78,7 @@ v1,v2 - long delay during dat_ia_open when DNS not configured v1,v2 - use rdma_read_in/out from ep_attr per consumer instead of HCA max - NEW SINCE OFED 1.2 + NEW SINCE OFED 1.2 * New Features 1. 
Add v2.0 library support for new 2.0 API Specification @@ -62,10 +117,10 @@ - dtest: typo in memset - BUILD: v1 and v2 uDAPL source install/build instructions (redhat example): + BUILD: v1 and v2 uDAPL source install/build instructions (redhat example): - # cd to distribution SRPMS directory - cd /tmp/OFED-1.3/SRPMS + # cd to distribution SRPMS directory + cd /tmp/OFED-1.3/SRPMS rpm -i dapl-1.2*.rpm rpm -i dapl-2.0*.rpm cd /usr/src/redhat/SOURCES @@ -110,7 +165,7 @@ DAPL_DBG_TYPE_CNTR = 0x1000 - NEW SINCE Gamma 3.2 and OFED 1.1 + NEW SINCE Gamma 3.2 and OFED 1.1 * New Features From panda at cse.ohio-state.edu Fri Nov 14 19:56:36 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri, 14 Nov 2008 22:56:36 -0500 (EST) Subject: [ofa-general] Announcing the release of MVAPICH 1.1 Message-ID: The MVAPICH team is pleased to announce the availability of MVAPICH-1.1 with the following NEW features: - New Features for OpenFabrics Gen2-IB Interface - eXtended Reliable Connection (XRC) support - Lock-free design to provide support for asynchronous progress at both sender and receiver to overlap computation and communication - Optimized MPI_allgather collective - Efficient intra-node shared memory communication support for diskless clusters - Enhanced Totalview Support with the new mpirun_rsh framework - New OpenFabrics Gen2-Hybrid Interface - Replaces the Gen2-UD interface of MVAPICH 1.0 series - Targeted for large-scale IB clusters (multi-thousand cores) to provide highest performance and minimal memory usage - Support for UD, RC and XRC transports - Adaptive selection during run-time (based on application and systems characteristics) to switch between RC and UD (or between XRC and UD) transports - Delivers performance and scalability with near constant memory footprint for communication contexts - Zero-copy protocol with UD for large data transfer - Multiple buffer organizations with XRC support - Shared memory communication between cores within a node - 
Efficient intra-node shared memory communication support for diskless clusters - Multi-core optimized collectives (MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce) - Optimized MPI_Allgather collective - Enhanced Totalview Support with the new mpirun_rsh framework - New Features for MVAPICH-InfiniPath (QLogic) Interface - Enhanced Totalview Support with the new mpirun_rsh framework - New Features for Shared-Memory only Interface - Enhanced Totalview Support with the new mpirun_rsh framework More details on all features and supported platforms can be obtained by visiting the following URL: http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml MVAPICH 1.1 is being made available with OFED 1.4. It is also tested with OFED 1.3. It continues to deliver excellent performance. Sample performance numbers include: OpenFabrics/Gen2-IB on EM64T quad-core with PCIe2 and ConnectX-QDR: - 1.17 microsec one-way latency (4 bytes) - 2569 MB/sec unidirectional bandwidth - 5025 MB/sec bidirectional bandwidth OpenFabrics/Gen2-Hybrid on EM64T quad-core with PCIe2 and ConnectX-QDR: - 1.18 microsec one-way latency (4 bytes) - 2571 MB/sec unidirectional bandwidth - 5027 MB/sec bidirectional bandwidth OpenFabrics/Gen2-IB on Opteron quad-core with PCIe and ConnectX-DDR: - 1.62 microsec one-way latency (4 bytes) - 1628 MB/sec unidirectional bandwidth - 2889 MB/sec bidirectional bandwidth InfiniPath on EM64T quad-core with PCIe2 and QLogic-DDR: - 1.28 microsec one-way latency (4 bytes) - 1953 MB/sec unidirectional bandwidth Performance numbers for several other platforms, system configurations and operations can be viewed by visiting `Performance' section of the project's web page. For downloading MVAPICH 1.1 package and accessing the anonymous SVN, please visit the following URL: http://mvapich.cse.ohio-state.edu/ All feedbacks, including bug reports, hints for performance tuning, patches and enhancements are welcome. Please post it to the mvapich-discuss mailing list. 
Thanks, The MVAPICH Team From hal.rosenstock at gmail.com Sat Nov 15 02:34:38 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Sat, 15 Nov 2008 05:34:38 -0500 Subject: [ofa-general] Re: rate assignment for path queries In-Reply-To: References: <20081113131703.GV27271@sashak.voltaire.com> <491C2EFE.4060900@voltaire.com> Message-ID: On Fri, Nov 14, 2008 at 4:39 PM, Hal Rosenstock wrote: > On Thu, Nov 13, 2008 at 8:43 AM, Or Gerlitz wrote: >> Sasha Khapyorsky wrote: >>>> >>>> RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; >>>> >>>> When a path query is issued, Indeed sl=1 is returned but I see that a >>>> rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). >>> >>> For my best knowledge rate=2 in partition config file will be related to >>> corresponded IPoIB multicast group for this partition, and not to >>> PathRecord. In PathRecord you get maximum available rate on the requested >>> path. >> >> I understand your comment about the relation to multicast join and not path >> queries. However, currently, where there's no rule in the qos-config file >> (or no file) that matches the path query, the SM does provide the SL >> assigned to the partition (specified in the query) through the pkey file but >> it doesn't do so for the Rate. So you say that for QoS = >> assignment one should use the qos-policy file, let it be. > > I think Sasha is not saying "should use qos-policy file". You're > asking about the pre QoS annex Qos implementation in OpenSM and I > think this could be viewed as an omission (bug/feature). I think it > could easily be changed in SA PR/MPR support. It's the current semantics of rate just applying to the multicast group in the partition policy file as Sasha pointed out. Unicast traffic would be disadvantaged if using that rate. So if this were to be done, it would need another flag for these semantics there. Is this needed ? -- Hal > -- Hal > >> Or. 
>> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > From vlad at lists.openfabrics.org Sat Nov 15 03:17:00 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 15 Nov 2008 03:17:00 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081115-0200 daily build status Message-ID: <20081115111700.670B4E601B7@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 
Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From james_ at catbus.co.uk Sat Nov 15 02:36:35 2008 From: james_ at catbus.co.uk (James Beal) Date: Sat, 15 Nov 2008 10:36:35 +0000 Subject: [ofa-general] srp_daemon and partitions. Message-ID: <774A4005-446E-40D1-A70E-DBCBF12219F0@catbus.co.uk> We are currently investigating infiniband and we are so far very impressed with the ease of use of the OFED stack. However we seem to have run into an issue with the srp disc discovery. We wish to protect the storage from unwanted use. In a fibre channel san environment this would be done in two ways, firstly presentation ( configuring the controller as to which luns each WWN can access ) and secondly zoning which is configuring the switches that make the fabric as to which ports can communicate. If we can't do this it would restrict IB to a single use eg as a replacement for fibre switches. I can't see how to specify to either srp_daemon or ibsrpdm which pkey to use when discovering discs and a quick look at the source code doesn't inspire confidence as I can see pkey=ffff as a string in the code. I did try the following: One host with one adapter communicating with DDN controller, with no access control ( pkeys ) The correct lun information was discovered. 
root at isg-dev6:~# ibsrpdm -c
id_ext=50001ff3000501f0,ioc_guid=50001ff3000501f0,dgid=fe8000000000000050001ff4000501f0,pkey=ffff,service_id=f0010500f31f0050

Access control was reasserted, as can be seen from the lun no longer
being discovered.

root at isg-dev6:~# ibsrpdm -c

The device was created by "hand" with the pkey set to the correct value:

echo "id_ext=50001ff3000501f0,ioc_guid=50001ff3000501f0,dgid=fe8000000000000050001ff4000501f0,pkey=1001,service_id=f0010500f31f0050" > /sys/class/infiniband_srp/srp-mthca0-1/add_target

And the device can be seen.

multipath -ll
360001ff001f0dbac01000800000a6a6cdm-0 DDN ,S2A 9900
[size=5.2T][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:0:1 sdb 8:16 [active][ready]

So the issue appears to be that ibsrpdm/srp_daemon does not allow the
pkey to be set. The following kernel message suggests the same.

user_mad: process ibsrpdm did not enable P_Key index support.
user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI.

From ogerlitz at voltaire.com Sat Nov 15 22:17:42 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 16 Nov 2008 08:17:42 +0200
Subject: [ofa-general] Re: rate assignment for path queries
In-Reply-To:
References: <20081113131703.GV27271@sashak.voltaire.com> <491C2EFE.4060900@voltaire.com>
Message-ID: <491FBB06.3050704@voltaire.com>

Hal Rosenstock wrote:
> It's the current semantics of rate just applying to the multicast group
> in the partition policy file as Sasha pointed out. Unicast traffic would
> be disadvantaged if using that rate. So if this were to be done, it would
> need another flag for these semantics there.
> Is this needed ?
>

At this point in time, I don't see any need for a change here.

Or.
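For readers following the numeric rate values in this thread (rate=2 quoted as 2.5 Gb/s, rate=6 as 20 Gb/s), here is a minimal sketch of how such codes decode. The values follow the InfiniBand static-rate encoding as I understand it; the function name ib_rate_code_to_mbps is made up for illustration and is not part of OpenSM or libibverbs.

```c
#include <stdint.h>

/*
 * Sketch only: decode an InfiniBand static-rate code (the N in "rate=N")
 * into Mb/s. The function name is hypothetical, not an OpenSM API.
 * Returns 0 for codes that are not in the table.
 */
uint32_t ib_rate_code_to_mbps(uint8_t code)
{
    switch (code) {
    case 2:  return 2500;    /* 2.5 Gb/s */
    case 3:  return 10000;   /* 10 Gb/s  */
    case 4:  return 30000;   /* 30 Gb/s  */
    case 5:  return 5000;    /* 5 Gb/s   */
    case 6:  return 20000;   /* 20 Gb/s  */
    case 7:  return 40000;   /* 40 Gb/s  */
    case 8:  return 60000;   /* 60 Gb/s  */
    case 9:  return 80000;   /* 80 Gb/s  */
    case 10: return 120000;  /* 120 Gb/s */
    default: return 0;       /* unknown or reserved code */
    }
}
```

Note that the codes are not ordered by speed (5 Gb/s is code 5, which sorts after 10 Gb/s at code 3), so comparing two rates should be done on the decoded values rather than on the raw codes.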
From amirv at mellanox.co.il Sat Nov 15 23:58:48 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Sun, 16 Nov 2008 09:58:48 +0200 Subject: [ofa-general] Re: OFED 1.4 bugs status and OFED meetings In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD0FE73D@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD0FE73D@mtlexch01.mtl.com> Message-ID: <491FD2B8.4060301@mellanox.co.il> BUG1279 and BUG1331 are very minor bugs and won't be fixed for the release. Tziporet Koren wrote: > > Hi, > > This is the bugs status > > Bug owners - please update bugs status (I think I saw some commits so > maybe some of them are already fixed) and see if they are really > critical for the release > > 1323 blo stefan.roscher at de.ibm.com IB/ehca: > possibility of kernel panic under certain circu... > > 1242 cri yannick.cote at qlogic.com kernel panic while > running mpi2007 against ofed1.4 -- ib_... > > 1289 maj amirv at mellanox.co.il Ib and ipoib doesnt > respond while running multiple tests ... > > 1349 maj amirv at mellanox.co.il Kernel panic on sdp > > 1379 maj vu at mellanox.com Cannot unload ib_srpt > module on SRP target > > 1377 maj vu at mellanox.com Deadlock occurred > during HA test > > 1380 maj vu at mellanox.com Cannot unload ib_srpt > module on SRP target > > 1279 min amirv at mellanox.co.il ltp_sdp connect > "already connected successful" very slow > > 1331 min amirv at mellanox.co.il SDP connect to 0.0.0.0 > doesn't work > > I don't think we need a meeting on Monday (I personally will not be > able to attend) > > If we only have bugs in SDP and SRP we should go ahead and build RC5 > on Monday > > Reminder to all - please send release notes > > Tziporet > From vlad at dev.mellanox.co.il Sun Nov 16 02:01:20 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 16 Nov 2008 12:01:20 +0200 Subject: [ofa-general] [PATCH] uDAPL release notes updated for OFED 1.4 In-Reply-To: <000a01c946bd$e295c840$4797070a@amr.corp.intel.com> References: 
<000a01c946bd$e295c840$4797070a@amr.corp.intel.com> Message-ID: <491FEF70.4060601@dev.mellanox.co.il> Arlin Davis wrote: > uDAPL_release_notes.txt updated for OFED 1.4 > > Signed-off-by: Arlin Davis > > Tziporet, please pull into OFED 1.4. Thanks! > Applied, Regards, Vladimir From dorfman.eli at gmail.com Sun Nov 16 02:58:46 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 16 Nov 2008 12:58:46 +0200 Subject: ***SPAM*** [ofa-general] [PATCH] opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block. In-Reply-To: References: <491C4CFA.8000006@gmail.com> Message-ID: <491FFCE6.1070309@gmail.com> Hal Rosenstock wrote: > Hi Eli, > > On Thu, Nov 13, 2008 at 10:51 AM, Eli Dorfman wrote: >> wrong max mcast lid cause the sm to set invalid MFT block. >> when mcmember tries to set mcast lid beyond mcast capability (e.g. 0xc400), >> the sm accepts this and tries to set invalid block. > > Good find (and nice test case). > > Do the switch SMA's reject those invalid sets ? I'm hoping that's the case. yes it is rejected as invalid. > > See below for minor question on the patch. 
> >> Signed-off-by: Eli Dorfman >> >> --- >> opensm/opensm/osm_mcast_tbl.c | 6 +++--- >> opensm/opensm/osm_sa_mcmember_record.c | 2 +- >> 2 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/opensm/opensm/osm_mcast_tbl.c b/opensm/opensm/osm_mcast_tbl.c >> index 92fbb63..17fb69c 100644 >> --- a/opensm/opensm/osm_mcast_tbl.c >> +++ b/opensm/opensm/osm_mcast_tbl.c >> @@ -81,7 +81,7 @@ osm_mcast_tbl_init(IN osm_mcast_tbl_t * const p_tbl, >> IB_MCAST_BLOCK_SIZE) / >> IB_MCAST_BLOCK_SIZE) - 1); >> >> - p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity); >> + p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity - 1); >> >> /* >> The number of bytes needed in the mask table is: >> @@ -216,7 +216,7 @@ osm_mcast_tbl_set_block(IN osm_mcast_tbl_t * const p_tbl, >> >> mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); >> >> - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) >> + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) >> return (IB_INVALID_PARAMETER); >> >> for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) >> @@ -274,7 +274,7 @@ osm_mcast_tbl_get_block(IN osm_mcast_tbl_t * const p_tbl, >> >> mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); >> >> - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) >> + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) >> return (IB_INVALID_PARAMETER); >> >> for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) >> diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c >> index 5dd286a..6007b06 100644 >> --- a/opensm/opensm/osm_sa_mcmember_record.c >> +++ b/opensm/opensm/osm_sa_mcmember_record.c >> @@ -846,7 +846,7 @@ osm_mcmr_rcv_create_new_mgrp(IN osm_sa_t * sa, >> mlid = __get_new_mlid(sa, mcm_rec.mlid); >> if (mlid == 0) { >> OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B19: " >> - "__get_new_mlid failed\n"); >> + "__get_new_mlid failed request mlid 0x%04x\n", mcm_rec.mlid); > > ^^^^^^^^^^^^^^^^ > 
Should this be cl_ntoh16(mcm_rec.mlid) ? yes, i'll fix the patch. Thanks, Eli From dorfman.eli at gmail.com Sun Nov 16 03:06:17 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 16 Nov 2008 13:06:17 +0200 Subject: ***SPAM*** Re: [ofa-general] [PATCH v2] opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block. In-Reply-To: References: <491C4CFA.8000006@gmail.com> Message-ID: <491FFEA9.2090500@gmail.com> wrong max mcast lid cause the sm to set invalid MFT block. when mcmember tries to set mcast lid beyond mcast capability (e.g. 0xc400), the sm accepts this and tries to set invalid block. Signed-off-by: Eli Dorfman --- opensm/opensm/osm_mcast_tbl.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/opensm/opensm/osm_mcast_tbl.c b/opensm/opensm/osm_mcast_tbl.c index 92fbb63..17fb69c 100644 --- a/opensm/opensm/osm_mcast_tbl.c +++ b/opensm/opensm/osm_mcast_tbl.c @@ -81,7 +81,7 @@ osm_mcast_tbl_init(IN osm_mcast_tbl_t * const p_tbl, IB_MCAST_BLOCK_SIZE) / IB_MCAST_BLOCK_SIZE) - 1); - p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity); + p_tbl->max_mlid_ho = (uint16_t) (IB_LID_MCAST_START_HO + capacity - 1); /* The number of bytes needed in the mask table is: @@ -216,7 +216,7 @@ osm_mcast_tbl_set_block(IN osm_mcast_tbl_t * const p_tbl, mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) return (IB_INVALID_PARAMETER); for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) @@ -274,7 +274,7 @@ osm_mcast_tbl_get_block(IN osm_mcast_tbl_t * const p_tbl, mlid_start_ho = (uint16_t) (block_num * IB_MCAST_BLOCK_SIZE); - if (mlid_start_ho + IB_MCAST_BLOCK_SIZE > p_tbl->max_mlid_ho) + if (mlid_start_ho + IB_MCAST_BLOCK_SIZE - 1 > p_tbl->max_mlid_ho) return (IB_INVALID_PARAMETER); for (i = 0; i < IB_MCAST_BLOCK_SIZE; i++) -- 1.5.5 From dorfman.eli at gmail.com Sun Nov 16 03:08:04 2008 
From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 16 Nov 2008 13:08:04 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] opensm/osm_sa_mcmember_record.c print multicast lid in error message Message-ID: <491FFF14.6050006@gmail.com> print multicast lid in error message Signed-off-by: Eli Dorfman --- opensm/opensm/osm_sa_mcmember_record.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 5dd286a..4e77f06 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -846,7 +846,7 @@ osm_mcmr_rcv_create_new_mgrp(IN osm_sa_t * sa, mlid = __get_new_mlid(sa, mcm_rec.mlid); if (mlid == 0) { OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B19: " - "__get_new_mlid failed\n"); + "__get_new_mlid failed request mlid 0x%04x\n", cl_ntoh16(mcm_rec.mlid)); status = IB_SA_MAD_STATUS_NO_RESOURCES; goto Exit; } -- 1.5.5 From vlad at lists.openfabrics.org Sun Nov 16 03:19:53 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 16 Nov 2008 03:19:53 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081116-0200 daily build status Message-ID: <20081116111953.C7012E608DC@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with 
linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Sun Nov 16 04:16:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 14:16:25 +0200 Subject: [ofa-general] [PATCH v2] opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block. In-Reply-To: <491FFEA9.2090500@gmail.com> References: <491C4CFA.8000006@gmail.com> <491FFEA9.2090500@gmail.com> Message-ID: <20081116121625.GA12418@sashak.voltaire.com> On 13:06 Sun 16 Nov , Eli Dorfman wrote: > wrong max mcast lid cause the sm to set invalid MFT block. > when mcmember tries to set mcast lid beyond mcast capability (e.g. 0xc400), > the sm accepts this and tries to set invalid block. > > Signed-off-by: Eli Dorfman Applied. Thanks. 
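The off-by-one addressed by the osm_mcast_tbl.c patch can be illustrated with a small standalone sketch. This is not OpenSM code: the function names are hypothetical, the capacity of 1024 used in the note below is an assumption chosen to match the 0xc400 example from the patch description, and the block check works on absolute MLIDs, which is a simplification of my own.

```c
#include <stdint.h>

#define MCAST_LID_START  0xC000u  /* first multicast LID */
#define MCAST_BLOCK_SIZE 32u      /* MLIDs per forwarding-table block */

/* Highest valid MLID for a table holding `capacity` entries (capacity >= 1).
 * The "- 1" is the point of the fix: the range is inclusive at both ends. */
uint16_t max_mlid(uint16_t capacity)
{
    return (uint16_t)(MCAST_LID_START + capacity - 1u);
}

/* An MLID is usable only if it lies in [MCAST_LID_START, max_mlid]. */
int mlid_is_valid(uint16_t mlid, uint16_t capacity)
{
    return mlid >= MCAST_LID_START && mlid <= max_mlid(capacity);
}

/* A block is usable only if its last MLID is still within range. */
int block_is_valid(uint16_t block_num, uint16_t capacity)
{
    uint32_t last = MCAST_LID_START
                  + (uint32_t)block_num * MCAST_BLOCK_SIZE
                  + MCAST_BLOCK_SIZE - 1u;
    return last <= max_mlid(capacity);
}
```

With a capacity of 1024 the valid range is 0xC000 through 0xC3FF, so 0xC400 is rejected and block 31 is the last valid block. Without the "- 1" the maximum would be 0xC400, which is how an MLID one past the end could be accepted.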
Sasha From sashak at voltaire.com Sun Nov 16 04:17:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 14:17:01 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_sa_mcmember_record.c print multicast lid in error message In-Reply-To: <491FFF14.6050006@gmail.com> References: <491FFF14.6050006@gmail.com> Message-ID: <20081116121701.GB12418@sashak.voltaire.com> On 13:08 Sun 16 Nov , Eli Dorfman wrote: > print multicast lid in error message > > Signed-off-by: Eli Dorfman Applied. Thanks. Sasha From sashak at voltaire.com Sun Nov 16 04:19:37 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 14:19:37 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_lid_mgr.c: ignore and overwrite guid2lid (windows) In-Reply-To: <491CA8AB.1010801@dev.mellanox.co.il> References: <491CA8AB.1010801@dev.mellanox.co.il> Message-ID: <20081116121937.GC12418@sashak.voltaire.com> On 00:22 Fri 14 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > When Windows is crashing with BSOD, it might corrupt files that were > previously opened for writing, even if the files are closed. As a result, > we might see corrupted guid2lid file, and OpenSM will exit on such error. > This patch makes SM ignore (and later overwrite) corrupted guid2lid files. > > The patch has already been accepted into ofw. > > I'm posting it to openib too, so that when some day WinSM will be > synchronized with OpenSM, this fix won't be lost. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. 
Sasha From sashak at voltaire.com Sun Nov 16 04:24:48 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 14:24:48 +0200 Subject: [ofa-general] Re: [PATCH] Fix max parameter passed to umad_get_cas_names In-Reply-To: <20081114145406.57dff1a7.weiny2@llnl.gov> References: <20081114145406.57dff1a7.weiny2@llnl.gov> Message-ID: <20081116122448.GD12418@sashak.voltaire.com> On 14:54 Fri 14 Nov , Ira Weiny wrote: > From a9149f4e38081d206d0be0af2194f4e09f944f21 Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Fri, 14 Nov 2008 11:36:01 -0800 > Subject: [PATCH] Fix max parameter passed to umad_get_cas_names > > > Signed-off-by: Ira Weiny Applied. Thanks. Sasha From sashak at voltaire.com Sun Nov 16 04:37:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 14:37:00 +0200 Subject: [ofa-general] Re: [opensm patch][1/2] fix qos config parsing bugs In-Reply-To: <1226596802.7156.41.camel@cardanus.llnl.gov> References: <1225404078.1197.533.camel@cardanus.llnl.gov> <20081111191958.GA8894@sashak.voltaire.com> <1226447872.6239.2.camel@cardanus.llnl.gov> <20081113002403.GI27271@sashak.voltaire.com> <1226596802.7156.41.camel@cardanus.llnl.gov> Message-ID: <20081116123700.GF12418@sashak.voltaire.com> On 09:20 Thu 13 Nov , Al Chu wrote: > Hey Sasha, > > On Thu, 2008-11-13 at 02:24 +0200, Sasha Khapyorsky wrote: > > Hi Al, > > > > On 15:57 Tue 11 Nov , Al Chu wrote: > > > > > > Sorry, I may have not explained it well. Lets say I do this in the > > > config file. > > > > > > qos_vlarb_high FOOBAR > > > # qos_ca_vlarb_high BLAH > > > qos_swe_vlarb_high XYZZY > > > > > > I currently expect qos_ca_vlarb_high to use the value of FOOBAR because > > > I commented out the field. But it uses OSM_DEFAULT_QOS_HIGH_LIMIT > > > instead. The reason is because qos_build_config() checks for NULL to > > > use default vs. non-default values. > > > > > > p = opt->vlarb_high ? 
opt->vlarb_high : dflt->vlarb_high; > > > > > > Under the above situation where I've commented out veral fields, opt- > > > >vlarb_high is always non-NULL b/c it was set to > > > OSM_DEFAULT_QOS_HIGH_LIMIT. Thus OSM_DEFAULT_QOS_HIGH_LIMIT is used > > > instead of FOOBAR. > > > > > > > > 2) > > > > > > > > > > In qos_build_config() we load the high_limit like this: > > > > > > > > > > cfg->vl_high_limit = (uint8_t) opt->high_limit; > > > > > > > > > > So there is no way to tell the qos_ca, qos_swe, qos_rtr, etc. high_limit > > > > > options to "go back to" the default high_limit. It just assumes that > > > > > whatever is input (or was set by default) is what you should use. > > > > > > > > Right. What is a limitation here? That an user cannot set this to > > > > "no value"? But she/he can just skip it. > > > > > > Similar to the above issue, lets say I want to do: > > > > > > qos_high_limit 8 > > > # qos_ca_high_limit 15 > > > # qos_swe_high_limit 15 > > > > > > I want qos_ca_high_limit and qos_swe_high_limit to use whatever I set in > > > qos_high_limit. But the code doesn't allow for this. > > > > > > > > > > > > 3) > > > > > > > > > > Some fields like qos_vlarb_high are assumed to be correctly set and can > > > > > segfault opensm. > > > > > > > > qos_build_config() assumes that valid parameters are used. And we are > > > > using this this way (I hope :)) (finally it is not library API). > > > > > > I think the issue is the osm_subnet.c code did not properly check all > > > inputs, and subsequently some inputs used in qos_build_config() were > > > bad. I think > > > > > > qos_vlarb_high (null) > > > > > > was something I tried that opensm seg-faulted on. > > > > Ok. I see now. > > > > Probably it will be simpler just to generate a valid qos parameter sets > > right after parser (in verification time)? > > Ahh, I see what you did. It's much cleaner this way. > > > Like in your modified (and > > rebased against recent patches) patch below? 
>
> Patch looks good to me.

Applied. Thanks.

Sasha

From sashak at voltaire.com Sun Nov 16 04:41:15 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 16 Nov 2008 14:41:15 +0200
Subject: [ofa-general] Re: [PATCH V2] Add check for previous versions of plugins.
In-Reply-To: <20081114142848.75c64c94.weiny2@llnl.gov>
References: <20081104095812.2ff5920c.weiny2@llnl.gov> <20081109174733.GA30265@sashak.voltaire.com> <20081114142848.75c64c94.weiny2@llnl.gov>
Message-ID: <20081116124115.GG12418@sashak.voltaire.com>

On 14:28 Fri 14 Nov , Ira Weiny wrote:
> I believe this will work. I incorporated your patch but I made this explicit
> so it will hopefully be clear what is going on.
>
> Ira
>
>
> From 061822466a157bb425600ee0b63cc80ff038d615 Mon Sep 17 00:00:00 2001
> From: Ira Weiny
> Date: Mon, 3 Nov 2008 15:50:15 -0800
> Subject: [PATCH] Add check for previous versions of plugins.
>
> If old interface plugins are available to OpenSM they will cause a crash.
> Check for this old version and error out gracefully.
>
> Signed-off-by: Ira Weiny

Applied. Thanks.

Sasha

From sashak at voltaire.com Sun Nov 16 06:40:29 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 16 Nov 2008 16:40:29 +0200
Subject: [ofa-general] opensm: bad multicast forwarding table entries
In-Reply-To: <20081114194317.GM25248@sgi.com>
References: <20081112221846.GE25248@sgi.com> <20081114194317.GM25248@sgi.com>
Message-ID: <20081116144029.GD6183@sashak.voltaire.com>

On 11:43 Fri 14 Nov , akepner at sgi.com wrote:
>
> But this is something that we've seen on quite a few occasions,
> so we'll keep looking for it, and grab what debug info we can
> when it crops up again.

Thanks!
Sasha From sashak at voltaire.com Sun Nov 16 06:42:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 16 Nov 2008 16:42:24 +0200 Subject: [ofa-general] Re: rate assignment for path queries In-Reply-To: References: <20081113131703.GV27271@sashak.voltaire.com> Message-ID: <20081116144224.GE6183@sashak.voltaire.com> On 16:35 Fri 14 Nov , Hal Rosenstock wrote: > On Thu, Nov 13, 2008 at 8:17 AM, Sasha Khapyorsky wrote: > > Hi Or, > > > > On 09:20 Thu 13 Nov , Or Gerlitz wrote: > >> > >> If opensm doesn't have a match on any qos-assignment rule (eg when there's > >> no qos-config file), when coming to serve sa path query, my understanding > >> is that the "qos related fields" of the partition would be used. > >> > >> For example, I have set the following partition config file which assigns > >> to the 0x8001 partition, and run without any qos file. > >> > >> Default=0x7fff,ipoib : ALL=full; > >> RED=0x8001, ipoib, sl=1, rate=2, defmember=full : ALL=full; > >> RED=0x8002, ipoib, sl=2, rate=3, defmember=full : ALL=full; > >> > >> When a path query is issued, Indeed sl=1 is returned but I see that a > >> rate=6 (20Gbs) is returned where I configured rate=2 (2.5 Gbs). > > > > For my best knowledge rate=2 in partition config file will be related to > > corresponded IPoIB multicast group for this partition, and not to > > PathRecord. > > There is a form of PR query that supports returning information on > MGIDs when used as a DGID. > > > In PathRecord you get maximum available rate on the > > requested path. > > Here you are talking about current OpenSM implementation. Yes. 
Sasha

From constantine.gavrilov at gmail.com Sun Nov 16 07:34:32 2008
From: constantine.gavrilov at gmail.com (Constantine Gavrilov)
Date: Sun, 16 Nov 2008 17:34:32 +0200
Subject: [ofa-general] SDP Fixes
Message-ID: <49203D88.7020103@gmail.com>

While playing with SDP code in OFED 1.3.1 (latest stable), I have
encountered a number of bugs in the zero-copy send code:

* sdp_bz_setup() code does not handle the case of kernel data segment
  correctly (kernel sockets)
* sdp_bz_setup() does not pass ENOMEM, EFAULT or other errors to
  sendmsg(). In fact, a possible negative return from get_user_pages()
  is not handled.
* the deallocation of bz descriptor in sendmsg() is not handled
  properly -- it is allocated many times, but freed once.
* sdp_bzcopy_get() code does not raise reference count for all pages in
  the bz descriptor (only the "partial" pages will get the count
  raised). However, the send completion code will call put_page() on all
  entries, leading to a crash for page-aligned transfers.

Attached, please find a patch that solves these problems. With this
patch, I can use SDP and send page-aligned kernel buffers even for
zero-copy case.

Still, I do not see any performance benefit when using the zero-copy
method. I have tried various thresholds (32K, 64K, 128K), and zero-copy
was always slower. It seems that the penalty of memcpy() is negligible
compared to the penalty of reconfiguring the card to use different
addresses.

Also, looking at the sendmsg() code, I can say that allocation and
deallocation of bz descriptor for each iov element is not optimal.
Instead, an existing bz descriptor can be re-used if it fits.

--
----------------------------------------
Constantine Gavrilov
Kernel Developer
Platform Group
XIV, an IBM global brand
1 Azrieli Center, Tel-Aviv
Phone: +972-3-6074672
Fax: +972-3-6959749
----------------------------------------

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sdp_patch.diff.txt URL: From halr at obsidianresearch.com Sun Nov 16 10:37:28 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Sun, 16 Nov 2008 11:37:28 -0700 Subject: [ofa-general] [PATCH 1/2] libibumad: Add UMAD_MAX_DEVICES define Message-ID: <49206868.5040303@obsidianresearch.com> Sasha, Following Ira's ibstat patch... -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-umad-maxdevices1 URL: From halr at obsidianresearch.com Sun Nov 16 10:37:31 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Sun, 16 Nov 2008 11:37:31 -0700 Subject: [ofa-general] [PATCH 2/2] infiniband-diags/ibstat.c: Use UMAD_MAX_DEVICES define Message-ID: <4920686B.3010804@obsidianresearch.com> Sasha, Please see attached patch. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-ibstat-maxdevices1 URL: From halr at obsidianresearch.com Sun Nov 16 10:37:34 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Sun, 16 Nov 2008 11:37:34 -0700 Subject: [ofa-general] [PATCH][TRIVIAL] opensm/osm_trap_rcv.c: Fix typo Message-ID: <4920686E.9070209@obsidianresearch.com> Sasha, Please see attached patch. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-osmtrap1 URL: From john.donners at sara.nl Mon Nov 17 02:02:08 2008 From: john.donners at sara.nl (John Donners) Date: Mon, 17 Nov 2008 11:02:08 +0100 Subject: [ofa-general] [Fwd: EHCA_ERR:ehcau_modify_qp ibv_cmd_modify_qp() failed ret=22] Message-ID: <49214120.2090400@sara.nl> Dear all, I work for the support team at the SARA supercomputing center in Amsterdam. We are debugging an application that uses OpenIB directly. 
Soon after startup the application fails and the error message is: PID5c2b ehca0 EHCA_ERR:ehcau_modify_qp ibv_cmd_modify_qp() failed ret=22 qp=0x2aa3f880 qp_num=1c8f Last System Error Message from Task 0:: Invalid argument PID10e7 ehca0 EHCA_ERR:ehcau_modify_qp ibv_cmd_modify_qp() failed ret=22 qp=0x2aa3f070 qp_num=174c Last System Error Message from Task 32:: Invalid argument ERROR: 0031-250 task 35: Terminated ERROR: 0031-250 task 14: Terminated To be honest, I don't know the code and I haven't used ibverbs myself before either, but maybe you could shed some light on what this means and what we can do about it. We have a Power6 system with Infiniband running Suse Linux Enterprise Server 10. The system installation includes OFED-1.3 and libibverbs-1.1.1. With regards, John -- John Donners tel (31)20 5923055 SARA, Kruislaan 415 fax (31)20 6683167 1098 SJ Amsterdam john.donners at sara.nl The Netherlands From ogerlitz at voltaire.com Mon Nov 17 03:17:29 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 17 Nov 2008 13:17:29 +0200 Subject: [ofa-general] [PATCH] perftest: don't attach the sender QP In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EADF6284B@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EADF6284B@mtlexch01.mtl.com> Message-ID: <492152C9.8010509@voltaire.com> Oren Meron wrote: > What about the send_lat test? latency tests are typically based on RTT/2 measures which means that both sides do send and receive.... if this applies to the send_lat test, please don't apply the same patch over there. As for the bandwidth test, I think my patch has a defect: the client also does not join in the bidirectional test, where it needs to, sorry. Also, I don't think I have ever managed to see the server side statistics printed, this means that I could only see the sender bandwidth which is not necessarily the receiver bandwidth, the importance of seeing it BTW also applies to the unicast UD tests. 
One nice enhancement which you might want to look at would be to have some sort of MGID supplied from the command line and attach to this MGID instead of the current implementation. This would allow having more than one receiver. Or. From vlad at lists.openfabrics.org Mon Nov 17 03:36:17 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 17 Nov 2008 03:36:17 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081117-0200 daily build status Message-ID: <20081117113617.AAB33E608E5@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 
with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Mon Nov 17 04:56:28 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 17 Nov 2008 14:56:28 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa_mcmember_record.c: bad return state when leaving mcast Message-ID: <492169FC.7040609@dev.mellanox.co.il> Hi Sasha, Re-fixing our recent fix in handling multicast leave. When updating the state will cause port removal, port object will be freed, so bad things will happen if we try using it's state. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_sa_mcmember_record.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 4e77f06..99aee1b 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -1085,10 +1085,14 @@ __osm_mcmr_rcv_leave_mgrp(IN osm_sa_t * sa, goto Exit; } + /* store state - we'll need it if the port is removed */ + mcmember_rec.scope_state = p_mcm_port->scope_state; + /* remove port or update join state */ removed = osm_mgrp_remove_port(sa->p_subn, sa->p_log, p_mgrp, p_mcm_port, p_recvd_mcmember_rec->scope_state&0x0F); - mcmember_rec.scope_state = p_mcm_port->scope_state; + if (!removed) + mcmember_rec.scope_state = p_mcm_port->scope_state; CL_PLOCK_RELEASE(sa->p_lock); -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Nov 17 04:58:10 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 17 Nov 2008 14:58:10 +0200 
Subject: [ofa-general] [PATCH] opensm/osmtest: fixing some comments in mcast flow of osmtest Message-ID: <49216A62.5010300@dev.mellanox.co.il> Some cosmetics - fixing comments in multicast flow. Signed-off-by: Yevgeny Kliteynik --- opensm/osmtest/osmt_multicast.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/osmtest/osmt_multicast.c b/opensm/osmtest/osmt_multicast.c index 57a8772..165457c 100644 --- a/opensm/osmtest/osmt_multicast.c +++ b/opensm/osmtest/osmt_multicast.c @@ -2138,7 +2138,7 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) comp_mask = IB_MCR_COMPMASK_GID | IB_MCR_COMPMASK_PORT_GID | IB_MCR_COMPMASK_QKEY | IB_MCR_COMPMASK_PKEY | IB_MCR_COMPMASK_SL | IB_MCR_COMPMASK_FLOW | IB_MCR_COMPMASK_JOIN_STATE | IB_MCR_COMPMASK_TCLASS | /* all above are required */ IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; /* link-local scope, non member (so we should not be able to delete) */ - /* but the FullMember bit should be gone */ + /* but the NonMember bit should be gone */ mc_req_rec.scope_state = 0x22; status = osmt_send_mcast_request(p_osmt, 0, @@ -2155,7 +2155,7 @@ ib_api_status_t osmt_run_mcast_flow(IN osmtest_t * const p_osmt) OSM_LOG(&p_osmt->log, OSM_LOG_INFO, "Validating Join State removal of Non Member bit (o15.0.1.14)...\n"); - if (p_mc_res->scope_state != 0x25) { /* scope is MSB - now only the non member & send only member have left */ + if (p_mc_res->scope_state != 0x25) { /* scope is MSB - now only the full member & send only member have left */ OSM_LOG(&p_osmt->log, OSM_LOG_ERROR, "ERR 02CA: " "Validating JoinState update failed. 
Expected 0x25 got: 0x%02X\n", p_mc_res->scope_state); -- 1.5.1.4 From orenmeron at mellanox.co.il Mon Nov 17 02:30:14 2008 From: orenmeron at mellanox.co.il (Oren Meron) Date: Mon, 17 Nov 2008 12:30:14 +0200 Subject: [ofa-general] [PATCH] perftest: don't attach the sender QP In-Reply-To: <491A7BAC.5030708@voltaire.com> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EADF6284B@mtlexch01.mtl.com> Hi Or, Sorry for the late response. Applied and committed to OFED-1.4. What about the send_lat test ? Thanks. Oren Meron Performance -----Original Message----- From: Or Gerlitz [mailto:ogerlitz at voltaire.com] Sent: Wednesday, November 12, 2008 8:46 AM To: Oren Meron Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] [PATCH] perftest: don't attach the sender QP Or Gerlitz wrote: > don't attach the sender QP to the MGID > Oren, Did you had the chance to look into this patch? Or. > Signed-off-by: Or Gerlitz > > Index: perftest-1.2/send_bw.c > =================================================================== > --- perftest-1.2.orig/send_bw.c > +++ perftest-1.2/send_bw.c > @@ -421,7 +421,7 @@ static struct pingpong_context *pp_init_ > return NULL; > } > > - if ((user_parm->connection_type==UD) && (user_parm->use_mcg)) { > + if ((user_parm->connection_type==UD) && (user_parm->use_mcg) && > +!user_parm->servername) { > union ibv_gid gid; > uint8_t mcg_gid[16] = MCG_GID; > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Mon Nov 17 07:30:54 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 17 Nov 2008 10:30:54 -0500 Subject: [ofa-general] erroneous ibdiagnet subnet check warning Message-ID: Hi Oren, ibdiagnet (version 1.3.0rc14 source undefined) reports: -I--------------------------------------------------- -I- IPoIB 
Subnets Check -I--------------------------------------------------- -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00 -W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps but there are SDR links internal to the subnet so this warning is erroneous and should be fixed for this configuration. It's not just the member rates that need checking to determine this. I've filed bug 1394 for this issue. Also, ofed_info shows OFED-1.4-rc3 so is version 1.3.0rc14 on ibdiagnet correct ? Thanks for your attention to this. -- Hal From hal.rosenstock at gmail.com Mon Nov 17 07:59:13 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 17 Nov 2008 10:59:13 -0500 Subject: [ofa-general] OpenSM handling of defunct SMs Message-ID: Sasha, What I observe is that OpenSM 3.2.2 continues to poll/retry SMInfo for a now defunct SM which spams the OpenSM log. It looks like SMs are removed from the sm_guid_tbl only when the port is dropped/removed. Shouldn't it also be removed subsequent to a trap 144 which is indicating that the capability mask changed (and the new capability no longer include IsSM) ? I don't see this anywhere in the code. Am I missing something ? If so, should osm_port_info_rcv.c:__osm_pi_rcv_process_endport remove these so rather than: p_sm_tbl = &sm->p_subn->sm_guid_tbl; p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) /* clean it up */ p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { it should be something like: p_sm_tbl = &sm->p_subn->sm_guid_tbl; if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) /* clean it up */ p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; ... 
} else p_sm = (osm_remote_sm_t *) cl_qmap_remove(p_sm_tbl, port_guid); -- Hal From sashak at voltaire.com Mon Nov 17 20:21:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:21:24 +0200 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: Add UMAD_MAX_DEVICES define In-Reply-To: <49206868.5040303@obsidianresearch.com> References: <49206868.5040303@obsidianresearch.com> Message-ID: <20081118042124.GB10251@sashak.voltaire.com> On 11:37 Sun 16 Nov , Hal Rosenstock wrote: > Sasha, > > Following Ira's ibstat patch... > > -- Hal > libibumad: Add UMAD_MAX_DEVICES define > > Signed-off-by: Hal Rosenstock Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 17 20:21:41 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:21:41 +0200 Subject: [ofa-general] Re: [PATCH 2/2] infiniband-diags/ibstat.c: Use UMAD_MAX_DEVICES define In-Reply-To: <4920686B.3010804@obsidianresearch.com> References: <4920686B.3010804@obsidianresearch.com> Message-ID: <20081118042141.GC10251@sashak.voltaire.com> On 11:37 Sun 16 Nov , Hal Rosenstock wrote: > Sasha, > > Please see attached patch. > > -- Hal > infiniband-diags/ibstat.c: Use UMAD_MAX_DEVICES define > > Signed-off-by: Hal Rosenstock Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 17 20:22:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:22:08 +0200 Subject: [ofa-general] Re: [PATCH][TRIVIAL] opensm/osm_trap_rcv.c: Fix typo In-Reply-To: <4920686E.9070209@obsidianresearch.com> References: <4920686E.9070209@obsidianresearch.com> Message-ID: <20081118042208.GD10251@sashak.voltaire.com> On 11:37 Sun 16 Nov , Hal Rosenstock wrote: > Sasha, > > Please see attached patch. > > -- Hal > > opensm/osm_trap_rcv.c: Fix typo > > Signed-off-by: Hal Rosenstock Applied. Thanks. 
Sasha From sashak at voltaire.com Mon Nov 17 20:22:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:22:40 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_sa_mcmember_record.c: bad return state when leaving mcast In-Reply-To: <492169FC.7040609@dev.mellanox.co.il> References: <492169FC.7040609@dev.mellanox.co.il> Message-ID: <20081118042240.GE10251@sashak.voltaire.com> On 14:56 Mon 17 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Re-fixing our recent fix in handling multicast leave. > When updating the state will cause port removal, port > object will be freed, so bad things will happen if we > try using it's state. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 17 20:23:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:23:14 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osmtest: fixing some comments in mcast flow of osmtest In-Reply-To: <49216A62.5010300@dev.mellanox.co.il> References: <49216A62.5010300@dev.mellanox.co.il> Message-ID: <20081118042314.GF10251@sashak.voltaire.com> On 14:58 Mon 17 Nov , Yevgeny Kliteynik wrote: > Some cosmetics - fixing comments in multicast flow. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Sasha From sashak at voltaire.com Mon Nov 17 20:41:15 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 06:41:15 +0200 Subject: [ofa-general] [ANNOUNCE] management tarballs release Message-ID: <20081118044115.GG10251@sashak.voltaire.com> Hi, There is a new release of the management (OpenSM and infiniband diagnostics) tarballs available in: http://www.openfabrics.org/downloads/management/ md5sum: 89a49b57015524bc3f6ca8667b640b2d libibumad-1.2.3.tar.gz bf172da0e70dc4ce6cc625fde8707d00 libibmad-1.2.3.tar.gz 93e14f69ce5004bfdef1009f84a53eb7 opensm-3.2.4.tar.gz 32665d7fb2fe2bf734118b8530d4bbbb infiniband-diags-1.4.3.tar.gz All component versions are from recent master branch. 
Full change log is below. Sasha Al Chu (4): opensm: fix manpage typos fix documentation typos opensm: verify config inputs when config file is rescanned fix qos config parsing bugs Albert Chu (1): support dump_conf console command Doron Shoham (3): install QoS_management_in_OpenSM.txt change log_max_size to MB export osm_log_max in MB Eli Dorfman (3): opensm/osm_sa_path_record.c print port guids in error message opensm/osm_mcast_tbl.c wrong max mcast lid cause the sm to set invalid MFT block. opensm/osm_sa_mcmember_record.c print multicast lid in error message Hal Rosenstock (4): OpenSM/osm_subnet.c: Fix log_max_size conversion to MB libibumad: Add UMAD_MAX_DEVICES define infiniband-diags/ibstat.c: Use UMAD_MAX_DEVICES define opensm/osm_trap_rcv.c: Fix typo Ira Weiny (3): opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. Fix max parameter passed to umad_get_cas_names opensm: Add check for previous versions of plugins. Or Gerlitz (1): opensm: fix iser service-id used for SL assignment Sasha Khapyorsky (29): opensm/osm_ucast_lash: fix extra memory allocations opensm/osm_ucast_lash: simplify get_phys_connection() prototype opensm/scripts: unify scripts' config opens/osm_inform.c: cosmetic changes opensm/opens.spec: add -D option for logrotate file install command opensm: remove update_master_sm_base_lid field in PortInfo madw context libibmad/src/mad.c: indentation fix libibmad/dump: print more PortInfo:CapabilityMask bits opensm: support more PortInfo:CapabilityMask bits opensm: osm_send_trap144() function opensm: send trap144 to master SM when priority is raised opensm: notify master SM with trap 144 opensm: hide function name with OSM_LOG_MSG_BOX() macro opensm: rename sm signal opensm: sweep on SIGCONT opensm/include/opensm/osm_switch.h: minor simplifications opensm/osm_switch.c: minor: shorter flow opensm/osm_ucast_cache.[ch]: indentation fixes make.dist: don't use ${date}git suffix for release opensm/osm_ucase_cache: simplify 
cached links allocation code opensm/osm_subnet.c: consolidate logging code opensm/osm_subnet.c: use strdup() function opensm/osm_subnet.c: consolidate qos parameters verification code opensm/osm_subnet.c: move osm_subn_rescan_conf_files() function opensm/osm_sa_mcmember_record: return a real port JoinState on update opensm/osm_sa_mcmember_record: simplify query code infiniband-diags/ibstat.c: remove casting opensm/osm_trap_rcv.c: kill some empty lines management: update versions Tim Meier (1): opensm: osm_opensm.c added a method to remove plugins Yevgeny Kliteynik (16): opensm/scripts/opensm.conf: remove obsolete config file opensm/opensm/Makefile.am: allow 'make dist' from non-source directory opensm: replace switch's fwd_tbl with simple LFT opensm: replace switch's fwd_tbl with simple LFT - remove obsolete files opensm/osm_ucast_ftree.c: some simplification in LFT handling opensm: free lft_buf if it matches switch's lft opensm/osm_ucast_cache: fixing coredump opensm/osm_sa.c: adding missing include opensm/osm_pkey.c: cosmetics in some log message opensm/ib_types.h: rename IB_MC_REC_STATE_SEND_ONLY_MEMBER opensm/osm_multicast.c: bug with joining/leaving mcast group opensm/Makefile.am: install QoS_management_in_OpenSM.txt osmtest/osmt_multicast.c: some refinements to the multicast flow opensm/osm_lid_mgr.c: ignore and overwrite guid2lid (windows) opensm/osm_sa_mcmember_record.c: bad return state when leaving mcast opensm/osmtest: fixing some comments in mcast flow of osmtest From sashak at voltaire.com Mon Nov 17 22:04:38 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 08:04:38 +0200 Subject: [ofa-general] Re: OpenSM handling of defunct SMs In-Reply-To: References: Message-ID: <20081118060438.GJ10251@sashak.voltaire.com> Hi Hal, On 10:59 Mon 17 Nov , Hal Rosenstock wrote: > > What I observe is that OpenSM 3.2.2 continues to poll/retry SMInfo for > a now defunct SM which spams the OpenSM log. 
> > It looks like SMs are removed from the sm_guid_tbl only when the port > is dropped/removed. Shouldn't it also be removed subsequent to a trap > 144 which is indicating that the capability mask changed (and the new > capability no longer include IsSM) ? I don't see this anywhere in the > code. Am I missing something ? It looks like a bug. > > If so, should osm_port_info_rcv.c:__osm_pi_rcv_process_endport remove > these so rather than: > > p_sm_tbl = &sm->p_subn->sm_guid_tbl; > p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > /* clean it up */ > p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > > if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { > > it should be something like: > p_sm_tbl = &sm->p_subn->sm_guid_tbl; > if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { > p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > /* clean it up */ > p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > ... > } else > p_sm = (osm_remote_sm_t *) > cl_qmap_remove(p_sm_tbl, port_guid); Yes, I guess it should be something like this. Would you care about the patch? Sasha From james_ at catbus.co.uk Tue Nov 18 02:04:02 2008 From: james_ at catbus.co.uk (James Beal) Date: Tue, 18 Nov 2008 10:04:02 +0000 Subject: [ofa-general] srp_daemon and partitions. In-Reply-To: <774A4005-446E-40D1-A70E-DBCBF12219F0@catbus.co.uk> References: <774A4005-446E-40D1-A70E-DBCBF12219F0@catbus.co.uk> Message-ID: <0E7ABECE-3A66-45B6-8C14-02AAC9FBC16F@catbus.co.uk> If this is not the correct list for questions of this nature, would someone be so kind as to tell me where people would be interested in such a question ? On 15 Nov 2008, at 10:36, James Beal wrote: > > We are currently investigating infiniband and we are so far very > impressed with the ease of use of the OFED stack. However we seem to > have run into an issue with the srp disc discovery. 
> > We wish to protect the storage from unwanted use. In a fibre channel > san environment this would be done in two ways, firstly presentation > ( configuring the controller as to which luns each WWN can access ) > and secondly zoning which is configuring the switches that make the > fabric as to which ports can communicate. If we can't do this it > would restrict IB to a single use eg as a replacement for fibre > switches. > > I can't see how to specify to either srp_daemon or ibsrpdm which > pkey to use when discovering discs and a quick look at the source > code doesn't inspire confidence as I can see pkey=ffff as a string > in the code. > > I did try the following: > > One host with one adapter communicating with DDN controller, with > no access control ( pkeys ) > > The correct lun information was discovered. > > root at isg-dev6:~# ibsrpdm -c > id_ext=50001ff3000501f0,ioc_guid=50001ff3000501f0,dgid=fe8000000000000050001ff4000501f0,pkey=ffff,service_id=f0010500f31f0050 > > > Access control was reasserted, and can be seen as the lun can no > longer be discovered. > > root at isg-dev6:~# ibsrpdm -c > > The device was created by "hand" with the pkey set to the correct > value > > echo "id_ext=50001ff3000501f0,ioc_guid=50001ff3000501f0,dgid=fe8000000000000050001ff4000501f0,pkey=1001,service_id=f0010500f31f0050" > /sys/class/infiniband_srp/srp-mthca0-1/add_target > > And the device can be seen. > > multipath -ll > 360001ff001f0dbac01000800000a6a6cdm-0 DDN ,S2A 9900 > [size=5.2T][features=0][hwhandler=0] > \_ round-robin 0 [prio=1][enabled] > \_ 5:0:0:1 sdb 8:16 [active][ready] > > > So the issue appears to be with ibsrpdm/srp_daemon not allowing the > pkey to be set > > The following message suggests the same. > > user_mad: process ibsrpdm did not enable P_Key index support. > user_mad: Documentation/infiniband/user_mad.txt has info on the new > ABI. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Tue Nov 18 02:43:27 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 12:43:27 +0200 Subject: [ofa-general] [PATCH] opensm/osm_subnet: don't reassign zeroed config params Message-ID: <20081118104327.GN10251@sashak.voltaire.com> If string config parameter is NULL and input is null_str don't reassign it again (and don't print useless "Loading Option" message). Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_subnet.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index dc35a04..d787fe8 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -623,7 +623,8 @@ opts_unpack_charp(IN char *p_req_key, IN char *p_key, IN char *p_val_str, IN char **p_val) { if (!strcmp(p_req_key, p_key) && p_val_str) { - if ((*p_val == NULL) || strcmp(p_val_str, *p_val)) { + const char *current_str = *p_val ? 
*p_val : null_str ; + if (strcmp(p_val_str, current_str)) { log_config_value(p_key, "%s", p_val_str); /* special case the "(null)" string */ if (strcmp(null_str, p_val_str) == 0) { -- 1.6.0.3.517.g759a From vlad at lists.openfabrics.org Tue Nov 18 03:21:20 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 18 Nov 2008 03:21:20 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081118-0200 daily build status Message-ID: <20081118112120.71977E60C8D@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with 
linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From monis at Voltaire.COM Tue Nov 18 03:34:45 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Tue, 18 Nov 2008 13:34:45 +0200 Subject: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides In-Reply-To: <490B448C.5080306@Voltaire.COM> References: <490B448C.5080306@Voltaire.COM> Message-ID: <4922A855.2010109@Voltaire.COM> The patch assumes that the path query succeeds and therefore copies the HA from the kernel neighbor structure to ipoib_neigh after path query is sent. If path query fails (e.g. request timeout) the next one won't be triggered by finding that HA was updated in ipoib_start_xmit(). This leads to a longer time that the destination node remains inaccessible. The patch below is identical to Yossi's patch but without the copy of HA in neigh_add_path. 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index fddded7..ec433bf 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -709,26 +709,26 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) neigh = *to_ipoib_neigh(skb->dst->neighbour); - if (neigh->ah) - if (unlikely((memcmp(&neigh->dgid.raw, - skb->dst->neighbour->ha + 4, - sizeof(union ib_gid))) || - (neigh->dev != dev))) { - spin_lock_irqsave(&priv->lock, flags); - /* - * It's safe to call ipoib_put_ah() inside - * priv->lock here, because we know that - * path->ah will always hold one more reference, - * so ipoib_put_ah() will never do more than - * decrement the ref count. - */ + if (unlikely((memcmp(&neigh->dgid.raw, + skb->dst->neighbour->ha + 4, + sizeof(union ib_gid))) || + (neigh->dev != dev))) { + spin_lock_irqsave(&priv->lock, flags); + /* + * It's safe to call ipoib_put_ah() inside + * priv->lock here, because we know that + * path->ah will always hold one more reference, + * so ipoib_put_ah() will never do more than + * decrement the ref count. 
+ */ + if (neigh->ah) ipoib_put_ah(neigh->ah); - list_del(&neigh->list); - ipoib_neigh_free(dev, neigh); - spin_unlock_irqrestore(&priv->lock, flags); - ipoib_path_lookup(skb, dev); - return NETDEV_TX_OK; - } + list_del(&neigh->list); + ipoib_neigh_free(dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); + ipoib_path_lookup(skb, dev); + return NETDEV_TX_OK; + } if (ipoib_cm_get(neigh)) { if (ipoib_cm_up(neigh)) { From vlad at dev.mellanox.co.il Tue Nov 18 03:38:05 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 18 Nov 2008 13:38:05 +0200 Subject: [ofa-general] OFED-1.4-rc5 is available Message-ID: <4922A91D.8060107@dev.mellanox.co.il> Hi, OFED-1.4-rc5 release is available on http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc5.tgz To get BUILD_ID run ofed_info Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4 Vladimir & Tziporet ======================================================================== Release information: ------------------------------ Linux Operating Systems: - RedHat EL4 up4: 2.6.9-42.ELsmp * - RedHat EL4 up5: 2.6.9-55.ELsmp - RedHat EL4 up6: 2.6.9-67.ELsmp - RedHat EL4 up7: 2.6.9-78.ELsmp - RedHat EL5: 2.6.18-8.el5 - RedHat EL5 up1: 2.6.18-53.el5 - RedHat EL5 up2: 2.6.18-92.el5 - CentOS 5.2: 2.6.18-92.el5 - Fedora C9: 2.6.25-14.fc9 * - SLES10: 2.6.16.21-0.8-smp - SLES10 SP1: 2.6.16.46-0.12-smp - SLES10 SP1 up1: 2.6.16.53-0.16-smp - SLES10 SP2: 2.6.16.60-0.21-smp - OpenSuSE 10.3: 2.6.22.5-31 * - kernel.org: 2.6.26 and 2.6.27 * Minimal QA for these versions Systems: * x86_64 * x86 * ia64 * ppc64 Main Changes from OFED-1.4-rc4 ============================== - Updated MPI packages: mvapich-1.1.0-3141, mvapich2-1.2p1-1 - Updated bonding package: ib-bonding-0.9.0-34 - Updated qperf: qperf-0.4.2-1 - 8 bugs fixed (see attached for details) - Attached kernel git tree changes: Tasks that should be completed for the release: ================================ 1. High priority bug fixes 2. 
Documentation update -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_kernel-1.4-rc4_rc5.log Type: text/x-log Size: 13169 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed-1.4-rc5-fixed-bugs.csv Type: text/csv Size: 1137 bytes Desc: not available URL: From sashak at voltaire.com Tue Nov 18 04:30:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 14:30:00 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. In-Reply-To: <20081112185457.GD27271@sashak.voltaire.com> References: <20081104095744.35893d4a.weiny2@llnl.gov> <20081110201333.GM313@sashak.voltaire.com> <20081110131140.52561f42.weiny2@llnl.gov> <20081112185457.GD27271@sashak.voltaire.com> Message-ID: <20081118123000.GO10251@sashak.voltaire.com> Hi, On 20:54 Wed 12 Nov , Sasha Khapyorsky wrote: > > > > I was wondering if it would return invalid ports ever. It would be easy for it > > to return only valid ports but perhaps that should be another function to > > preserve functionality? Looked at this. Another problematic place where this function is used is osm_sa_link_record.c - there, when the "any" port becomes invalid (which is a possible case), it starts an endless recursion :(. So we will need to fix the function behavior. One option is to scan all ports and to return a valid one. Another solution would be to update the NodeInfo stored locally in OpenSM on each receive (something like below). Then osm_node_get_any_physp_ptr() will return the port through which this node was last accessed. In this way it could also catch potential OtherLocalSetting changes (in NodeInfo, such as SystemImageGUID, etc.). Can anybody see any downsides to such an approach? 
Sasha diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c index 20b16d1..7d41cab 100644 --- a/opensm/opensm/osm_node_info_rcv.c +++ b/opensm/opensm/osm_node_info_rcv.c @@ -785,6 +785,8 @@ __osm_ni_rcv_process_existing(IN osm_sm_t * sm, break; } + p_node->node_info = *p_ni; + __osm_ni_rcv_set_links(sm, p_node, port_num, p_ni_context); OSM_LOG_EXIT(sm->p_log); From sashak at voltaire.com Tue Nov 18 04:51:30 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 14:51:30 +0200 Subject: [ofa-general] [PATCH] opensm/osm_trap_rcv.c: separate port disabling code Message-ID: <20081118125130.GR10251@sashak.voltaire.com> Separate port disabling code (activated with "babbling_port_policy") into disable_port() function. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_trap_rcv.c | 108 ++++++++++++++++-------------------------- 1 files changed, 41 insertions(+), 67 deletions(-) diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c index 3b05775..5de283b 100644 --- a/opensm/opensm/osm_trap_rcv.c +++ b/opensm/opensm/osm_trap_rcv.c @@ -232,6 +232,44 @@ static int __print_num_received(IN uint32_t num_received) return 0; } +static int disable_port(osm_sm_t *sm, osm_physp_t *p) +{ + uint8_t payload[IB_SMP_DATA_SIZE]; + osm_madw_context_t context; + ib_port_info_t *pi = (ib_port_info_t *)payload; + int ret; + + /* If trap 131, might want to disable peer port if available */ + /* but peer port has been observed not to respond to SM requests */ + + OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3810: " + "Disabling physical port 0x%016" PRIx64 " num:%u\n", + cl_ntoh64(osm_physp_get_port_guid(p)), p->port_num); + + memcpy(payload, &p->port_info, sizeof(ib_port_info_t)); + + /* Set port to disabled/down */ + ib_port_info_set_port_state(pi, IB_LINK_DOWN); + ib_port_info_set_port_phys_state(IB_PORT_PHYS_STATE_DISABLED, pi); + + /* Issue set of PortInfo */ + context.pi_context.node_guid = osm_node_get_node_guid(p->p_node);
+ context.pi_context.port_guid = osm_physp_get_port_guid(p); + context.pi_context.set_method = TRUE; + context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; + + ret = osm_req_set(sm, osm_physp_get_dr_path_ptr(p), + payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO, + cl_hton32(osm_physp_get_port_num(p)), + CL_DISP_MSGID_NONE, &context); + if (ret) + OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3811: " + "Request to set PortInfo failed\n"); + + return ret; +} + /********************************************************************** **********************************************************************/ static void @@ -454,73 +492,9 @@ __osm_trap_rcv_process_request(IN osm_sm_t * sm, Threshold for disabling a "babbling" port is exceeded */ if (sm->p_subn->opt. babbling_port_policy - && num_received >= 250) { - uint8_t - payload[IB_SMP_DATA_SIZE]; - ib_port_info_t *p_pi = - (ib_port_info_t *) payload; - const ib_port_info_t *p_old_pi; - osm_madw_context_t context; - - /* If trap 131, might want to disable peer port if available */ - /* but peer port has been observed not to respond to SM requests */ - - OSM_LOG(sm->p_log, OSM_LOG_ERROR, - "ERR 3810: " - "Disabling physical port lid:%u num:%u\n", - cl_ntoh16(p_ntci-> - data_details. - ntc_129_131. - lid), - p_ntci->data_details. - ntc_129_131.port_num); - - p_old_pi = &p_physp->port_info; - memcpy(payload, p_old_pi, - sizeof(ib_port_info_t)); - - /* Set port to disabled/down */ - ib_port_info_set_port_state - (p_pi, IB_LINK_DOWN); - ib_port_info_set_port_phys_state - (IB_PORT_PHYS_STATE_DISABLED, - p_pi); - - /* Issue set of PortInfo */ - context.pi_context.node_guid = - osm_node_get_node_guid - (osm_physp_get_node_ptr - (p_physp)); - context.pi_context.port_guid = - osm_physp_get_port_guid - (p_physp); - context.pi_context.set_method = - TRUE; - context.pi_context.light_sweep = - FALSE; - context.pi_context. 
- active_transition = FALSE; - - status = - osm_req_set(sm, - osm_physp_get_dr_path_ptr - (p_physp), - payload, - sizeof(payload), - IB_MAD_ATTR_PORT_INFO, - cl_hton32 - (osm_physp_get_port_num - (p_physp)), - CL_DISP_MSGID_NONE, - &context); - - if (status == IB_SUCCESS) - goto Exit; - - OSM_LOG(sm->p_log, - OSM_LOG_ERROR, "ERR 3811: " - "Request to set PortInfo failed\n"); - } + && num_received >= 250 + && disable_port(sm, p_physp) == 0) + goto Exit; OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "Marking unhealthy physical port by lid:%u num:%u\n", -- 1.6.0.3.517.g759a From sashak at voltaire.com Tue Nov 18 04:53:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 14:53:25 +0200 Subject: [ofa-general] [PATCH] opensm: disable switch ports only In-Reply-To: <20081118125130.GR10251@sashak.voltaire.com> References: <20081118125130.GR10251@sashak.voltaire.com> Message-ID: <20081118125325.GS10251@sashak.voltaire.com> When "babbling port" policy is on disable switch ports even when trap source is endport. This will allow to handle disable ports remotely (with ibportstate, etc.). 
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_trap_rcv.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c index 5de283b..07c5183 100644 --- a/opensm/opensm/osm_trap_rcv.c +++ b/opensm/opensm/osm_trap_rcv.c @@ -239,6 +239,10 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) ib_port_info_t *pi = (ib_port_info_t *)payload; int ret; + /* in case of endport - disable switch's peer port */ + if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) + p = p->p_remote_physp; + /* If trap 131, might want to disable peer port if available */ /* but peer port has been observed not to respond to SM requests */ -- 1.6.0.3.517.g759a From hal.rosenstock at gmail.com Tue Nov 18 05:04:14 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 18 Nov 2008 08:04:14 -0500 Subject: [ofa-general] Re: [PATCH] opensm: disable switch ports only In-Reply-To: <20081118125325.GS10251@sashak.voltaire.com> References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> Message-ID: On Tue, Nov 18, 2008 at 7:53 AM, Sasha Khapyorsky wrote: > > When "babbling port" policy is on disable switch ports even when trap > source is endport. So does disables the peer switch port to an endport which is babbling ? That could be made clearer in the description. What happens if the end port is switch port 0 ? > This will allow to handle disable ports remotely > (with ibportstate, etc.). I'm not following what you mean by the ibportstate comment here. What port can ibportstate now disable differently from before ? 
-- Hal > Signed-off-by: Sasha Khapyorsky > --- > opensm/opensm/osm_trap_rcv.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c > index 5de283b..07c5183 100644 > --- a/opensm/opensm/osm_trap_rcv.c > +++ b/opensm/opensm/osm_trap_rcv.c > @@ -239,6 +239,10 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) > ib_port_info_t *pi = (ib_port_info_t *)payload; > int ret; > > + /* in case of endport - disable switch's peer port */ > + if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) > + p = p->p_remote_physp; > + > /* If trap 131, might want to disable peer port if available */ > /* but peer port has been observed not to respond to SM requests */ > > -- > 1.6.0.3.517.g759a > > From halr at obsidianresearch.com Tue Nov 18 05:05:27 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Tue, 18 Nov 2008 06:05:27 -0700 Subject: [ofa-general] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not present Message-ID: <4922BD97.403@obsidianresearch.com> Sasha, The following patch removes the SM from the sm_guid_table when IsSM is not present. Compile tested only as I don't have an environment to recreate this anymore. -- Hal From sashak at voltaire.com Tue Nov 18 05:29:22 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 15:29:22 +0200 Subject: [ofa-general] Re: [PATCH] opensm: disable switch ports only In-Reply-To: References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> Message-ID: <20081118132922.GT10251@sashak.voltaire.com> On 08:04 Tue 18 Nov , Hal Rosenstock wrote: > On Tue, Nov 18, 2008 at 7:53 AM, Sasha Khapyorsky wrote: > > > > When "babbling port" policy is on disable switch ports even when trap > > source is endport. > > So does disables the peer switch port to an endport which is babbling > ? Yes. > That could be made clearer in the description. Ok. 
> What happens if the end port is switch port 0 ? When it should work as usual (it doesn't have remote port). > > This will allow to handle disable ports remotely > > (with ibportstate, etc.). > > I'm not following what you mean by the ibportstate comment here. What > port can ibportstate now disable differently from before ? It is the same, but when endport is disabled how could we reenable this remotely via downed link? Sasha > > -- Hal > > > Signed-off-by: Sasha Khapyorsky > > --- > > opensm/opensm/osm_trap_rcv.c | 4 ++++ > > 1 files changed, 4 insertions(+), 0 deletions(-) > > > > diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c > > index 5de283b..07c5183 100644 > > --- a/opensm/opensm/osm_trap_rcv.c > > +++ b/opensm/opensm/osm_trap_rcv.c > > @@ -239,6 +239,10 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) > > ib_port_info_t *pi = (ib_port_info_t *)payload; > > int ret; > > > > + /* in case of endport - disable switch's peer port */ > > + if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) > > + p = p->p_remote_physp; > > + > > /* If trap 131, might want to disable peer port if available */ > > /* but peer port has been observed not to respond to SM requests */ > > > > -- > > 1.6.0.3.517.g759a > > > > From sashak at voltaire.com Tue Nov 18 05:30:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 15:30:25 +0200 Subject: [ofa-general] Re: opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not present In-Reply-To: <4922BD97.403@obsidianresearch.com> References: <4922BD97.403@obsidianresearch.com> Message-ID: <20081118133025.GU10251@sashak.voltaire.com> Hi Hal, On 06:05 Tue 18 Nov , Hal Rosenstock wrote: > > The following patch removes the SM from the sm_guid_table when IsSM is not > present. Compile tested only as I don't have an environment to recreate > this anymore. Did you forget the patch? 
Sasha From rdreier at cisco.com Tue Nov 18 08:01:58 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 18 Nov 2008 08:01:58 -0800 Subject: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides In-Reply-To: <4922A855.2010109@Voltaire.COM> (Moni Shoua's message of "Tue, 18 Nov 2008 13:34:45 +0200") References: <490B448C.5080306@Voltaire.COM> <4922A855.2010109@Voltaire.COM> Message-ID: > The patch below is identical to Yossi's patch but without the copy of HA in neigh_add_path. Why did Yossi include that copy? Does this patch still fix everything? - R. From hal.rosenstock at gmail.com Tue Nov 18 08:05:46 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 18 Nov 2008 11:05:46 -0500 Subject: [ofa-general] Re: [PATCH] opensm: disable switch ports only In-Reply-To: <20081118132922.GT10251@sashak.voltaire.com> References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> Message-ID: On Tue, Nov 18, 2008 at 8:29 AM, Sasha Khapyorsky wrote: > On 08:04 Tue 18 Nov , Hal Rosenstock wrote: >> On Tue, Nov 18, 2008 at 7:53 AM, Sasha Khapyorsky wrote: >> > >> > When "babbling port" policy is on disable switch ports even when trap >> > source is endport. >> >> So does disables the peer switch port to an endport which is babbling >> ? > > Yes. > >> That could be made clearer in the description. > > Ok. > >> What happens if the end port is switch port 0 ? > > When it should work as usual (it doesn't have remote port). > >> > This will allow to handle disable ports remotely >> > (with ibportstate, etc.). >> >> I'm not following what you mean by the ibportstate comment here. What >> port can ibportstate now disable differently from before ? > > It is the same, OK. > but when endport is disabled how could we reenable this > remotely via downed link? I don't understand what you mean. ibportstate does not allow disabling of end port. 
-- Hal > Sasha > >> >> -- Hal >> >> > Signed-off-by: Sasha Khapyorsky >> > --- >> > opensm/opensm/osm_trap_rcv.c | 4 ++++ >> > 1 files changed, 4 insertions(+), 0 deletions(-) >> > >> > diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c >> > index 5de283b..07c5183 100644 >> > --- a/opensm/opensm/osm_trap_rcv.c >> > +++ b/opensm/opensm/osm_trap_rcv.c >> > @@ -239,6 +239,10 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) >> > ib_port_info_t *pi = (ib_port_info_t *)payload; >> > int ret; >> > >> > + /* in case of endport - disable switch's peer port */ >> > + if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) >> > + p = p->p_remote_physp; >> > + >> > /* If trap 131, might want to disable peer port if available */ >> > /* but peer port has been observed not to respond to SM requests */ >> > >> > -- >> > 1.6.0.3.517.g759a >> > >> > > From monis at Voltaire.COM Tue Nov 18 08:11:49 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Tue, 18 Nov 2008 18:11:49 +0200 Subject: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides In-Reply-To: References: <490B448C.5080306@Voltaire.COM> <4922A855.2010109@Voltaire.COM> Message-ID: <4922E945.3030102@Voltaire.COM> Roland Dreier wrote: > > The patch below is identical to Yossi's patch but without the copy of HA in neigh_add_path. > > Why did Yossi include that copy? Does this patch still fix everything? > > - R. Yossi's intention was to save compares (gid size long) in ipoib_start_xmit(). The thought was that once the condition to start a new path query is met there is no need to meet it again (especially when the cost is high) The only thing is what happens when path query fails (I explained it above) and this is why I think its better to remove the copy. 
The new patch still fixes the basic problem that it intends to (as explained by Yossi) thanks MoniS From halr at obsidianresearch.com Tue Nov 18 08:12:58 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Tue, 18 Nov 2008 09:12:58 -0700 Subject: [ofa-general] [PATCH] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not present Message-ID: <4922E98A.7020403@obsidianresearch.com> Sasha, The following patch (attached this time:-) removes the SM from the sm_guid_table when IsSM is not present. Compile tested only as I don't have an environment to recreate this anymore. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-pir-issm1 URL: From sashak at voltaire.com Tue Nov 18 08:56:32 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 18 Nov 2008 18:56:32 +0200 Subject: [ofa-general] Re: [PATCH] opensm: disable switch ports only In-Reply-To: References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> Message-ID: <20081118165632.GW10251@sashak.voltaire.com> On 11:05 Tue 18 Nov , Hal Rosenstock wrote: > > > but when endport is disabled how could we reenable this > > remotely via downed link? > > I don't understand what you mean. ibportstate does not allow disabling > of end port. Right. And this is the reason to disable switch external port. And also if we have (hypothetically) the tool which is able to disable/enable endport, we will not be able to access this endport via downed link, only local reset/reboot will help. 
Sasha From hal.rosenstock at gmail.com Tue Nov 18 09:01:37 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 18 Nov 2008 12:01:37 -0500 Subject: [ofa-general] ***SPAM*** Re: [PATCH] opensm: disable switch ports only In-Reply-To: <20081118165632.GW10251@sashak.voltaire.com> References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> <20081118165632.GW10251@sashak.voltaire.com> Message-ID: On Tue, Nov 18, 2008 at 11:56 AM, Sasha Khapyorsky wrote: > On 11:05 Tue 18 Nov , Hal Rosenstock wrote: >> >> > but when endport is disabled how could we reenable this >> > remotely via downed link? >> >> I don't understand what you mean. ibportstate does not allow disabling >> of end port. > > Right. And this is the reason to disable switch external port. And also > if we have (hypothetically) the tool which is able to disable/enable > endport, we will not be able to access this endport via downed link, > only local reset/reboot will help. Yes, disabling the switch peer port is the alternative to disabling the end port. The latter is not allowed and I agree that disabling the switch peer port is a better choice IMO as the admin can't shoot himself in the foot and have to reset/reboot to reenable. -- Hal > > Sasha > From weiny2 at llnl.gov Tue Nov 18 14:06:08 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 18 Nov 2008 14:06:08 -0800 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. In-Reply-To: <20081118123000.GO10251@sashak.voltaire.com> References: <20081104095744.35893d4a.weiny2@llnl.gov> <20081110201333.GM313@sashak.voltaire.com> <20081110131140.52561f42.weiny2@llnl.gov> <20081112185457.GD27271@sashak.voltaire.com> <20081118123000.GO10251@sashak.voltaire.com> Message-ID: <20081118140608.19ac0963.weiny2@llnl.gov> I am not sure this will fix my bug. 
The stack trace in my bug ended with: #0 osm_vendor_get (h_bind=0x0, mad_size=256, p_vw=0x69bbe8) at The h_bind was being extracted from the osm_physp_t object. Would this fix ensure that the h_bind pointer was valid in the osm_physp_t object returned? I used the "osm_physp_is_valid" function because the port_guid in osm_physp_t object was only set after port_info returned valid data which would also ensure that h_bind was set up correctly. That happens through the call path: osm_pi_rcv_process->osm_physp_init->osm_dr_path_init Currently osm_node_t->node_info is set when __osm_ni_rcv_process_new calls osm_node_new. osm_node_new calls osm_node_init_physp->osm_physp_init->osm_dr_path_init; but only on the portnum which in the NodeInfo SMP. Perhaps osm_physp_init needs to be called again as in this patch: diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c index 20b16d1..5749a66 100644 --- a/opensm/opensm/osm_node_info_rcv.c +++ b/opensm/opensm/osm_node_info_rcv.c @@ -785,6 +785,9 @@ __osm_ni_rcv_process_existing(IN osm_sm_t * sm, break; } + p_node->node_info = *p_ni; + osm_node_init_physp(p_node, p_madw); + __osm_ni_rcv_set_links(sm, p_node, port_num, p_ni_context); OSM_LOG_EXIT(sm->p_log); Thoughts? Ira On Tue, 18 Nov 2008 14:30:00 +0200 Sasha Khapyorsky wrote: > Hi, > > On 20:54 Wed 12 Nov , Sasha Khapyorsky wrote: > > > > > > I was wondering if it would return invalid ports ever. It would be easy for it > > > to return only valid ports but perhaps that should be another function to > > > preserve functionality? > > Looked at this. Another problematic place where this function is used is > osm_sa_link_record.c - there when "any" port becomes invalid (which is > possible case) it starts an endless recursion :(. So we will need to fix > the function behavior. > > One option is to scan all ports and to return valid one. Another solution > would be to update locally stored in OpenSM NodeInfo on each receive > (something like below). 
Then osm_node_get_any_physp_ptr() will return a > port where this node was accessed last. > > In this way it also could catch potential OtherLocalSetting changes (in > NodeInfo, such as SystemImageGUID, etc.). > > Could anybody see any downsides with such approach? > > Sasha > > > diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c > index 20b16d1..7d41cab 100644 > --- a/opensm/opensm/osm_node_info_rcv.c > +++ b/opensm/opensm/osm_node_info_rcv.c > @@ -785,6 +785,8 @@ __osm_ni_rcv_process_existing(IN osm_sm_t * sm, > break; > } > > + p_node->node_info = *p_ni; > + > __osm_ni_rcv_set_links(sm, p_node, port_num, p_ni_context); > > OSM_LOG_EXIT(sm->p_log); From meier3 at llnl.gov Tue Nov 18 17:10:37 2008 From: meier3 at llnl.gov (Timothy A. Meier) Date: Tue, 18 Nov 2008 17:10:37 -0800 Subject: [ofa-general] [PATCH] Opensm: main exit codes Message-ID: <4923678D.3080701@llnl.gov> Hey Sasha, I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every possible exit scenario, but it's a start that can be built upon. From d38854b804caac77ba7985fdf2314e412420cdad Mon Sep 17 00:00:00 2001 From: Tim Meier Date: Tue, 18 Nov 2008 16:51:14 -0800 Subject: [PATCH] Opensm: main exit codes Defined a set of exit codes and modified main() to use them as much as possible.
Signed-off-by: Tim Meier --- opensm/include/opensm/osm_opensm.h | 24 ++++++++++++++++++++++++ opensm/opensm/main.c | 30 ++++++++++++++++-------------- 2 files changed, 40 insertions(+), 14 deletions(-) diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h index c121be4..5e78dba 100644 --- a/opensm/include/opensm/osm_opensm.h +++ b/opensm/include/opensm/osm_opensm.h @@ -87,6 +87,30 @@ BEGIN_C_DECLS * Steve King, Intel * *********/ +/****d* OpenSM: OpenSM/osm_exit_type_t +* NAME +* osm_exit_type_t +* +* DESCRIPTION +* Enumerates the possible exit codes that +* are provided by OpenSM. +* +* SYNOPSIS +*/ +typedef enum _osm_exit_type { + OSM_EXIT_TYPE_NORMAL = 0, + OSM_EXIT_TYPE_GENERIC_ERR, + OSM_EXIT_TYPE_USAGE, + OSM_EXIT_TYPE_FORK_ERR, + OSM_EXIT_TYPE_DIFFERENT_DEBUG_MODE, + OSM_EXIT_TYPE_DUPLICATE_OSM_GUID, + OSM_EXIT_TYPE_CONFIG_PARSE_ERR, + OSM_EXIT_TYPE_CONF_FILE_WRITE_ERR, + OSM_EXIT_TYPE_INVALID_ARG_VAL, + OSM_EXIT_TYPE_UNKNOWN_CMDLINE_ARG, + OSM_EXIT_TYPE_UNKNOWN +} osm_exit_type_t; +/***********/ /****d* OpenSM: OpenSM/osm_routing_engine_type_t * NAME * osm_routing_engine_type_t diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 53648d6..d3aa55c 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -347,7 +347,7 @@ static void show_usage(void) printf("--help, -h, -?\n" " Display this usage info then exit.\n\n"); fflush(stdout); - exit(2); + exit(OSM_EXIT_TYPE_USAGE); } /********************************************************************** @@ -451,17 +451,17 @@ static int daemonize(osm_opensm_t * osm) if ((pid = fork()) < 0) { perror("fork"); - exit(-1); + exit(OSM_EXIT_TYPE_FORK_ERR); } else if (pid > 0) - exit(0); + exit(OSM_EXIT_TYPE_NORMAL); setsid(); if ((pid = fork()) < 0) { perror("fork"); - exit(-1); + exit(OSM_EXIT_TYPE_FORK_ERR); } else if (pid > 0) - exit(0); + exit(OSM_EXIT_TYPE_NORMAL); close(0); close(1); @@ -516,6 +516,7 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t 
opt; + int exit_code = OSM_EXIT_TYPE_NORMAL; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -595,7 +596,7 @@ int main(int argc, char *argv[]) "ERROR: OpenSM and Complib were compiled using different modes\n"); fprintf(stderr, "ERROR: OpenSM debug:%d Complib debug:%d \n", osm_is_debug(), cl_is_debug()); - exit(1); + exit(OSM_EXIT_TYPE_DIFFERENT_DEBUG_MODE); } #if defined (_DEBUG_) && defined (OSM_VENDOR_INTF_OPENIB) enable_stack_dump(1); @@ -615,7 +616,7 @@ int main(int argc, char *argv[]) long_option, NULL); switch (next_option) { case 12: /* --version - already printed above */ - exit(0); + exit(OSM_EXIT_TYPE_NORMAL); break; case 'F': if (config_file_done) @@ -623,7 +624,7 @@ int main(int argc, char *argv[]) printf("Reloading config from `%s`:\n", optarg); if (osm_subn_parse_conf_file(optarg, &opt)) { printf("cannot parse config file.\n"); - exit(1); + exit(OSM_EXIT_TYPE_CONFIG_PARSE_ERR); } printf("Rescaning command line:\n"); config_file_done = 1; @@ -755,7 +756,7 @@ int main(int argc, char *argv[]) if (temp > 7) { fprintf(stderr, "ERROR: LMC must be 7 or less.\n"); - return (-1); + exit(OSM_EXIT_TYPE_INVALID_ARG_VAL); } opt.lmc = (uint8_t) temp; printf(" LMC = %d\n", temp); @@ -821,7 +822,7 @@ int main(int argc, char *argv[]) if (0 > temp || 15 < temp) { fprintf(stderr, "ERROR: priority must be between 0 and 15\n"); - return (-1); + exit (OSM_EXIT_TYPE_INVALID_ARG_VAL); } opt.sm_priority = (uint8_t) temp; printf(" Priority = %d\n", temp); @@ -931,7 +932,7 @@ int main(int argc, char *argv[]) case -1: break; /* done with option */ default: /* something wrong */ - abort(); + exit(OSM_EXIT_TYPE_UNKNOWN_CMDLINE_ARG); } } while (next_option != -1); @@ -945,7 +946,7 @@ int main(int argc, char *argv[]) status = osm_subn_write_conf_file(conf_template, &opt); if (status) printf("\nosm_subn_write_conf_file failed!\n"); - exit(status); + exit(status? 
OSM_EXIT_TYPE_CONF_FILE_WRITE_ERR: OSM_EXIT_TYPE_NORMAL); } if (vendor_debug) @@ -967,7 +968,7 @@ int main(int argc, char *argv[]) /* We will just exit, and not go to Exit, since we don't want the destroy to be called. */ complib_exit(); - return (status); + exit (status); } /* @@ -982,6 +983,7 @@ int main(int argc, char *argv[]) printf("\nError from osm_opensm_bind (0x%X)\n", status); printf ("Perhaps another instance of OpenSM is already running\n"); + exit_code = OSM_EXIT_TYPE_DUPLICATE_OSM_GUID; goto Exit; } @@ -1021,5 +1023,5 @@ Exit: osm_opensm_destroy(&osm); complib_exit(); - exit(0); + exit(exit_code); } -- 1.5.4.5 -- Timothy A. Meier Computer Scientist ICCD/High Performance Computing 925.422.3341 meier3 at llnl.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 0001-Opensm-main-exit-codes.patch URL: From kliteyn at dev.mellanox.co.il Wed Nov 19 01:43:14 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 19 Nov 2008 11:43:14 +0200 Subject: [ofa-general] [PATCH] opensm/osm_lid_mgr.c: cosmetics in log message Message-ID: <4923DFB2.8000305@dev.mellanox.co.il> Sasha, Small log message cosmetics fix. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_lid_mgr.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c index c135d4a..c90292a 100644 --- a/opensm/opensm/osm_lid_mgr.c +++ b/opensm/opensm/osm_lid_mgr.c @@ -1042,7 +1042,7 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr, (op_vls != ib_port_info_get_op_vls(p_old_pi))) { OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, "Sending Link Down to GUID 0x%016" - PRIx64 "port %d due to op_vls or " + PRIx64 " port %d due to op_vls or " "mtu change. 
MTU:%u,%u VL_CAP:%u,%u\n", cl_ntoh64(osm_physp_get_port_guid(p_physp)), port_num, mtu, -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Wed Nov 19 01:51:48 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 19 Nov 2008 11:51:48 +0200 Subject: [ofa-general] [PATCH] opensm/osm_state_mgr.c: bug fix in unicast cache Message-ID: <4923E1B4.2030600@dev.mellanox.co.il> Hi Sasha, When there are errors during initialization and new heavy sweep is forced, unicast cache might hold a snapshot of the previous routing, and since there might be no *topology* changes, ucast cache will apply that cached routing, which might be wrong. This patch invalidates cache explicitly if there were initialization errors in addition to few other cases. This fix addresses bug #1398. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_state_mgr.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 841438c..d00e8ff 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1064,6 +1064,18 @@ static void do_sweep(osm_sm_t * sm) } /* + * Unicast cache should be invalidated if: + * - every sweep is a heavy sweep + * - there were errors during initialization + * - subnet re-route is requested + */ + if (sm->p_subn->opt.use_ucast_cache && + (sm->p_subn->opt.force_heavy_sweep || + sm->p_subn->subnet_initialization_error || + sm->p_subn->force_reroute)) + osm_ucast_cache_invalidate(&sm->ucast_mgr); + + /* * If we don't need to do a heavy sweep and we want to do a reroute, * just reroute only. 
*/ @@ -1079,10 +1091,6 @@ static void do_sweep(osm_sm_t * sm) /* Re-program the switches fully */ sm->p_subn->ignore_existing_lfts = TRUE; - /* we want to re-route, so cache should be invalidated */ - if (sm->p_subn->opt.use_ucast_cache) - osm_ucast_cache_invalidate(&sm->ucast_mgr); - osm_ucast_mgr_process(&sm->ucast_mgr); /* Reset flag */ -- 1.5.1.4 From vlad at lists.openfabrics.org Wed Nov 19 03:43:01 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 19 Nov 2008 03:43:01 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081119-0200 daily build status Message-ID: <20081119114302.18511E60E83@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp 
Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From dorfman.eli at gmail.com Wed Nov 19 04:08:27 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Wed, 19 Nov 2008 14:08:27 +0200 Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** Re: [PATCH] opensm: disable switch ports only In-Reply-To: References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> <20081118165632.GW10251@sashak.voltaire.com> Message-ID: <492401BB.3050808@gmail.com> Hal Rosenstock wrote: > On Tue, Nov 18, 2008 at 11:56 AM, Sasha Khapyorsky wrote: >> On 11:05 Tue 18 Nov , Hal Rosenstock wrote: >>>> but when endport is disabled how could we reenable this >>>> remotely via downed link? >>> I don't understand what you mean. ibportstate does not allow disabling >>> of end port. >> Right. And this is the reason to disable switch external port. And also >> if we have (hypothetically) the tool which is able to disable/enable >> endport, we will not be able to access this endport via downed link, >> only local reset/reboot will help. > > Yes, disabling the switch peer port is the alternative to disabling > the end port. The latter is not allowed and I agree that disabling the > switch peer port is a better choice IMO as the admin can't shoot > himself in the foot and have to reset/reboot to reenable. 
> A more generic approach would be to disable the port with the least hop count. This will address the case of an inter-switch link where the most remote port (from opensm) is sending traps. In that case we would like to disable the nearest switch port. Eli. From hal.rosenstock at gmail.com Wed Nov 19 07:00:25 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 19 Nov 2008 10:00:25 -0500 Subject: [ofa-general] ***SPAM*** Re: [PATCH] opensm: disable switch ports only In-Reply-To: <492401BB.3050808@gmail.com> References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> <20081118165632.GW10251@sashak.voltaire.com> <492401BB.3050808@gmail.com> Message-ID: On Wed, Nov 19, 2008 at 7:08 AM, Eli Dorfman wrote: > More generic approach would be to disable the port with the least hop count. > this will address the case of inter switch link where the most remote port (from opensm) is sending traps. > in that case we would like to disable the nearest switch port. Yes, that's another approach and has the potential advantage of disabling fewer ports when more ports are babbling. It does assume a closer interswitch link which doesn't affect any other endports. Anyhow, IMO the trap rate issue has been around long enough to have been fixed in those SMAs. -- Hal > Eli. From sashak at voltaire.com Wed Nov 19 09:50:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 19:50:25 +0200 Subject: [ofa-general] [PATCH] opensm: fix QoS config bug Message-ID: <20081119175025.GH6183@sashak.voltaire.com> When the config file is not given or OpenSM cannot open it, the config verification procedure is not run and as a result QoS parameters still have wrong values; OpenSM crashes later when '-Q' is used. This addresses bug #1401.
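[Editorial sketch, not part of the patch.] The failure mode described above can be illustrated with a minimal, self-contained example. The struct, the default value, and the function names below are hypothetical stand-ins; only the "lmc > 7" range check is taken from the patch itself, and resetting to a default is an assumption about what the real verification does.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down options struct; the real osm_subn_opt_t
 * has many more fields. */
struct opts {
	unsigned lmc;		/* values above 7 are rejected, as in the patch */
};

#define DEFAULT_LMC 0		/* assumed default, for illustration only */

/* Reset out-of-range values to defaults; returns the number of fixes. */
int verify_config(struct opts *o)
{
	int fixed = 0;

	if (o->lmc > 7) {
		o->lmc = DEFAULT_LMC;
		fixed++;
	}
	return fixed;
}

/* Parse step: bails out early, without verifying, when no file opens. */
int parse_conf_file(const char *name, struct opts *o)
{
	FILE *f = name ? fopen(name, "r") : NULL;

	if (!f)
		return -1;	/* verification is skipped on this path */
	/* ... parsing elided ... */
	fclose(f);
	verify_config(o);
	return 0;
}

/* Startup sequence in the spirit of the fix: verify unconditionally. */
int load_config(const char *name, struct opts *o)
{
	parse_conf_file(name, o);	/* may fail; that is acceptable */
	return verify_config(o);	/* always runs */
}
```

With this ordering, an invalid value that arrives by any route other than the config file is still caught before it is used.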
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 1 + opensm/opensm/main.c | 2 ++ opensm/opensm/osm_subnet.c | 8 +++++--- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 2bcd232..d97d5f4 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -1100,6 +1100,7 @@ int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t * const p_opt); * Assumes the conf file is part of the cache dir which defaults to * OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts *********/ +int osm_subn_verify_config(osm_subn_opt_t * const p_opt); END_C_DECLS #endif /* _OSM_SUBNET_H_ */ diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 53648d6..999e92f 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -948,6 +948,8 @@ int main(int argc, char *argv[]) exit(status); } + osm_subn_verify_config(&opt); + if (vendor_debug) osm_vendor_set_debug(osm.p_vendor, vendor_debug); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index d787fe8..c41962d 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -949,7 +949,7 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix, subn_verify_sl2vl(&set->sl2vl, prefix, dflt->sl2vl); } -static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) +int osm_subn_verify_config(IN osm_subn_opt_t * const p_opts) { if (p_opts->lmc > 7) { log_report(" Invalid Cached Option Value:lmc = %u:" @@ -1024,6 +1024,8 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) OSM_PERFMGR_DEFAULT_MAX_OUTSTANDING_QUERIES; } #endif + + return 0; } /********************************************************************** @@ -1285,7 +1287,7 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) } fclose(opts_file); - subn_verify_conf_file(p_opts); + osm_subn_verify_config(p_opts); return 0; 
} @@ -1340,7 +1342,7 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) } fclose(opts_file); - subn_verify_conf_file(&p_subn->opt); + osm_subn_verify_config(&p_subn->opt); osm_parse_prefix_routes_file(p_subn); -- 1.6.0.3.517.g759a From sashak at voltaire.com Wed Nov 19 10:30:20 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 20:30:20 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not present In-Reply-To: <4922E98A.7020403@obsidianresearch.com> References: <4922E98A.7020403@obsidianresearch.com> Message-ID: <20081119183020.GJ6183@sashak.voltaire.com> Hi Hal, On 09:12 Tue 18 Nov , Hal Rosenstock wrote: > Sasha, > > The following patch (attached this time:-) removes the SM from the > sm_guid_table when IsSM is not present. Compile tested only as I don't have > an environment to recreate this anymore. > > -- Hal > opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is > not present in PortInfo:CapabilityMask > > SM should be removed from the sm_guid_tbl subsequent to a trap 144 > indicating the capability mask changed (and the new capabilities > no longer include IsSM). > > As a result of this, move clearing of SM state to be conditionalized on > IsSM present rather than regardless of whether IsSM is set > > Prior to this patch, the OpenSM log is spammed with error messages on > SubnGets of SMInfo attribute. > > Signed-off-by: Hal Rosenstock > > diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c > index 47eb457..97ec5b3 100644 > --- a/opensm/opensm/osm_port_info_rcv.c > +++ b/opensm/opensm/osm_port_info_rcv.c > @@ -149,17 +149,17 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > */ > __osm_pi_rcv_set_sm(sm, p_physp); > } else { > - /* > - Before querying the SM - we want to make sure we clean its state, so > - if the querying fails we recognize that this SM is not active. 
> - */ > p_sm_tbl = &sm->p_subn->sm_guid_tbl; > - p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > - if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > - /* clean it up */ > - p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > - > if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { > + /* > + * Before querying the SM - we want to make sure we > + * clean its state, so if the querying fails we > + * recognize that this SM is not active. > + */ > + p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > + if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > + /* clean it up */ > + p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > if (sm->p_subn->opt.ignore_other_sm) > OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, > "Ignoring SM on port 0x%" PRIx64 "\n", > @@ -171,7 +171,8 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > cl_ntoh64(port_guid)); > > /* > - This port indicates it's an SM and it's not our own port. > + This port indicates it's an SM and > + it's not our own port. > Acquire the SMInfo Attribute. > */ > memset(&context, 0, sizeof(context)); > @@ -190,7 +191,8 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > "Failure requesting SMInfo (%s)\n", > ib_get_err_str(status)); > } > - } > + } else > + cl_qmap_remove(p_sm_tbl, port_guid); Isn't it should be freed too? Something like: p_sm = cl_qmap_remove(p_sm_tbl, port_guid); free(p_sm); Sasha > } > > OSM_LOG_EXIT(sm->p_log); From sashak at voltaire.com Wed Nov 19 10:48:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 20:48:36 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_lid_mgr.c: cosmetics in log message In-Reply-To: <4923DFB2.8000305@dev.mellanox.co.il> References: <4923DFB2.8000305@dev.mellanox.co.il> Message-ID: <20081119184836.GK6183@sashak.voltaire.com> On 11:43 Wed 19 Nov , Yevgeny Kliteynik wrote: > Sasha, > > Small log message cosmetics fix. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. 
Sasha From hal.rosenstock at gmail.com Wed Nov 19 10:52:33 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 19 Nov 2008 13:52:33 -0500 Subject: ***SPAM*** Re: [ofa-general] Re: [PATCH] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not present In-Reply-To: <20081119183020.GJ6183@sashak.voltaire.com> References: <4922E98A.7020403@obsidianresearch.com> <20081119183020.GJ6183@sashak.voltaire.com> Message-ID: Sasha, On Wed, Nov 19, 2008 at 1:30 PM, Sasha Khapyorsky wrote: > Hi Hal, > > On 09:12 Tue 18 Nov , Hal Rosenstock wrote: >> Sasha, >> >> The following patch (attached this time:-) removes the SM from the >> sm_guid_table when IsSM is not present. Compile tested only as I don't have >> an environment to recreate this anymore. >> >> -- Hal > >> opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is >> not present in PortInfo:CapabilityMask >> >> SM should be removed from the sm_guid_tbl subsequent to a trap 144 >> indicating the capability mask changed (and the new capabilities >> no longer include IsSM). >> >> As a result of this, move clearing of SM state to be conditionalized on >> IsSM present rather than regardless of whether IsSM is set >> >> Prior to this patch, the OpenSM log is spammed with error messages on >> SubnGets of SMInfo attribute. >> >> Signed-off-by: Hal Rosenstock >> >> diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c >> index 47eb457..97ec5b3 100644 >> --- a/opensm/opensm/osm_port_info_rcv.c >> +++ b/opensm/opensm/osm_port_info_rcv.c >> @@ -149,17 +149,17 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, >> */ >> __osm_pi_rcv_set_sm(sm, p_physp); >> } else { >> - /* >> - Before querying the SM - we want to make sure we clean its state, so >> - if the querying fails we recognize that this SM is not active. 
>> - */ >> p_sm_tbl = &sm->p_subn->sm_guid_tbl; >> - p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); >> - if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) >> - /* clean it up */ >> - p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; >> - >> if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { >> + /* >> + * Before querying the SM - we want to make sure we >> + * clean its state, so if the querying fails we >> + * recognize that this SM is not active. >> + */ >> + p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); >> + if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) >> + /* clean it up */ >> + p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; >> if (sm->p_subn->opt.ignore_other_sm) >> OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, >> "Ignoring SM on port 0x%" PRIx64 "\n", >> @@ -171,7 +171,8 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, >> cl_ntoh64(port_guid)); >> >> /* >> - This port indicates it's an SM and it's not our own port. >> + This port indicates it's an SM and >> + it's not our own port. >> Acquire the SMInfo Attribute. >> */ >> memset(&context, 0, sizeof(context)); >> @@ -190,7 +191,8 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, >> "Failure requesting SMInfo (%s)\n", >> ib_get_err_str(status)); >> } >> - } >> + } else >> + cl_qmap_remove(p_sm_tbl, port_guid); > > Isn't it should be freed too? Something like: > > p_sm = cl_qmap_remove(p_sm_tbl, port_guid); > free(p_sm); Oops; my bad; revised patch shortly. 
-- Hal > Sasha > >> } >> >> OSM_LOG_EXIT(sm->p_log); > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at obsidianresearch.com Wed Nov 19 10:55:07 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Wed, 19 Nov 2008 11:55:07 -0700 Subject: [ofa-general] [PATCHv2] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not Message-ID: <4924610B.90809@obsidianresearch.com> Sasha, The following patch removes the SM from the sm_guid_table when IsSM is not present. v2 of this fixes the memory leak you pointed out in the original version. Compile tested only as I don't have an environment to recreate this anymore. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch-pir-issm2 URL: From sashak at voltaire.com Wed Nov 19 11:00:59 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 21:00:59 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_state_mgr.c: bug fix in unicast cache In-Reply-To: <4923E1B4.2030600@dev.mellanox.co.il> References: <4923E1B4.2030600@dev.mellanox.co.il> Message-ID: <20081119190059.GM6183@sashak.voltaire.com> Hi Yevgeny, On 11:51 Wed 19 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > When there are errors during initialization and new > heavy sweep is forced, unicast cache might hold a > snapshot of the previous routing, and since there > might be no *topology* changes, ucast cache will > apply that cached routing, which might be wrong. > > This patch invalidates cache explicitly if there > were initialization errors in addition to few other > cases. > > This fix addresses bug #1398. 
> > Signed-off-by: Yevgeny Kliteynik > --- > opensm/opensm/osm_state_mgr.c | 16 ++++++++++++---- > 1 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c > index 841438c..d00e8ff 100644 > --- a/opensm/opensm/osm_state_mgr.c > +++ b/opensm/opensm/osm_state_mgr.c > @@ -1064,6 +1064,18 @@ static void do_sweep(osm_sm_t * sm) > } > > /* > + * Unicast cache should be invalidated if: > + * - every sweep is a heavy sweep > + * - there were errors during initialization > + * - subnet re-route is requested > + */ > + if (sm->p_subn->opt.use_ucast_cache && > + (sm->p_subn->opt.force_heavy_sweep || Why should 'opt.force_heavy_sweep' be there? It is possible to enforce heavy sweep without routing cache just by using: opt.force_heavy_sweep TRUE opt.use_ucast_cache FALSE Sasha > + sm->p_subn->subnet_initialization_error || > + sm->p_subn->force_reroute)) > + osm_ucast_cache_invalidate(&sm->ucast_mgr); > + > + /* > * If we don't need to do a heavy sweep and we want to do a reroute, > * just reroute only. > */ > @@ -1079,10 +1091,6 @@ static void do_sweep(osm_sm_t * sm) > /* Re-program the switches fully */ > sm->p_subn->ignore_existing_lfts = TRUE; > > - /* we want to re-route, so cache should be invalidated */ > - if (sm->p_subn->opt.use_ucast_cache) > - osm_ucast_cache_invalidate(&sm->ucast_mgr); > - > osm_ucast_mgr_process(&sm->ucast_mgr); > > /* Reset flag */ > -- > 1.5.1.4 > From yossi.openib at gmail.com Wed Nov 19 11:30:27 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Wed, 19 Nov 2008 21:30:27 +0200 Subject: [ofa-general] [PATCH] ipoib: Fix loss of connectivity after bonding failover on both sides In-Reply-To: References: <490B448C.5080306@Voltaire.COM> <4922A855.2010109@Voltaire.COM> Message-ID: <49246953.4020101@gmail.com> I included that copy to avoid the logic of releasing/allocating ipoib neighbour for every packet xmit'ed before the path query completes.
I thought that it's good enough to do it just once, for the first time. Therefore, to have the mgid test pass for the second xmit, I copied the mgid even if the path query fails. It turns out that this is not a good thing to do, because if a path query fails nothing will retrigger it except an ARP refresh, and that takes too much time. In case of SM failover, the first path query indeed fails. So, the best thing is probably to remove this "optimization". Besides that, the patch works the same as before. Roland Dreier wrote: > > The patch below is identical to Yossi's patch but without the copy of HA in neigh_add_path. > > Why did Yossi include that copy? Does this patch still fix everything? > > - R. > From sashak at voltaire.com Wed Nov 19 11:33:32 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 21:33:32 +0200 Subject: [ofa-general] Re: [PATCHv2] opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not In-Reply-To: <4924610B.90809@obsidianresearch.com> References: <4924610B.90809@obsidianresearch.com> Message-ID: <20081119193332.GN6183@sashak.voltaire.com> Hi Hal, On 11:55 Wed 19 Nov , Hal Rosenstock wrote: > Sasha, > > The following patch removes the SM from the sm_guid_table when IsSM is not > present. > v2 of this fixes the memory leak you pointed out in the original version. > Compile tested only as I don't have an environment to recreate this > anymore. > > -- Hal > > opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is > not present in PortInfo:CapabilityMask > > SM should be removed from the sm_guid_tbl subsequent to a trap 144 > indicating the capability mask changed (and the new capabilities > no longer include IsSM).
> > As a result of this, move clearing of SM state to be conditionalized on > IsSM present rather than regardless of whether IsSM is set. > > Prior to this patch, the OpenSM log is spammed with error messages on > SubnGets of SMInfo attribute. > > Signed-off-by: Hal Rosenstock > --- > v2 fixes memory leak pointed out by Sasha. > > diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c > index 47eb457..5988dc3 100644 > --- a/opensm/opensm/osm_port_info_rcv.c > +++ b/opensm/opensm/osm_port_info_rcv.c > @@ -149,17 +149,17 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > */ > __osm_pi_rcv_set_sm(sm, p_physp); > } else { > - /* > - Before querying the SM - we want to make sure we clean its state, so > - if the querying fails we recognize that this SM is not active. > - */ > p_sm_tbl = &sm->p_subn->sm_guid_tbl; > - p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > - if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > - /* clean it up */ > - p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > - > if (p_pi->capability_mask & IB_PORT_CAP_IS_SM) { > + /* > + * Before querying the SM - we want to make sure we > + * clean its state, so if the querying fails we > + * recognize that this SM is not active. > + */ > + p_sm = (osm_remote_sm_t *) cl_qmap_get(p_sm_tbl, port_guid); > + if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) > + /* clean it up */ > + p_sm->smi.pri_state = 0xF0 & p_sm->smi.pri_state; > if (sm->p_subn->opt.ignore_other_sm) > OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, > "Ignoring SM on port 0x%" PRIx64 "\n", > @@ -171,7 +171,8 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > cl_ntoh64(port_guid)); > > /* > - This port indicates it's an SM and it's not our own port. > + This port indicates it's an SM and > + it's not our own port. > Acquire the SMInfo Attribute. 
> */ > memset(&context, 0, sizeof(context)); > @@ -190,6 +191,9 @@ __osm_pi_rcv_process_endport(IN osm_sm_t * sm, > "Failure requesting SMInfo (%s)\n", > ib_get_err_str(status)); > } > + } else { > + p_sm = (osm_remote_sm_t *) cl_qmap_remove(p_sm_tbl, port_guid); > + free(p_sm); Sorry about my simplified example. Actually it should be: if (p_sm != (osm_remote_sm_t *) cl_qmap_end(p_sm_tbl)) free(p_sm); Since many ports may not have IsSM bit. The patch is applied with this fix. Thanks. Sasha From sashak at voltaire.com Wed Nov 19 11:38:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 19 Nov 2008 21:38:06 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH] opensm: disable switch ports only In-Reply-To: <492401BB.3050808@gmail.com> References: <20081118125130.GR10251@sashak.voltaire.com> <20081118125325.GS10251@sashak.voltaire.com> <20081118132922.GT10251@sashak.voltaire.com> <20081118165632.GW10251@sashak.voltaire.com> <492401BB.3050808@gmail.com> Message-ID: <20081119193806.GO6183@sashak.voltaire.com> Hi Eli, On 14:08 Wed 19 Nov , Eli Dorfman wrote: > > More generic approach would be to disable the port with the least hop count. > this will address the case of inter switch link where the most remote port (from opensm) is sending traps. > in that case we would like to disable the nearest switch port. Seems reasonable for me. Would you care about a patch? Sasha From yosefe at Voltaire.COM Wed Nov 19 11:53:27 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 19 Nov 2008 21:53:27 +0200 Subject: [ofa-general] [PATCH] ipoib: do not join broadcast group if interface is brought down Message-ID: <49246EB7.3070607@Voltaire.COM> Because ipoib_workqueue is not flushed when ipoib interface is brought down, ipoib_mcast_join() may trigger a join to the broadcast group after priv->broadcast was set to NULL (during cleanup). This will cause ipoib to be joined to the broadcast group when interface is down. 
As a side effect, this breaks the optimization of setting qkey only when joining the broadcast group. Signed-off-by: Yossi Etigin -- Fix bugzilla 1370. Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 21:33:54.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 21:40:12.000000000 +0200 @@ -565,7 +565,8 @@ void ipoib_mcast_join_task(struct work_s ipoib_warn(priv, "ib_query_port failed\n"); } - if (!priv->broadcast) { + rtnl_lock(); + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && !priv->broadcast) { struct ipoib_mcast *broadcast; broadcast = ipoib_mcast_alloc(dev, 1); @@ -576,6 +577,7 @@ void ipoib_mcast_join_task(struct work_s queue_delayed_work(ipoib_workqueue, &priv->mcast_join_task, HZ); mutex_unlock(&mcast_mutex); + rtnl_unlock(); return; } @@ -587,6 +589,7 @@ void ipoib_mcast_join_task(struct work_s __ipoib_mcast_add(dev, priv->broadcast); spin_unlock_irq(&priv->lock); } + rtnl_unlock(); if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) -- --Yossi From kliteyn at dev.mellanox.co.il Wed Nov 19 14:00:06 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 20 Nov 2008 00:00:06 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_state_mgr.c: bug fix in unicast cache In-Reply-To: <20081119190059.GM6183@sashak.voltaire.com> References: <4923E1B4.2030600@dev.mellanox.co.il> <20081119190059.GM6183@sashak.voltaire.com> Message-ID: <49248C66.2010403@dev.mellanox.co.il> Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 11:51 Wed 19 Nov , Yevgeny Kliteynik wrote: >> Hi Sasha, >> >> When there are errors during initialization and new >> heavy sweep is forced, unicast cache might hold a >> snapshot of the previous routing, and since there >> might be no *topology* changes, ucast cache will >> apply that cached 
routing, which might be wrong. >> >> This patch invalidates cache explicitly if there >> were initialization errors in addition to few other >> cases. >> >> This fix addresses bug #1398. >> >> Signed-off-by: Yevgeny Kliteynik >> --- >> opensm/opensm/osm_state_mgr.c | 16 ++++++++++++---- >> 1 files changed, 12 insertions(+), 4 deletions(-) >> >> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c >> index 841438c..d00e8ff 100644 >> --- a/opensm/opensm/osm_state_mgr.c >> +++ b/opensm/opensm/osm_state_mgr.c >> @@ -1064,6 +1064,18 @@ static void do_sweep(osm_sm_t * sm) >> } >> >> /* >> + * Unicast cache should be invalidated if: >> + * - every sweep is a heavy sweep >> + * - there were errors during initialization >> + * - subnet re-route is requested >> + */ >> + if (sm->p_subn->opt.use_ucast_cache && >> + (sm->p_subn->opt.force_heavy_sweep || > > Why 'opt.force_heavy_sweep' should be there? It is possible to enforce > heavy sweep without routing cache just by using: > > opt.force_heavy_sweep TRUE > opt.use_ucast_cache FALSE Well, it doesn't have to be there. The opt.force_heavy_sweep is kind of debug mode of opensm, so I just wanted to disable cache in that case. Want me to remove it and repost the patch? -- Yevgeny > Sasha > >> + sm->p_subn->subnet_initialization_error || >> + sm->p_subn->force_reroute)) >> + osm_ucast_cache_invalidate(&sm->ucast_mgr); >> + >> + /* >> * If we don't need to do a heavy sweep and we want to do a reroute, >> * just reroute only. 
>> */ >> @@ -1079,10 +1091,6 @@ static void do_sweep(osm_sm_t * sm) >> /* Re-program the switches fully */ >> sm->p_subn->ignore_existing_lfts = TRUE; >> >> - /* we want to re-route, so cache should be invalidated */ >> - if (sm->p_subn->opt.use_ucast_cache) >> - osm_ucast_cache_invalidate(&sm->ucast_mgr); >> - >> osm_ucast_mgr_process(&sm->ucast_mgr); >> >> /* Reset flag */ >> -- >> 1.5.1.4 >> > From tziporet at dev.mellanox.co.il Wed Nov 19 16:02:22 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 19 Nov 2008 18:02:22 -0600 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: References: <4907348E.7060508@mellanox.co.il> <490A8FA9.7080802@pobox.com> <490DA91A.1030703@pobox.com> <490DD27C.4070109@pobox.com> <491C41F0.3080304@mellanox.co.il> Message-ID: <4924A90E.9050205@mellanox.co.il> Roland Dreier wrote: > This is 2.6.29 material, and I should be able to get to it in the next > few weeks. > Great Tziporet From sashak at voltaire.com Wed Nov 19 16:42:59 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 20 Nov 2008 02:42:59 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_state_mgr.c: bug fix in unicast cache In-Reply-To: <49248C66.2010403@dev.mellanox.co.il> References: <4923E1B4.2030600@dev.mellanox.co.il> <20081119190059.GM6183@sashak.voltaire.com> <49248C66.2010403@dev.mellanox.co.il> Message-ID: <20081120004259.GP6486@sashak.voltaire.com> On 00:00 Thu 20 Nov , Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: >> Hi Yevgeny, >> On 11:51 Wed 19 Nov , Yevgeny Kliteynik wrote: >>> Hi Sasha, >>> >>> When there are errors during initialization and new >>> heavy sweep is forced, unicast cache might hold a >>> snapshot of the previous routing, and since there >>> might be no *topology* changes, ucast cache will >>> apply that cached routing, which might be wrong. >>> >>> This patch invalidates cache explicitly if there >>> were initialization errors in addition to few other >>> cases. 
>>> >>> This fix addresses bug #1398. >>> >>> Signed-off-by: Yevgeny Kliteynik >>> --- >>> opensm/opensm/osm_state_mgr.c | 16 ++++++++++++---- >>> 1 files changed, 12 insertions(+), 4 deletions(-) >>> >>> diff --git a/opensm/opensm/osm_state_mgr.c >>> b/opensm/opensm/osm_state_mgr.c >>> index 841438c..d00e8ff 100644 >>> --- a/opensm/opensm/osm_state_mgr.c >>> +++ b/opensm/opensm/osm_state_mgr.c >>> @@ -1064,6 +1064,18 @@ static void do_sweep(osm_sm_t * sm) >>> } >>> >>> /* >>> + * Unicast cache should be invalidated if: >>> + * - every sweep is a heavy sweep >>> + * - there were errors during initialization >>> + * - subnet re-route is requested >>> + */ >>> + if (sm->p_subn->opt.use_ucast_cache && >>> + (sm->p_subn->opt.force_heavy_sweep || >> Why 'opt.force_heavy_sweep' should be there? It is possible to enforce >> heavy sweep without routing cache just by using: >> opt.force_heavy_sweep TRUE >> opt.use_ucast_cache FALSE > > Well, it doesn't have to be there. > The opt.force_heavy_sweep is kind of debug mode of opensm, > so I just wanted to disable cache in that case. > Want me to remove it and repost the patch? Yes, please. Sasha From dorfman.eli at gmail.com Thu Nov 20 00:00:38 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Thu, 20 Nov 2008 10:00:38 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] opensm/osm_trap_rcv.c disable the port with the least hop count Message-ID: <49251926.9090509@gmail.com> disable the port with the least hop count. this will address the case of inter switch link where the most remote port (from opensm) is sending traps. in that case we would like to disable the nearest switch port (from opensm). 
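[Editorial sketch, not part of the patch.] The selection rule described above (disable whichever end of the link is closer to the SM) can be written out in isolation. The struct below is a hypothetical stand-in for osm_physp_t that keeps only a hop count and a peer pointer; the NULL check on the peer is an addition of this sketch and does not appear in the posted patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for osm_physp_t: only what the sketch needs. */
struct physp {
	unsigned hop_count;	/* hops on the directed route from the SM */
	struct physp *remote;	/* port at the other end of the link, or NULL */
};

/* Return the end of the link that is closer to the SM. */
struct physp *pick_port_to_disable(struct physp *p)
{
	if (p->remote && p->hop_count > p->remote->hop_count)
		return p->remote;
	return p;
}
```

For an endport attached to a switch this picks the switch port, as the earlier endport-only check did; for an inter-switch link it picks the nearer of the two switch ports.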
Signed-off-by: Eli Dorfman --- opensm/opensm/osm_trap_rcv.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c index 07c5183..d1dfbd4 100644 --- a/opensm/opensm/osm_trap_rcv.c +++ b/opensm/opensm/osm_trap_rcv.c @@ -239,8 +239,8 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) ib_port_info_t *pi = (ib_port_info_t *)payload; int ret; - /* in case of endport - disable switch's peer port */ - if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) + /* select the nearest port to master opensm */ + if (p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count) p = p->p_remote_physp; /* If trap 131, might want to disable peer port if available */ -- 1.5.5 From kliteyn at dev.mellanox.co.il Thu Nov 20 00:33:19 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 20 Nov 2008 10:33:19 +0200 Subject: [ofa-general] [PATCH v2] opensm/osm_state_mgr.c: bug fix in unicast cache Message-ID: <492520CF.4080001@dev.mellanox.co.il> Hi Sasha, When there are errors during initialization and new heavy sweep is forced, unicast cache might hold a snapshot of the previous routing, and since there might be no *topology* changes, unicast cache will apply that cached routing, which might be wrong. This patch invalidates cache explicitly if there were initialization errors in addition to few other cases. V2: don't invalidate cache when opt.force_heavy_sweep is on. This fix addresses bug #1398. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_state_mgr.c | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 841438c..788da51 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1064,6 +1064,15 @@ static void do_sweep(osm_sm_t * sm) } /* + * Unicast cache should be invalidated if there were errors + * during initialization or if subnet re-route is requested. 
+ */ + if (sm->p_subn->opt.use_ucast_cache && + (sm->p_subn->subnet_initialization_error || + sm->p_subn->force_reroute)) + osm_ucast_cache_invalidate(&sm->ucast_mgr); + + /* * If we don't need to do a heavy sweep and we want to do a reroute, * just reroute only. */ @@ -1079,10 +1088,6 @@ static void do_sweep(osm_sm_t * sm) /* Re-program the switches fully */ sm->p_subn->ignore_existing_lfts = TRUE; - /* we want to re-route, so cache should be invalidated */ - if (sm->p_subn->opt.use_ucast_cache) - osm_ucast_cache_invalidate(&sm->ucast_mgr); - osm_ucast_mgr_process(&sm->ucast_mgr); /* Reset flag */ -- 1.5.1.4 From jackm at dev.mellanox.co.il Thu Nov 20 02:11:45 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 20 Nov 2008 12:11:45 +0200 Subject: [ofa-general] Race condition in userspace libraries with create/destroy qp Message-ID: <200811201211.46527.jackm@dev.mellanox.co.il> Roland, Mazal Tov again on the birth of your son. I hope all is well. Has your latency improved (by some miracle)? There seems to be a race in libmlx4 (which our regression testing found). mlx4_create_qp and mlx4_destroy_qp are not atomic WRT each other. If one thread is destroying a QP while another is creating a QP, there is a race hole. The destroying thread can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared it from the userspace store (mlx4_clear_qp). If the other thread creates a QP during this break, it gets the same QP base number and overwrites the destroyed QP's entry with mlx4_store_qp(). When the destroying thread resumes, it clears the new entry from the userspace store via mlx4_clear_qp. I'm debating between a couple of options: 1. Move the mlx4_clear_qp to precede ibv_cmd_destroy_qp. However, what if we're still getting completions for this qp? Ouch. 2. Create a mutex for this purpose, and use it to force the create and destroy qp operations to be atomic WRT the ibv_cmd_xxx_qp operations and the store/clear qp operations.
3. Force kernel space to avoid allocating a just-deleted qp number (this is my least favorite option). My preference is for #2, as being the simplest to implement and having no side-effects. What do you think? - Jack (BTW libmthca has the same issue). ======================================================== From file libmlx4/src/verbs.c: mlx4_create_qp snippet: ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, &resp, sizeof resp); if (ret) goto err_rq_db; ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp); if (ret) goto err_destroy; mlx4_destroy_qp snippet: ret = ibv_cmd_destroy_qp(ibqp); if (ret) return ret; ==> CAN LOSE TIME SLICE HERE!!! mlx4_lock_cqs(ibqp); __mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num, ibqp->srq ? to_msrq(ibqp->srq) : NULL); if (ibqp->send_cq != ibqp->recv_cq) __mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num, NULL); mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num); mlx4_unlock_cqs(ibqp); From vlad at lists.openfabrics.org Thu Nov 20 03:25:44 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 20 Nov 2008 03:25:44 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081120-0200 daily build status Message-ID: <20081120112545.0F20CE60DF1@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.27 Passed on i686 with linux-2.6.26 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on 
x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Thu Nov 20 03:58:27 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 20 Nov 2008 13:58:27 +0200 Subject: [ofa-general] [PATCH] opensm/osm_switch.h: use updated LFT for routing Message-ID: <492550E3.90805@dev.mellanox.co.il> Hi Sasha, Function osm_switch_get_port_by_lid() was using the switch's LFT, so this LFT might not be updated to recent routing. I think that this was also relevant before the LFT simplification. One immediate outcome of this bug is opensm.fdbs file - when it is dumped from the switch LFT (and not from lft_buf), it sometimes doesn't match the lst file. 
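The lookup rule described here can be sketched in isolation. This is only a minimal standalone sketch, not the OpenSM code: `struct sw`, `LID_UCAST_END`, `NO_PATH` and `port_by_lid()` are hypothetical stand-ins (assumptions made for illustration) for `osm_switch_t`, `IB_LID_UCAST_END_HO`, `OSM_NO_PATH` and `osm_switch_get_port_by_lid()`, and the constant values are picked just so the example compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OpenSM definitions (assumed values). */
#define LID_UCAST_END 0xBFFF /* highest unicast LID accepted by the lookup */
#define NO_PATH       0xFF   /* "no route" marker */

struct sw {
	uint8_t *lft;     /* table as last read back from the switch */
	uint8_t *lft_buf; /* table computed by routing; NULL if nothing is pending */
};

/* Return the output port for lid, preferring the freshly computed table. */
static uint8_t port_by_lid(const struct sw *s, uint16_t lid)
{
	assert(s != NULL && s->lft != NULL);

	if (lid == 0 || lid > LID_UCAST_END)
		return NO_PATH;

	/* Fall back to the switch's own table only when no staged copy exists. */
	return s->lft_buf ? s->lft_buf[lid] : s->lft[lid];
}
```

Under these assumptions, a dump that goes through such a lookup reflects the newly computed routing while a staged table exists, which is the mismatch the message above attributes to reading the switch LFT directly.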
Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_switch.h | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h index caa0bc5..f06931c 100644 --- a/opensm/include/opensm/osm_switch.h +++ b/opensm/include/opensm/osm_switch.h @@ -411,7 +411,11 @@ osm_switch_get_port_by_lid(IN const osm_switch_t * const p_sw, { if (lid_ho == 0 || lid_ho > IB_LID_UCAST_END_HO) return OSM_NO_PATH; - return p_sw->lft[lid_ho]; + + if (p_sw->lft_buf) + return p_sw->lft_buf[lid_ho]; + else + return p_sw->lft[lid_ho]; } /* * PARAMETERS -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Thu Nov 20 04:46:11 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 20 Nov 2008 14:46:11 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft In-Reply-To: <20081031043226.GH16455@sashak.voltaire.com> References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> Message-ID: <49255C13.5030503@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 23:51 Thu 30 Oct , Yevgeny Kliteynik wrote: >> Sure, why not. That way the memory would be freed faster. > > Patch? > > Sasha > I can do something like the following patch, but I have some strange feeling that I'm missing something... Can there be some flow that would cause lft_buf to be freed while not all the lft blocks were received yet, and then remaining blocks might change switch->lft (after the switch->lft_buf was already freed)? I can't think of any particular example, just a general concern... -- Yevgeny Free lft_buf when newly received lft block makes switch's lft identical to lft_buf. 
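The idea of dropping the staged table as soon as a received block makes both tables equal can be sketched standalone. This is a hedged illustration, not the OpenSM implementation: `struct sw`, `TABLE_SIZE`, `BLOCK_SIZE` and `set_lft_block()` are hypothetical names and sizes (assumptions), standing in for `osm_switch_t`, `IB_LID_UCAST_END_HO + 1`, `IB_SMP_DATA_SIZE` and `osm_switch_set_lft_block()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes, kept small for the sketch. */
#define TABLE_SIZE 256
#define BLOCK_SIZE 64

struct sw {
	uint8_t lft[TABLE_SIZE]; /* table as reported by the switch */
	uint8_t *lft_buf;        /* malloc'd staged table; NULL if nothing is pending */
};

/*
 * Copy one received block into the live table, then free the staged
 * copy once both tables are identical.
 * Returns 0 on success, -1 if the block index is out of range.
 */
static int set_lft_block(struct sw *s, const uint8_t *block, unsigned block_id)
{
	size_t start = (size_t)block_id * BLOCK_SIZE;

	assert(s != NULL && block != NULL);

	if (start + BLOCK_SIZE > TABLE_SIZE)
		return -1;

	memcpy(&s->lft[start], block, BLOCK_SIZE);

	if (s->lft_buf && !memcmp(s->lft, s->lft_buf, TABLE_SIZE)) {
		free(s->lft_buf);
		s->lft_buf = NULL;
	}
	return 0;
}
```

The sketch does not address the concern raised above: if a later block changed the live table after the staged copy was freed, nothing here would notice.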
Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_switch.h | 7 +++++++ opensm/opensm/osm_ucast_mgr.c | 7 ------- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h index f06931c..af8a50e 100644 --- a/opensm/include/opensm/osm_switch.h +++ b/opensm/include/opensm/osm_switch.h @@ -729,6 +729,13 @@ osm_switch_set_lft_block(IN osm_switch_t * const p_sw, return IB_INVALID_PARAMETER; memcpy(&p_sw->lft[lid_start], p_block, IB_SMP_DATA_SIZE); + + if (p_sw->lft_buf && + !memcmp(p_sw->lft, p_sw->lft_buf, IB_LID_UCAST_END_HO + 1)) { + free(p_sw->lft_buf); + p_sw->lft_buf = NULL; + } + return IB_SUCCESS; } /* diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 175817c..7f1a816 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -399,13 +399,6 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, goto Exit; } - if (!p_sw->need_update && - !memcmp(p_sw->lft, p_sw->lft_buf, IB_LID_UCAST_END_HO + 1)) { - free(p_sw->lft_buf); - p_sw->lft_buf = NULL; - goto Exit; - } - for (block_id_ho = 0; osm_switch_get_lft_block(p_sw, block_id_ho, block); block_id_ho++) { -- 1.5.1.4 From yevgenyp at mellanox.co.il Thu Nov 20 06:55:18 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Thu, 20 Nov 2008 16:55:18 +0200 Subject: [ofa-general] mlx4_en: Memory leak on completion queue free. Message-ID: <49257A56.9030609@mellanox.co.il> If port is being destroyed without being activated before, CQ resources are not freed. 
Signed-off-by: Yevgeny Petrilin
---
Hello Jeff, this is a regression fix for 2.6.28

 drivers/net/mlx4/en_cq.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c
index 1368a80..1a936f4 100644
--- a/drivers/net/mlx4/en_cq.c
+++ b/drivers/net/mlx4/en_cq.c
@@ -68,6 +68,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
         err = mlx4_en_map_buffer(&cq->wqres.buf);
         if (err)
                 mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size);
+        else
+                cq->buf = (struct mlx4_cqe *) cq->wqres.buf.direct.buf;
 
         return err;
 }
@@ -82,7 +84,6 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
         cq->mcq.arm_db = cq->wqres.db.db + 1;
         *cq->mcq.set_ci_db = 0;
         *cq->mcq.arm_db = 0;
-        cq->buf = (struct mlx4_cqe *) cq->wqres.buf.direct.buf;
         memset(cq->buf, 0, cq->buf_size);
 
         err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, &mdev->priv_uar,
-- 
1.5.4

From tziporet at mellanox.co.il Thu Nov 20 07:21:56 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 20 Nov 2008 17:21:56 +0200
Subject: [ofa-general] OFED 1.4 - delay the GA to Dec 4
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD01006654@mtlexch01.mtl.com>

Hi All,

I have just reviewed the bug status with Vlad.
We have 11 major and critical bugs, and we will not be able to fix all of them in one week.
Thus I am delaying the GA release to Dec 4 (since we have the Thanksgiving holiday next week).
I also suggest we create RC6 by the end of next week, since most of the bugs are assigned to people in Israel and we do not have a vacation next week.

We will review the release status at the EWG meeting next week.

Bug owners: please reply with a status update and also update the bug report.

Bugs list:
1370  blo  vlad at mellanox.co.il        Ping over IPoIB I/F fails after ifconfig down and up
1242  cri  yannick.cote at qlogic.com    kernel panic while running mpi2007 against ofed1.4 -- ib_...
1198  cri  yosefe at voltaire.com        hang during ipoib create_child/ifdown
1348  maj  amirv at mellanox.co.il       Sdp sockets doesnt closed after programs end
1349  maj  amirv at mellanox.co.il       Kernel panic on sdp
1289  maj  jackm at mellanox.co.il       Ib and ipoib doesnt respond while running multiple tests ...
1389  maj  jackm at mellanox.co.il       poll_cq sometimes fail in a multithreaded test
1401  maj  sashak at voltaire.com        segmentation fault when running opensm -Q
1377  maj  vu at mellanox.com            Deadlock occured during HA test
1380  maj  vu at mellanox.com            Cannot unload ib_srpt module on SRP target
1395  maj  vu at mellanox.com            kernel panic during SRP HA test

Tziporet & Vlad

From vst at vlnb.net Thu Nov 20 07:24:18 2008
From: vst at vlnb.net (Vladislav Bolkhovitin)
Date: Thu, 20 Nov 2008 18:24:18 +0300
Subject: [ofa-general] SRP/mlx4 interrupts throttling performance
In-Reply-To: <49189567.1010804@harr.org>
References: <48E386F6.5040502@fusionio.com> <48E6498A.3070002@mellanox.com> <48E65FE0.2060602@harr.org> <48E67ACC.1020903@harr.org> <48E695F9.80703@harr.org> <48E9E681.8090600@vlnb.net> <48EA2F42.80008@harr.org> <48EB8CBC.30303@harr.org> <48EB96C5.2060202@vlnb.net> <48EBA581.4040301@mellanox.com> <48EBA72B.4000909@harr.org> <48EBBDB1.1080203@harr.org> <48EBE6B6.4060804@mellanox.com> <48ECEA4D.7080504@harr.org> <48ED3489.4030905@harr.org> <48F79CF8.3010905@vlnb.net> <48FE6C84.7030300@harr.org> <48FEDA26.4080304@vlnb.net> <48FF2D1A.8000101@harr.org> <48FF5F42.2050902@vlnb.net> <48FF60D3.9020809@harr.org> <4901F14C.6000006@harr.org> <490210EE.2070000@vlnb.net> <49022553.1020804@harr.org> <490B45ED.3020203@vlnb.net> <4910A622.4050906@harr.org> <4911D827.10705@vlnb.net> <49121715.4040804@harr.org> <4912C684.5000505@vlnb.net> <491307C7.50008@harr.org> <49131A85.2010102@vlnb.net> <49189567.1010804@harr.org>
Message-ID: <49258122.6040808@vlnb.net>

Cameron Harr wrote:
> New results, with markers.
> ----
> type=randwrite bs=512 drives=1 scst_threads=1 srptthread=1 iops=65612.40
> type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=54934.31
> type=randwrite bs=512 drives=2 scst_threads=1 srptthread=1 iops=82514.57
> type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=79680.42
> type=randwrite bs=512 drives=1 scst_threads=2 srptthread=1 iops=60439.73
> type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=51510.68
> type=randwrite bs=512 drives=2 scst_threads=2 srptthread=1 iops=102735.07
> type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=78558.77
> type=randwrite bs=512 drives=1 scst_threads=3 srptthread=1 iops=62941.35
> type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=51924.17
> type=randwrite bs=512 drives=2 scst_threads=3 srptthread=1 iops=120961.39
> type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=75411.52
> type=randwrite bs=512 drives=1 scst_threads=1 srptthread=0 iops=50891.13
> type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=50199.90
> type=randwrite bs=512 drives=2 scst_threads=1 srptthread=0 iops=58711.87
> type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=74504.65
> type=randwrite bs=512 drives=1 scst_threads=2 srptthread=0 iops=61043.73
> type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=49951.89
> type=randwrite bs=512 drives=2 scst_threads=2 srptthread=0 iops=83195.60
> type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=75224.25
> type=randwrite bs=512 drives=1 scst_threads=3 srptthread=0 iops=60277.98
> type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=49874.57
> type=randwrite bs=512 drives=2 scst_threads=3 srptthread=0 iops=84851.43
> type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=73238.46

I think srptthread=0 performs worse in this case because with it part of the processing is done in SIRQ context, and the scheduler seems to run it on the same CPU as fct0-worker, the thread that transfers the data to your SSD device. That thread always consumes about 100% CPU, so the SIRQ processing gets less CPU time, hence lower overall performance.

So, try to affine fctX-worker, the SCST threads and the SIRQ processing to different CPUs and check again. You can affine threads using the utility from http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/; for how to affine IRQs, see Documentation/IRQ-affinity.txt in your kernel tree.

Vlad

From vst at vlnb.net Thu Nov 20 07:26:15 2008
From: vst at vlnb.net (Vladislav Bolkhovitin)
Date: Thu, 20 Nov 2008 18:26:15 +0300
Subject: [ofa-general] SRP/mlx4 interrupts throttling performance
In-Reply-To: <4910A49B.1050004@harr.org>
References: <48E386F6.5040502@fusionio.com> <48E38BAF.5000801@harr.org> <48E6498A.3070002@mellanox.com> <48E65FE0.2060602@harr.org> <48E67ACC.1020903@harr.org> <48E695F9.80703@harr.org> <48E9E681.8090600@vlnb.net> <48EA2F42.80008@harr.org> <48EB8CBC.30303@harr.org> <48EB96C5.2060202@vlnb.net> <48EBA581.4040301@mellanox.com> <48EBA72B.4000909@harr.org> <48EBBDB1.1080203@harr.org> <48EBE6B6.4060804@mellanox.com> <48ECEA4D.7080504@harr.org> <48F79CA9.8090806@vlnb.net> <49022438.9030903@harr.org> <490B45B0.7030208@vlnb.net> <4910A49B.1050004@harr.org>
Message-ID: <49258197.3020904@vlnb.net>

Cameron Harr wrote:
> Vladislav Bolkhovitin wrote:
>> Cameron Harr wrote:
>>> Vladislav Bolkhovitin wrote:
>>>>> ** Sometimes the benchmark "zombied" (process doing no work, but
>>>>> process can't be killed) after running a certain amount of time.
>>>>> However, it wasn't repeatable in a reliable way, so I mark that
>>>>> this particular run has zombied before.
>>>> That means that there is a bug somewhere. Usually such bugs are
>>>> found in few hours of code auditing (srpt driver is pretty simple)
>>>> or by using kernel debug facilities (example diff to .config
>>>> attached). I personally always prefer put my effort on fixing real
>>>> things, not inventing various workarounds, like srpt_thread in this
>>>> case.
>>>>
>>>> So I would:
>>>>
>>>> 1. Completely remove srpt thread and all related code. It doesn't do
>>>> anything, which can't be done in SIRQ context (tasklet)
>>>>
>>>> 2. Audit the code to check if it does any action, which it
>>>> shouldn't do on SIRQ and fix it. This step isn't required, but
>>>> usually it saves a lot of time of puzzled debugging in the future.
>>>>
>>>> 3. Change in srpt_handle_rdma_comp() and srpt_handle_new_iu()
>>>> SCST_CONTEXT_THREAD to SCST_CONTEXT_DIRECT_ATOMIC.
>
> I'm assuming you didn't want me to implement this change this time, correct?

It seems I've already done that in the patch you use ;)

From olga.shern at gmail.com Thu Nov 20 07:51:22 2008
From: olga.shern at gmail.com (Olga Shern (Voltaire))
Date: Thu, 20 Nov 2008 17:51:22 +0200
Subject: [ofa-general] ***SPAM*** Re: [ewg] OFED 1.4 - delay the GA to Dec 4
In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD01006654@mtlexch01.mtl.com>
References: <5D49E7A8952DC44FB38C38FA0D758EAD01006654@mtlexch01.mtl.com>
Message-ID:

>
> 1370 blo vlad at mellanox.co.il Ping over IPoIB I/F fails
> after ifconfig down and up
>

Yossi has sent a patch that fixes this.

> 1198 cri yosefe at voltaire.com hang during ipoib
> create_child/ifdown

We sent a patch to Roland some time ago, but it was decided in the EWG meeting that because:
1. a user will rarely run such a test, and
2. this is an old bug that wasn't introduced in OFED 1.4,
we will not add the patch to OFED 1.4.
If you think this is another bug, we should open a new one.

> 1289 maj jackm at mellanox.co.il Ib and ipoib doesnt respond while
> running multiple tests ...
>

It seems that this was already fixed; it only needs to be retested to verify that it is indeed fixed.

From vuhuong at mellanox.com Thu Nov 20 11:06:23 2008
From: vuhuong at mellanox.com (Vu Pham)
Date: Thu, 20 Nov 2008 11:06:23 -0800
Subject: [ofa-general] srp_daemon and partitions.
In-Reply-To: <774A4005-446E-40D1-A70E-DBCBF12219F0@catbus.co.uk> References: <774A4005-446E-40D1-A70E-DBCBF12219F0@catbus.co.uk> Message-ID: <4925B52F.3030106@mellanox.com> Hi James, it's srp_daemon and ibsrpdm bug. We'll try to fix it to provide zoning thru pkey. > > We wish to protect the storage from unwanted use. In a fibre channel > san environment this would be done in two ways, firstly presentation ( > configuring the controller as to which luns each WWN can access ) and > secondly zoning which is configuring the switches that make the fabric > as to which ports can communicate. If we can't do this it would > restrict IB to a single use eg as a replacement for fibre switches. > Does DDN has management sw to set the access control list (configuring the controller as to which luns each WWN can access)?
OFED's srp target / scst mid-layer can provide this

-vu

From michael.oevermann at tu-berlin.de Thu Nov 20 11:41:44 2008
From: michael.oevermann at tu-berlin.de (Michael Oevermann)
Date: Thu, 20 Nov 2008 20:41:44 +0100
Subject: [ofa-general] infiniband problem, no NICs
Message-ID: <4925BD78.4030003@tu-berlin.de>

Hi all,

I have "inherited" a small cluster with a head node and four compute nodes which I have to administer. The nodes are connected via infiniband (OFED), but the head is not. I am a complete novice to the infiniband stuff and here is my problem:

The infiniband configuration seems to be OK. The usual tests suggested in the OFED install guide give the expected output, e.g.

ibv_devinfo on the nodes:

************************* oscar_cluster *************************
--------- n01---------
hca_id: mthca0
        fw_ver:                 1.2.0
        node_guid:              0002:c902:0025:930c
        sys_image_guid:         0002:c902:0025:930f
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               MT_03B0140001
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         2
                        port_lid:       1
                        port_lmc:       0x00

etc. for the other nodes.

sminfo on the nodes:

************************* oscar_cluster *************************
--------- n01---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881 priority 0 state 3 SMINFO_MASTER
--------- n02---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882 priority 0 state 3 SMINFO_MASTER
--------- n03---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883 priority 0 state 3 SMINFO_MASTER
--------- n04---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884 priority 0 state 3 SMINFO_MASTER

However, when I directly start an MPI job (without using a scheduler) via:

/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile /home/sysgen/infiniband-mpi-test/machine /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1

I get the error message:

[0,1,0]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in lower performance.
--------------------------------------------------------------------------

MPI with normal GB Ethernet and IP networking works just fine, but InfiniBand doesn't. The MPI libs I am using for the test are definitely compiled with IB support and the tests have been run successfully on the cluster before.

Any suggestions what is going wrong here?

Best regards and thanks for any help!

Michael

From rdreier at cisco.com Thu Nov 20 14:50:51 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 20 Nov 2008 14:50:51 -0800
Subject: [ofa-general] Re: Race condition in userspace libraries with create/destroy qp
In-Reply-To: <200811201211.46527.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Thu, 20 Nov 2008 12:11:45 +0200")
References: <200811201211.46527.jackm@dev.mellanox.co.il>
Message-ID:

> mlx4_create_qp and mlx4_destroy_qp are not atomic WRT each other. If one thread is
> destroying a QP while another is creating a qp, there is a race hole.
The destroying thread
> can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared
> it from userspace store (mlx4_clear_qp).
> If the other thread creates a qp during this break, it gets the same QP base number and overwrites
> the destroyed QPs entry with mlx4_store_qp().

Yes, looks like a real bug.

> 2. Create a mutex for this purpose, and use it to force the create and destroy qp operations
> to be atomic WRT the ibv_cmd_xxx_qp operations and the store/clear qp operations.

This looks like the best solution. I wonder if we should just add this synchronization in libibverbs rather than individual drivers? I notice that libcxgb3 seems to have the same bug AFAICS. But maybe it's better to just keep the simple rule that driver libraries are responsible for locking their own data structures.

 - R.

From weiny2 at llnl.gov Thu Nov 20 16:38:09 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 20 Nov 2008 16:38:09 -0800
Subject: [ofa-general] [PATCH 0/3] ibnetdiscover library "libibnetdisc"
Message-ID: <20081120163809.26a3c499.weiny2@llnl.gov>

The following 3 patches implement "libibnetdisc" which provides the functionality of ibnetdiscover in a C library. I mentioned this to Sasha at the last Sonoma conference and posted the bulk of this code to the list a few months ago. This library is still providing the 85% performance speed up of iblinkinfo.pl on our clusters.

This new series is heavily tested and, for our hardware, preserves the functionality of ibnetdiscover. Since I don't have a Xsigo box to test on I can only verify that it compiles correctly.

Ira

From weiny2 at llnl.gov Thu Nov 20 16:38:15 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 20 Nov 2008 16:38:15 -0800
Subject: [ofa-general] [PATCH 3/3] Convert ibnetdiscover to use new ibnetdisc library.
Message-ID: <20081120163815.5cd110fb.weiny2@llnl.gov> >From e2b8bac5d651c2278719d511dee2ab2e8ad05706 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Thu, 20 Nov 2008 09:29:57 -0800 Subject: [PATCH] Convert ibnetdiscover to use new ibnetdisc library. Removed -e and -v since they were somewhat redundant with the -d option. All other functionality is preserved Signed-off-by: Ira Weiny --- infiniband-diags/Makefile.am | 4 +- infiniband-diags/man/ibnetdiscover.8 | 10 +- infiniband-diags/src/ibnetdiscover.c | 910 ++++++++++------------------------ 3 files changed, 254 insertions(+), 670 deletions(-) diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am index 8f26749..420c69e 100644 --- a/infiniband-diags/Makefile.am +++ b/infiniband-diags/Makefile.am @@ -35,9 +35,9 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \ src_ibaddr_SOURCES = src/ibaddr.c src/ibdiag_common.c src_ibaddr_CFLAGS = -Wall $(DBGFLAGS) -src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/grouping.c src/ibdiag_common.c +src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/ibdiag_common.c src_ibnetdiscover_CFLAGS = -Wall $(DBGFLAGS) -src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) +src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -libnetdisc src_iblinkinfo_pl_SOURCES = src/iblinkinfo.c src_iblinkinfo_pl_CFLAGS = -Wall $(DBGFLAGS) diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8 index 958efa9..768d392 100644 --- a/infiniband-diags/man/ibnetdiscover.8 +++ b/infiniband-diags/man/ibnetdiscover.8 @@ -5,7 +5,7 @@ ibnetdiscover \- discover InfiniBand topology .SH SYNOPSIS .B ibnetdiscover -[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] [\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] +[\-d(ebug)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] 
[\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] .SH DESCRIPTION .PP @@ -37,7 +37,7 @@ List of connected switches List of connected routers .TP \fB\-s\fR, \fB\-\-show\fR -Show more information +Show progress information during discovery. .TP \fB\-\-node\-name\-map\fR Specify a node name map. The node name map file maps GUIDs to more user friendly @@ -57,15 +57,9 @@ using the util_name -h syntax. # Debugging flags .PP \-d raise the IB debugging level. - May be used several times (-ddd or -d -d -d). -.PP -\-e show send and receive errors (timeouts and others) .PP \-h show the usage message .PP -\-v increase the application verbosity level. - May be used several times (-vv or -v -v -v) -.PP \-V show the version info. # Other common flags: diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c index 2cfaa8a..d8ead48 100644 --- a/infiniband-diags/src/ibnetdiscover.c +++ b/infiniband-diags/src/ibnetdiscover.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -47,483 +48,108 @@ #include #include -#include -#include -#include #include +#include +#include -#include "ibnetdiscover.h" -#include "grouping.h" #include "ibdiag_common.h" -static char *node_type_str[] = { - "???", - "ca", - "switch", - "router", - "iwarp rnic" -}; - -static char *linkwidth_str[] = { - "??", - "1x", - "4x", - "??", - "8x", - "??", - "??", - "??", - "12x" -}; - -static char *linkspeed_str[] = { - "???", - "SDR", - "DDR", - "???", - "QDR" -}; - -static int timeout = 2000; /* ms */ -static int dumplevel = 0; +static int debug; static int verbose; -static FILE *f; +#define LIST_CA_NODE (1 << IBND_CA_NODE) +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) char *argv0 = "ibnetdiscover"; +static FILE *f; static char *node_name_map_file = NULL; static nn_map_t *node_name_map = NULL; -Node *nodesdist[MAXHOPS+1]; /* last is Ca list */ -Node *mynode; -int maxhops_discovered = 0; - -struct ChassisList *chassis = NULL; - -static char * -get_linkwidth_str(int linkwidth) -{ - if (linkwidth > 8) - return linkwidth_str[0]; - else - return linkwidth_str[linkwidth]; -} - -static char * -get_linkspeed_str(int linkspeed) -{ - if (linkspeed > 4) - return linkspeed_str[0]; - else - return linkspeed_str[linkspeed]; -} - -static inline const char* -node_type_str2(Node *node) -{ - switch(node->type) { - case SWITCH_NODE: return "SW"; - case CA_NODE: return "CA"; - case ROUTER_NODE: return "RT"; - } - return "??"; -} - -void -decode_port_info(void *pi, Port *port) -{ - mad_decode_field(pi, IB_PORT_LID_F, &port->lid); - mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc); - mad_decode_field(pi, IB_PORT_STATE_F, &port->state); - mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate); - mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth); - mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed); -} - - -int -get_port(Port *port, int 
portnum, ib_portid_t *portid) -{ - char portinfo[64]; - void *pi = portinfo; - - port->portnum = portnum; - - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout)) - return -1; - decode_port_info(pi, port); - - DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s", - portid2str(portid), portnum, port->lid, port->state, port->physstate, get_linkwidth_str(port->linkwidth), get_linkspeed_str(port->linkspeed)); - return 1; -} -/* - * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. - */ -int -get_node(Node *node, Port *port, ib_portid_t *portid) -{ - char portinfo[64]; - char switchinfo[64]; - void *pi = portinfo, *ni = node->nodeinfo, *nd = node->nodedesc; - void *si = switchinfo; - - if (!smp_query(ni, portid, IB_ATTR_NODE_INFO, 0, timeout)) - return -1; - - mad_decode_field(ni, IB_NODE_GUID_F, &node->nodeguid); - mad_decode_field(ni, IB_NODE_TYPE_F, &node->type); - mad_decode_field(ni, IB_NODE_NPORTS_F, &node->numports); - mad_decode_field(ni, IB_NODE_DEVID_F, &node->devid); - mad_decode_field(ni, IB_NODE_VENDORID_F, &node->vendid); - mad_decode_field(ni, IB_NODE_SYSTEM_GUID_F, &node->sysimgguid); - mad_decode_field(ni, IB_NODE_PORT_GUID_F, &node->portguid); - mad_decode_field(ni, IB_NODE_LOCAL_PORT_F, &node->localport); - port->portnum = node->localport; - port->portguid = node->portguid; - - if (!smp_query(nd, portid, IB_ATTR_NODE_DESC, 0, timeout)) - return -1; - - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, 0, timeout)) - return -1; - decode_port_info(pi, port); - - if (node->type != SWITCH_NODE) - return 0; - - node->smalid = port->lid; - node->smalmc = port->lmc; - - /* after we have the sma information find out the real PortInfo for this port */ - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, node->localport, timeout)) - return -1; - decode_port_info(pi, port); - - if (!smp_query(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout)) - node->smaenhsp0 = 0; /* assume base SP0 */ - else - mad_decode_field(si, 
IB_SW_ENHANCED_PORT0_F, &node->smaenhsp0); - - DEBUG("portid %s: got switch node %" PRIx64 " '%s'", - portid2str(portid), node->nodeguid, node->nodedesc); - return 1; -} - -static int -extend_dpath(ib_dr_path_t *path, int nextport) -{ - if (path->cnt+2 >= sizeof(path->p)) - return -1; - ++path->cnt; - if (path->cnt > maxhops_discovered) - maxhops_discovered = path->cnt; - path->p[path->cnt] = nextport; - return path->cnt; -} - -static void -dump_endnode(ib_portid_t *path, char *prompt, Node *node, Port *port) -{ - if (!dumplevel) - return; - - fprintf(f, "%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", - portid2str(path), prompt, - (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), - node->nodeguid, node->type == SWITCH_NODE ? 0 : port->portnum, - port->lid, port->lid + (1 << port->lmc) - 1, - clean_nodedesc(node->nodedesc)); -} - -#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) -#define HTSZ 137 - -static Node *nodestbl[HTSZ]; - -static Node * -find_node(Node *new) -{ - int hash = HASHGUID(new->nodeguid) % HTSZ; - Node *node; - - for (node = nodestbl[hash]; node; node = node->htnext) - if (node->nodeguid == new->nodeguid) - return node; - - return NULL; -} - -static Node * -create_node(Node *temp, ib_portid_t *path, int dist) -{ - Node *node; - int hash = HASHGUID(temp->nodeguid) % HTSZ; - - node = malloc(sizeof(*node)); - if (!node) - return NULL; - - memcpy(node, temp, sizeof(*node)); - node->dist = dist; - node->path = *path; - - node->htnext = nodestbl[hash]; - nodestbl[hash] = node; - - if (node->type != SWITCH_NODE) - dist = MAXHOPS; /* special Ca list */ - - node->dnext = nodesdist[dist]; - nodesdist[dist] = node; - - return node; -} - -static Port * -find_port(Node *node, Port *port) -{ - Port *old; - - for (old = node->ports; old; old = old->next) - if (old->portnum == port->portnum) - return old; - - return NULL; -} - -static Port * -create_port(Node *node, Port *temp) -{ - Port 
*port; - - port = malloc(sizeof(*port)); - if (!port) - return NULL; - - memcpy(port, temp, sizeof(*port)); - port->node = node; - port->next = node->ports; - node->ports = port; - - return port; -} - -static void -link_ports(Node *node, Port *port, Node *remotenode, Port *remoteport) -{ - DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u", - node->nodeguid, node, port, port->portnum, - remotenode->nodeguid, remotenode, remoteport, remoteport->portnum); - if (port->remoteport) - port->remoteport->remoteport = NULL; - if (remoteport->remoteport) - remoteport->remoteport->remoteport = NULL; - port->remoteport = remoteport; - remoteport->remoteport = port; -} - -static int -handle_port(Node *node, Port *port, ib_portid_t *path, int portnum, int dist) -{ - Node node_buf; - Port port_buf; - Node *remotenode, *oldnode; - Port *remoteport, *oldport; - - memset(&node_buf, 0, sizeof(node_buf)); - memset(&port_buf, 0, sizeof(port_buf)); - - DEBUG("handle node %p port %p:%d dist %d", node, port, portnum, dist); - if (port->physstate != 5) /* LinkUp */ - return -1; - - if (extend_dpath(&path->drpath, portnum) < 0) - return -1; - - if (get_node(&node_buf, &port_buf, path) < 0) { - IBWARN("NodeInfo on %s failed, skipping port", - portid2str(path)); - path->drpath.cnt--; /* restore path */ - return -1; - } - - oldnode = find_node(&node_buf); - if (oldnode) - remotenode = oldnode; - else if (!(remotenode = create_node(&node_buf, path, dist + 1))) - IBERROR("no memory"); - - oldport = find_port(remotenode, &port_buf); - if (oldport) { - remoteport = oldport; - if (node != remotenode || port != remoteport) - IBWARN("port moving..."); - } else if (!(remoteport = create_port(remotenode, &port_buf))) - IBERROR("no memory"); - - dump_endnode(path, oldnode ? 
"known remote" : "new remote", - remotenode, remoteport); - - link_ports(node, port, remotenode, remoteport); - - path->drpath.cnt--; /* restore path */ - return 0; -} - -/* - * Return 1 if found, 0 if not, -1 on errors. - */ -static int -discover(ib_portid_t *from) -{ - Node node_buf; - Port port_buf; - Node *node; - Port *port; - int i; - int dist = 0; - ib_portid_t *path; - - DEBUG("from %s", portid2str(from)); - - memset(&node_buf, 0, sizeof(node_buf)); - memset(&port_buf, 0, sizeof(port_buf)); - - if (get_node(&node_buf, &port_buf, from) < 0) { - IBWARN("can't reach node %s", portid2str(from)); - return -1; - } - - node = create_node(&node_buf, from, 0); - if (!node) - IBERROR("out of memory"); - - mynode = node; - - port = create_port(node, &port_buf); - if (!port) - IBERROR("out of memory"); - - if (node->type != SWITCH_NODE && - handle_port(node, port, from, node->localport, 0) < 0) - return 0; - - for (dist = 0; dist < MAXHOPS; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - path = &node->path; - - DEBUG("dist %d node %p", dist, node); - dump_endnode(path, "processing", node, port); - - for (i = 1; i <= node->numports; i++) { - if (i == node->localport) - continue; - - if (get_port(&port_buf, i, path) < 0) { - IBWARN("can't reach node %s port %d", portid2str(path), i); - continue; - } - - port = find_port(node, &port_buf); - if (port) - continue; - - port = create_port(node, &port_buf); - if (!port) - IBERROR("out of memory"); - - /* If switch, set port GUID to node GUID */ - if (node->type == SWITCH_NODE) - port->portguid = node->portguid; - - handle_port(node, port, path, i, dist); - } - } - } +static int timeout_ms = 2000; +static int dumplevel = 0; - return 0; -} char * -node_name(Node *node) +node_name(ibnd_node_t *node) { static char buf[256]; - switch(node->type) { - case SWITCH_NODE: - sprintf(buf, "\"%s", "S"); - break; - case CA_NODE: + switch(node->info.type) { + case IBND_CA_NODE: sprintf(buf, "\"%s", "H"); break; - 
case ROUTER_NODE: + case IBND_SWITCH_NODE: + sprintf(buf, "\"%s", "S"); + break; + case IBND_ROUTER_NODE: sprintf(buf, "\"%s", "R"); break; default: sprintf(buf, "\"%s", "?"); break; } - sprintf(buf+2, "-%016" PRIx64 "\"", node->nodeguid); + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); return buf; } void -list_node(Node *node) +list_node(ibnd_node_t *node, void *user_data) { - char *node_type; - char *nodename = remap_node_name(node_name_map, node->nodeguid, + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, node->nodedesc); - switch(node->type) { - case SWITCH_NODE: - node_type = "Switch"; - break; - case CA_NODE: - node_type = "Ca"; - break; - case ROUTER_NODE: - node_type = "Router"; - break; - default: - node_type = "???"; - break; - } fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", - node_type, - node->nodeguid, node->numports, node->devid, node->vendid, + ibnd_node_type_str(node), + node->info.nodeguid, node->info.numports, node->info.devid, + node->info.vendid, nodename); free(nodename); } void -out_ids(Node *node, int group, char *chname) +list_nodes(ibnd_fabric_t *fabric, int list) +{ + if (list & LIST_CA_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); + } + if (list & LIST_SWITCH_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); + } + if (list & LIST_ROUTER_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); + } +} + +void +out_ids(ibnd_node_t *node, int group, char *chname) { - fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->vendid, node->devid); - if (node->sysimgguid) - fprintf(f, "sysimgguid=0x%" PRIx64, node->sysimgguid); + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); + if (node->info.sysimgguid) + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); if (group && node->chrecord && node->chrecord->chassisnum) { fprintf(f, "\t\t# Chassis %d", node->chrecord->chassisnum); if 
(chname) - fprintf(f, " (%s)", chname); - if (is_xsigo_tca(node->nodeguid) && node->ports->remoteport) - fprintf(f, " slot %d", node->ports->remoteport->portnum); + fprintf(f, " (%s)", clean_nodedesc(chname)); + if (ibnd_is_xsigo_tca(node->info.nodeguid) + && node->ports[1] + && node->ports[1]->remoteport) + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); } fprintf(f, "\n"); } + uint64_t -out_chassis(int chassisnum) +out_chassis(ibnd_fabric_t *fabric, int chassisnum) { uint64_t guid; fprintf(f, "\nChassis %d", chassisnum); - guid = get_chassis_guid(chassisnum); + guid = ibnd_get_chassis_guid(fabric, chassisnum); if (guid) fprintf(f, " (guid 0x%" PRIx64 ")", guid); fprintf(f, "\n"); @@ -531,54 +157,49 @@ out_chassis(int chassisnum) } void -out_switch(Node *node, int group, char *chname) +out_switch(ibnd_node_t *node, int group, char *chname) { char *str; + char str2[256]; char *nodename = NULL; out_ids(node, group, chname); - fprintf(f, "switchguid=0x%" PRIx64, node->nodeguid); - fprintf(f, "(%" PRIx64 ")", node->portguid); - /* Currently, only if Voltaire chassis */ - if (group - && node->chrecord && node->chrecord->chassisnum - && node->vendid == VTR_VENDOR_ID) { - str = get_chassis_type(node->chrecord->chassistype); + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); + if (group) { + str = ibnd_get_chassis_type(node); if (str) fprintf(f, "%s ", str); - str = get_chassis_slot(node->chrecord->chassisslot); + str = ibnd_get_chassis_slot_str(node, str2, 256); if (str) - fprintf(f, "%s ", str); - fprintf(f, "%d Chip %d", node->chrecord->slotnum, node->chrecord->anafanum); + fprintf(f, "%s", str); } - nodename = remap_node_name(node_name_map, node->nodeguid, + nodename = remap_node_name(node_name_map, node->info.nodeguid, node->nodedesc); fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", - node->numports, node_name(node), + node->info.numports, node_name(node), nodename, - 
node->smaenhsp0 ? "enhanced" : "base", + node->sw_info.smaenhsp0 ? "enhanced" : "base", node->smalid, node->smalmc); free(nodename); } void -out_ca(Node *node, int group, char *chname) +out_ca(ibnd_node_t *node, int group, char *chname) { char *node_type; char *node_type2; - char *nodename = remap_node_name(node_name_map, node->nodeguid, - node->nodedesc); out_ids(node, group, chname); - switch(node->type) { - case CA_NODE: + switch(node->info.type) { + case IBND_CA_NODE: node_type = "ca"; node_type2 = "Ca"; break; - case ROUTER_NODE: + case IBND_ROUTER_NODE: node_type = "rt"; node_type2 = "Rt"; break; @@ -588,37 +209,37 @@ out_ca(Node *node, int group, char *chname) break; } - fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->nodeguid); + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); fprintf(f, "%s\t%d %s\t\t# \"%s\"", - node_type2, node->numports, node_name(node), - nodename); - if (group && is_xsigo_hca(node->nodeguid)) + node_type2, node->info.numports, node_name(node), + clean_nodedesc(node->nodedesc)); + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) fprintf(f, " (scp)"); fprintf(f, "\n"); - - free(nodename); } +#define OUT_BUFFER_SIZE 16 static char * -out_ext_port(Port *port, int group) +out_ext_port(ibnd_port_t *port, int group) { - char *str = NULL; + static char mapping[OUT_BUFFER_SIZE]; - /* Currently, only if Voltaire chassis */ - if (group - && port->node->chrecord && port->node->vendid == VTR_VENDOR_ID) - str = portmapstring(port); + if (group && port->ext_portnum != 0) { + snprintf(mapping, OUT_BUFFER_SIZE, + "[ext %d]", port->ext_portnum); + return (mapping); + } - return (str); + return (NULL); } void -out_switch_port(Port *port, int group) +out_switch_port(ibnd_port_t *port, int group) { char *ext_port_str = NULL; char *rem_nodename = NULL; - DEBUG("port %p:%d remoteport %p", port, port->portnum, port->remoteport); + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); fprintf(f, "[%d]", 
port->portnum); ext_port_str = out_ext_port(port, group); @@ -626,7 +247,7 @@ out_switch_port(Port *port, int group) fprintf(f, "%s", ext_port_str); rem_nodename = remap_node_name(node_name_map, - port->remoteport->node->nodeguid, + port->remoteport->node->info.nodeguid, port->remoteport->node->nodedesc); ext_port_str = out_ext_port(port->remoteport, group); @@ -634,17 +255,17 @@ out_switch_port(Port *port, int group) node_name(port->remoteport->node), port->remoteport->portnum, ext_port_str ? ext_port_str : ""); - if (port->remoteport->node->type != SWITCH_NODE) - fprintf(f, "(%" PRIx64 ") ", port->remoteport->portguid); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); fprintf(f, "\t\t# \"%s\" lid %d %s%s", rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); - if (is_xsigo_tca(port->remoteport->portguid)) + if (ibnd_is_xsigo_tca(port->remoteport->guid)) fprintf(f, " slot %d", port->portnum); - else if (is_xsigo_hca(port->remoteport->portguid)) + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) fprintf(f, " (scp)"); fprintf(f, "\n"); @@ -652,68 +273,80 @@ out_switch_port(Port *port, int group) } void -out_ca_port(Port *port, int group) +out_ca_port(ibnd_port_t *port, int group) { char *str = NULL; char *rem_nodename = NULL; fprintf(f, "[%d]", port->portnum); - if (port->node->type != SWITCH_NODE) - fprintf(f, "(%" PRIx64 ") ", port->portguid); + if (port->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->guid); fprintf(f, "\t%s[%d]", node_name(port->remoteport->node), port->remoteport->portnum); str = out_ext_port(port->remoteport, group); if (str) fprintf(f, "%s", str); - if (port->remoteport->node->type != SWITCH_NODE) - fprintf(f, " (%" PRIx64 ") ", port->remoteport->portguid); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); rem_nodename = remap_node_name(node_name_map, - port->remoteport->node->nodeguid, + port->remoteport->node->info.nodeguid, port->remoteport->node->nodedesc); fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", - port->lid, port->lmc, rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); + port->info.lid, port->info.lmc, rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); free(rem_nodename); } int -dump_topology(int listtype, int group) +dump_topology(int group, ibnd_fabric_t *fabric) { - Node *node; - Port *port; - int i = 0, dist = 0; + ibnd_node_t *node; + ibnd_port_t *port; + int i = 0, dist = 0, p = 0; time_t t = time(0); uint64_t chguid; char *chname = NULL; - if (!listtype) { - fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); - fprintf(f, "# Max of %d hops discovered\n", maxhops_discovered); - fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", mynode->nodeguid, mynode->portguid); - } + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); /* Make pass on switches */ - if (group && !listtype) { - ChassisList *ch = NULL; + if (group) { + ibnd_chassis_list_t *ch = NULL; /* Chassis based switches first */ - for (ch = chassis; ch; ch = ch->next) { + for (ch = fabric->chassis; ch; ch = ch->next) { int n = 0; if (!ch->chassisnum) continue; - chguid = out_chassis(ch->chassisnum); - if (chname) - free(chname); + chguid = out_chassis(fabric, ch->chassisnum); + chname = NULL; - if (is_xsigo_guid(chguid)) { - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { +/** + * Hal will this work for Xsigo? + */ + if (ibnd_is_xsigo_guid(chguid)) { + for (node = ch->nodes; node; node = node->chassis_next) { + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } + +#if 0 +/** + * vs. this? 
+ */ + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { if (!node->chrecord || !node->chrecord->chassisnum) continue; @@ -721,209 +354,171 @@ dump_topology(int listtype, int group) if (node->chrecord->chassisnum != ch->chassisnum) continue; - if (is_xsigo_hca(node->nodeguid)) { - chname = remap_node_name(node_name_map, - node->nodeguid, - node->nodedesc); - fprintf(f, "Hostname: %s\n", chname); + if (ibnd_is_xsigo_hca(node->nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); } } +#endif } fprintf(f, "\n# Spine Nodes"); - for (n = 1; n <= (SPINES_MAX_NUM+1); n++) { + for (n = 1; n <= SPINES_MAX_NUM; n++) { if (ch->spinenode[n]) { out_switch(ch->spinenode[n], group, chname); - for (port = ch->spinenode[n]->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { + port = ch->spinenode[n]->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); + } } } fprintf(f, "\n# Line Nodes"); - for (n = 1; n <= (LINES_MAX_NUM+1); n++) { + for (n = 1; n <= LINES_MAX_NUM; n++) { if (ch->linenode[n]) { out_switch(ch->linenode[n], group, chname); - for (port = ch->linenode[n]->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { + port = ch->linenode[n]->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); + } } } fprintf(f, "\n# Chassis Switches"); - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - /* Non Voltaire chassis */ - if (node->vendid == VTR_VENDOR_ID) - continue; - if (!node->chrecord || - !node->chrecord->chassisnum) - continue; - - if (node->chrecord->chassisnum != ch->chassisnum) - continue; - + for (node = ch->nodes; node; node = node->chassis_next) { + if (node->info.type == IBND_SWITCH_NODE) { out_switch(node, group, chname); - for (port = node->ports; port; port = 
port->next, i++) - if (port->remoteport) + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); - + } } - } fprintf(f, "\n# Chassis CAs"); - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { - if (!node->chrecord || - !node->chrecord->chassisnum) - continue; - - if (node->chrecord->chassisnum != ch->chassisnum) - continue; - - out_ca(node, group, chname); - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) - out_ca_port(port, group); - + for (node = ch->nodes; node; node = node->chassis_next) { + if (node->info.type == IBND_CA_NODE) { + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } } } - } else { - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - DEBUG("SWITCH: dist %d node %p", dist, node); - if (!listtype) - out_switch(node, group, chname); - else { - if (listtype & LIST_SWITCH_NODE) - list_node(node); - continue; - } - - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) + } else { /* !group */ + for (node = fabric->switches; node; node = node->type_next) { + DEBUG("SWITCH: dist %d node %p\n", dist, node); + out_switch(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); - } + } } } - if (chname) - free(chname); chname = NULL; - if (group && !listtype) { - + if (group) { fprintf(f, "\nNon-Chassis Nodes\n"); - - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - DEBUG("SWITCH: dist %d node %p", dist, node); + for (node = fabric->switches; node; node = node->type_next) { + DEBUG("SWITCH: dist %d node %p\n", dist, node); /* Now, skip chassis based switches */ if 
(node->chrecord && node->chrecord->chassisnum) continue; out_switch(node, group, chname); - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); - } - + } } } /* Make pass on CAs */ - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { - - DEBUG("CA: dist %d node %p", dist, node); - if (!listtype) { - /* Now, skip chassis based CAs */ - if (group && node->chrecord && - node->chrecord->chassisnum) - continue; - out_ca(node, group, chname); - } else { - if (((listtype & LIST_CA_NODE) && (node->type == CA_NODE)) || - ((listtype & LIST_ROUTER_NODE) && (node->type == ROUTER_NODE))) - list_node(node); + for (node = fabric->ch_adapters; node; node = node->type_next) { + DEBUG("CA: dist %d node %p\n", dist, node); + /* Now, skip chassis based CAs */ + if (group && node->chrecord && + node->chrecord->chassisnum) continue; - } + out_ca(node, group, chname); - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) out_ca_port(port, group); + } } - if (chname) - free(chname); + /* make pass on routers */ + for (node = fabric->routers; node; node = node->type_next) { + DEBUG("RT: dist %d node %p\n", dist, node); + /* Now, skip chassis based CAs */ + if (group && node->chrecord && + node->chrecord->chassisnum) + continue; + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } return i; } -void dump_ports_report () + +void dump_ports_report (ibnd_node_t *node, void *user_data) { - int b, n = 0, p; - Node *node; - Port *port; - - // If switch and LID == 0, search of other switch ports with - // valid LID and assign it to all ports of that switch - for (b = 0; b <= MAXHOPS; 
b++) - for (node = nodesdist[b]; node; node = node->dnext) - if (node->type == SWITCH_NODE) { - int swlid = 0; - for (p = 0, port = node->ports; - p < node->numports && port && !swlid; - port = port->next) - if (port->lid != 0) - swlid = port->lid; - for (p = 0, port = node->ports; - p < node->numports && port; - port = port->next) - port->lid = swlid; - } + int p = 0; + ibnd_port_t *port = NULL; + + /* for each port */ + for (p = node->info.numports, port = node->ports[p]; + p > 0; + port = node->ports[--p]) { + if (port == NULL) + continue; - for (b = 0; b <= MAXHOPS; b++) - for (node = nodesdist[b]; node; node = node->dnext) { - for (p = 0, port = node->ports; - p < node->numports && port; - p++, port = port->next) { - fprintf(stdout, - "%2s %5d %2d 0x%016" PRIx64 " %s %s", - node_type_str2(port->node), port->lid, - port->portnum, - port->portguid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); - if (port->remoteport) - fprintf(stdout, - " - %2s %5d %2d 0x%016" PRIx64 - " ( '%s' - '%s' )\n", - node_type_str2(port->remoteport->node), - port->remoteport->lid, - port->remoteport->portnum, - port->remoteport->portguid, - port->node->nodedesc, - port->remoteport->node->nodedesc); - else - fprintf(stdout, "%36s'%s'\n", "", - port->node->nodedesc); - } - n++; - } + fprintf(stdout, + "%2s %5d %2d 0x%016" PRIx64 " %s %s", + ibnd_node_type_str_short(node), + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, + port->portnum, + port->guid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); + if (port->remoteport) + fprintf(stdout, + " - %2s %5d %2d 0x%016" PRIx64 + " ( '%s' - '%s' )\n", + ibnd_node_type_str_short(port->remoteport->node), + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
+ port->remoteport->node->smalid : port->remoteport->info.lid, + port->remoteport->portnum, + port->remoteport->guid, + port->node->nodedesc, + port->remoteport->node->nodedesc); + else + fprintf(stdout, "%36s'%s'\n", "", + port->node->nodedesc); + } } void usage(void) { - fprintf(stderr, "Usage: %s [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", argv0); fprintf(stderr, " --node-name-map specify a node name map file\n"); @@ -933,20 +528,18 @@ usage(void) int main(int argc, char **argv) { - int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; - ib_portid_t my_portid = {0}; - int udebug = 0, list = 0; + int list = 0; char *ca = 0; int ca_port = 0; int group = 0; int ports_report = 0; + ibnd_fabric_t *fabric = NULL; static char const str_opts[] = "C:P:t:devslgHSRpVhu"; static const struct option long_opts[] = { { "C", 1, 0, 'C'}, { "P", 1, 0, 'P'}, { "debug", 0, 0, 'd'}, - { "err_show", 0, 0, 'e'}, { "verbose", 0, 0, 'v'}, { "show", 0, 0, 's'}, { "list", 0, 0, 'l'}, @@ -982,23 +575,17 @@ main(int argc, char **argv) ca_port = strtoul(optarg, 0, 0); break; case 'd': - ibdebug++; - madrpc_show_errors(1); - umad_debug(udebug); - udebug++; + debug = 1; + ibnd_debug(1); break; case 't': - timeout = strtoul(optarg, 0, 0); + timeout_ms = strtoul(optarg, 0, 0); break; case 'v': verbose++; - dumplevel++; break; case 's': - dumplevel = 1; - break; - case 'e': - madrpc_show_errors(1); + ibnd_show_progress(1); break; case 'l': list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; @@ -1007,13 +594,13 @@ main(int argc, char **argv) group = 1; break; case 'S': - list = LIST_SWITCH_NODE; + list |= LIST_SWITCH_NODE; break; case 'H': - list = LIST_CA_NODE; + list |= 
LIST_CA_NODE; break; case 'R': - list = LIST_ROUTER_NODE; + list |= LIST_ROUTER_NODE; break; case 'V': fprintf(stderr, "%s %s\n", argv0, get_build_version() ); @@ -1030,22 +617,25 @@ main(int argc, char **argv) argv += optind; if (argc && !(f = fopen(argv[0], "w"))) - IBERROR("can't open file %s for writing", argv[0]); + fprintf(stderr, "can't open file %s for writing", argv[0]); - madrpc_init(ca, ca_port, mgmt_classes, 2); node_name_map = open_node_name_map(node_name_map_file); - if (discover(&my_portid) < 0) - IBERROR("discover"); - - if (group) - chassis = group_nodes(); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } if (ports_report) - dump_ports_report(); + ibnd_iter_nodes(fabric, + dump_ports_report, + NULL); + else if (list) + list_nodes(fabric, list); else - dump_topology(list, group); + dump_topology(group, fabric); + ibnd_destroy_fabric(fabric); close_node_name_map(node_name_map); exit(0); } -- 1.5.4.5

From weiny2 at llnl.gov Thu Nov 20 16:38:12 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 20 Nov 2008 16:38:12 -0800
Subject: [ofa-general] [PATCH 1/3] Create a new library libibnetdisc
Message-ID: <20081120163812.6230375d.weiny2@llnl.gov>

>From 663b13de4253c4d87c73e8d2f50c9b798fa3a4d8 Mon Sep 17 00:00:00 2001
From: Ira Weiny
Date: Fri, 14 Nov 2008 15:36:03 -0800
Subject: [PATCH] Create a new library libibnetdisc

This encompasses the functionality of ibnetdiscover in a C library. It returns a single "ibnd_fabric_t" object which represents the data found during the scan. The NodeInfo, PortInfo, and SwitchInfo are preserved from the queries made on the fabric to be used by the calling function as they see fit.

This greatly benefits some diags like iblinkinfo.pl. This diag in particular was re-written using this library in C and has shown an 85% speed up on a ~1000 node cluster.
Previous iblinkinfo.pl real 3m35.876s user 0m13.210s sys 1m1.046s New iblinkinfotest real 0m32.869s user 0m0.067s sys 0m0.140s Signed-off-by: Ira Weiny --- libibnetdisc/AUTHORS | 1 + libibnetdisc/COPYING | 384 ++++++++++++ libibnetdisc/ChangeLog | 4 + libibnetdisc/Makefile.am | 73 +++ libibnetdisc/autogen.sh | 11 + libibnetdisc/configure.in | 68 +++ libibnetdisc/include/infiniband/ibnetdisc.h | 306 ++++++++++ libibnetdisc/libibnetdisc.spec.in | 94 +++ libibnetdisc/libibnetdisc.ver | 9 + libibnetdisc/man/ibnd_debug.3 | 2 + libibnetdisc/man/ibnd_destroy_fabric.3 | 2 + libibnetdisc/man/ibnd_discover_fabric.3 | 43 ++ libibnetdisc/man/ibnd_find_node_dr.3 | 2 + libibnetdisc/man/ibnd_find_node_guid.3 | 25 + libibnetdisc/man/ibnd_iter_nodes.3 | 24 + libibnetdisc/man/ibnd_iter_nodes_type.3 | 2 + libibnetdisc/man/ibnd_linkspeed_str.3 | 2 + libibnetdisc/man/ibnd_linkstate_str.3 | 2 + libibnetdisc/man/ibnd_linkwidth_str.3 | 26 + libibnetdisc/man/ibnd_node_type_str.3 | 2 + libibnetdisc/man/ibnd_node_type_str_short.3 | 2 + libibnetdisc/man/ibnd_physstate_str.3 | 2 + libibnetdisc/man/ibnd_update_node.3 | 21 + libibnetdisc/src/chassis.c | 820 +++++++++++++++++++++++++ libibnetdisc/src/chassis.h | 82 +++ libibnetdisc/src/ibnetdisc.c | 863 +++++++++++++++++++++++++++ libibnetdisc/src/libibnetdisc.map | 27 + libibnetdisc/test/iblinkinfotest.c | 395 ++++++++++++ libibnetdisc/test/ibnetdisctest.c | 588 ++++++++++++++++++ libibnetdisc/test/testleaks.c | 261 ++++++++ 30 files changed, 4143 insertions(+), 0 deletions(-) create mode 100644 libibnetdisc/AUTHORS create mode 100644 libibnetdisc/COPYING create mode 100644 libibnetdisc/ChangeLog create mode 100644 libibnetdisc/Makefile.am create mode 100755 libibnetdisc/autogen.sh create mode 100644 libibnetdisc/configure.in create mode 100644 libibnetdisc/include/infiniband/ibnetdisc.h create mode 100644 libibnetdisc/libibnetdisc.spec.in create mode 100644 libibnetdisc/libibnetdisc.ver create mode 100644 libibnetdisc/man/ibnd_debug.3 create 
mode 100644 libibnetdisc/man/ibnd_destroy_fabric.3 create mode 100644 libibnetdisc/man/ibnd_discover_fabric.3 create mode 100644 libibnetdisc/man/ibnd_find_node_dr.3 create mode 100644 libibnetdisc/man/ibnd_find_node_guid.3 create mode 100644 libibnetdisc/man/ibnd_iter_nodes.3 create mode 100644 libibnetdisc/man/ibnd_iter_nodes_type.3 create mode 100644 libibnetdisc/man/ibnd_linkspeed_str.3 create mode 100644 libibnetdisc/man/ibnd_linkstate_str.3 create mode 100644 libibnetdisc/man/ibnd_linkwidth_str.3 create mode 100644 libibnetdisc/man/ibnd_node_type_str.3 create mode 100644 libibnetdisc/man/ibnd_node_type_str_short.3 create mode 100644 libibnetdisc/man/ibnd_physstate_str.3 create mode 100644 libibnetdisc/man/ibnd_update_node.3 create mode 100644 libibnetdisc/src/chassis.c create mode 100644 libibnetdisc/src/chassis.h create mode 100644 libibnetdisc/src/ibnetdisc.c create mode 100644 libibnetdisc/src/libibnetdisc.map create mode 100644 libibnetdisc/test/iblinkinfotest.c create mode 100644 libibnetdisc/test/ibnetdisctest.c create mode 100644 libibnetdisc/test/testleaks.c diff --git a/libibnetdisc/AUTHORS b/libibnetdisc/AUTHORS new file mode 100644 index 0000000..d7211f9 --- /dev/null +++ b/libibnetdisc/AUTHORS @@ -0,0 +1 @@ +Ira Weiny diff --git a/libibnetdisc/COPYING b/libibnetdisc/COPYING new file mode 100644 index 0000000..a017728 --- /dev/null +++ b/libibnetdisc/COPYING @@ -0,0 +1,384 @@ +This software with the exception of OpenSM is available to you +under a choice of one of two licenses. You may chose to be +licensed under the terms of the the OpenIB.org BSD license or +the GNU General Public License (GPL) Version 2, both included +below. + +OpenSM is licensed under either GNU General Public License (GPL) +Version 2, or Intel BSD + Patent license. See OpenSM for the +specific language for the latter licensing terms. + + +Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. 
+ +================================================================== + + OpenIB.org BSD license + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions +are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +================================================================== + + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. 
By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. 
If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. 
You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. 
If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. 
(This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. 
Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. 
+ +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. 
If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. 
+ + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. 
Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. diff --git a/libibnetdisc/ChangeLog b/libibnetdisc/ChangeLog new file mode 100644 index 0000000..d74037e --- /dev/null +++ b/libibnetdisc/ChangeLog @@ -0,0 +1,4 @@ + +2008-04-09 Ira Weiny + + * Added to git tree diff --git a/libibnetdisc/Makefile.am b/libibnetdisc/Makefile.am new file mode 100644 index 0000000..b5c0dd0 --- /dev/null +++ b/libibnetdisc/Makefile.am @@ -0,0 +1,73 @@ + +SUBDIRS = . 
+ +INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband + +lib_LTLIBRARIES = libibnetdisc.la +sbin_PROGRAMS = + +if ENABLE_TEST_UTILS +sbin_PROGRAMS += test/ibnetdisctest \ + test/iblinkinfotest \ + test/testleaks +endif + +DBGFLAGS = -g + +if HAVE_LD_VERSION_SCRIPT +libibnetdisc_version_script = -Wl,--version-script=$(srcdir)/src/libibnetdisc.map +else +libibnetdisc_version_script = +endif + +libibnetdisc_la_SOURCES = src/ibnetdisc.c src/chassis.c src/chassis.h +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) +libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ + -export-dynamic $(libibnetdisc_version_script) \ + -losmcomp -libmad +libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map + +libibnetdiscincludedir = $(includedir)/infiniband + +test_ibnetdisctest_SOURCES = test/ibnetdisctest.c +test_ibnetdisctest_CFLAGS = -Wall $(DBGFLAGS) +test_ibnetdisctest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +test_iblinkinfotest_SOURCES = test/iblinkinfotest.c +test_iblinkinfotest_CFLAGS = -Wall $(DBGFLAGS) +test_iblinkinfotest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +test_testleaks_SOURCES = test/testleaks.c +test_testleaks_CFLAGS = -Wall $(DBGFLAGS) +test_testleaks_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +libibnetdiscinclude_HEADERS = $(srcdir)/include/infiniband/ibnetdisc.h + +man_MANS = man/ibnd_debug.3 \ + man/ibnd_destroy_fabric.3 \ + man/ibnd_discover_fabric.3 \ + man/ibnd_find_node_dr.3 \ + man/ibnd_find_node_guid.3 \ + man/ibnd_iter_nodes.3 \ + man/ibnd_iter_nodes_type.3 \ + man/ibnd_linkspeed_str.3 \ + man/ibnd_linkstate_str.3 \ + man/ibnd_linkwidth_str.3 \ + man/ibnd_node_type_str.3 \ + man/ibnd_physstate_str.3 \ + man/ibnd_update_node.3 + +EXTRA_DIST = libibnetdisc.spec.in libibnetdisc.spec \ + $(srcdir)/src/libibnetdisc.map libibnetdisc.ver autogen.sh + +dist-hook: + if [ -x $(top_srcdir)/../gen_chlog.sh ] ; then \ + $(top_srcdir)/../gen_chlog.sh 
$(PACKAGE) > $(distdir)/ChangeLog ; \ + fi + if [ -x $(top_srcdir)/../gen_ver.sh ] ; then \ + ver=`$(top_srcdir)/../gen_ver.sh $(PACKAGE)` ; \ + sed -e '/AC_INIT/s/$(PACKAGE), .*,/$(PACKAGE), '$$ver',/' $(top_srcdir)/configure.in > $(distdir)/configure.in ; \ + fi diff --git a/libibnetdisc/autogen.sh b/libibnetdisc/autogen.sh new file mode 100755 index 0000000..4827884 --- /dev/null +++ b/libibnetdisc/autogen.sh @@ -0,0 +1,11 @@ +#! /bin/sh + +# create config dir if not exist +test -d config || mkdir config + +set -x +aclocal -I config +libtoolize --force --copy +autoheader +automake --foreign --add-missing --copy +autoconf diff --git a/libibnetdisc/configure.in b/libibnetdisc/configure.in new file mode 100644 index 0000000..e5bb0f9 --- /dev/null +++ b/libibnetdisc/configure.in @@ -0,0 +1,68 @@ +dnl Process this file with autoconf to produce a configure script. + +AC_PREREQ(2.57) +AC_INIT(libibnetdisc, 0.0.1, general at lists.openfabrics.org) +dnl AC_CONFIG_SRCDIR([src/stack.c]) +AC_CONFIG_AUX_DIR(config) +AM_CONFIG_HEADER(config.h) +AM_INIT_AUTOMAKE + +AC_SUBST(RELEASE, ${RELEASE:-unknown}) +AC_SUBST(TARBALL, ${TARBALL:-${PACKAGE}-${VERSION}.tar.gz}) + +dnl the library version info is available in the file: libibnetdisc.ver +ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc.ver | sed 's/LIBVERSION=//'` +if test -z $ibnetdisc_api_version; then + echo "FAILED to find $srcdir/libibnetdisc.ver" + exit 1 +fi +AC_SUBST(ibnetdisc_api_version) +AC_DEFINE_UNQUOTED(API_VERSION, + ["$ibnetdisc_api_version"], + [The API version of this library]) + +dnl Checks for programs +AC_PROG_CC +AC_PROG_CPP +AC_PROG_INSTALL +AC_PROG_LN_S +AC_PROG_MAKE_SET +AM_PROG_LIBTOOL + +dnl Checks for header files. +AC_HEADER_STDC +AC_CHECK_HEADERS([stdint.h stdlib.h string.h syslog.h unistd.h]) + +dnl Checks for library functions +AC_TYPE_SIGNAL +AC_FUNC_VPRINTF +AC_CHECK_FUNCS([strrchr strtoul strtoull]) + +dnl Checks for typedefs, structures, and compiler characteristics. 
+AC_C_CONST +AC_C_INLINE +AC_STRUCT_TM + +AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, + if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then + ac_cv_version_script=yes + else + ac_cv_version_script=no + fi) + +AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") + +dnl Check if we should include test utilities +AC_MSG_CHECKING(for --enable-test-utils) +AC_ARG_ENABLE(test-utils, +[ --enable-test-utils build additional test utilities (default=no)], +[case "${enableval}" in + yes) tutils=yes ;; + no) tutils=no ;; + *) AC_MSG_ERROR(bad value ${enableval} for --enable-test-utils) ;; +esac],[tutils=no]) +AM_CONDITIONAL(ENABLE_TEST_UTILS, test x$tutils = xyes) +AC_MSG_RESULT(${tutils=no}) + +AC_CONFIG_FILES([Makefile libibnetdisc.spec]) +AC_OUTPUT diff --git a/libibnetdisc/include/infiniband/ibnetdisc.h b/libibnetdisc/include/infiniband/ibnetdisc.h new file mode 100644 index 0000000..92fa8c4 --- /dev/null +++ b/libibnetdisc/include/infiniband/ibnetdisc.h @@ -0,0 +1,306 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#ifndef _IBNETDISC_H_ +#define _IBNETDISC_H_ + +#include +#include + +#define MAXHOPS 63 + +/* HASH table defines */ +#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) +#define HTSZ 137 + +#define IBND_DEBUG(str, args...) \ + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) +#define IBND_ERROR(str, args...) \ + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) + +/** ========================================================================= + * ENUM definitions + */ +typedef enum { + IBND_CA_NODE = 1, + IBND_SWITCH_NODE = 2, + IBND_ROUTER_NODE = 3 +} ibnd_node_type_t; + +typedef enum { + IBND_LINK_DOWN = 1, + IBND_LINK_INIT = 2, + IBND_LINK_ARMED = 3, + IBND_LINK_ACTIVE = 4 +} ibnd_link_state_t; + +/** ========================================================================= + * Node + */ +typedef struct switch_info { + int smaenhsp0; +} ibnd_switch_info_t; + +typedef struct node_info { + int base_ver; + int class_ver; + int type; + int numports; + uint64_t sysimgguid; + uint64_t nodeguid; + uint64_t nodeportguid; + uint16_t partition_cap; + uint32_t devid; + uint32_t revision; + int localport; + uint32_t vendid; +} ibnd_node_info_t; + +struct port; +struct ib_fabric; +struct chassis_record; +typedef struct node { + struct node *next; /* all node list in fabric */ + struct node *htnext; /* store node in guid hash table */ + struct node *dnext; /* store node in nodesdist table */ + struct node 
*type_next; /* store node in "type" list (ca|switch|router) */ + + struct ib_fabric *fabric; /* the fabric node belongs to */ + + ib_portid_t path_portid; /* path from "from_node" */ + int dist; /* num of hops from "from_node" */ + + int smalid; + int smalmc; + ibnd_switch_info_t sw_info; + ibnd_node_info_t info; + + char nodedesc[64]; + + struct port **ports; /* in order array of port pointers */ + /* the size of this array is info.numports + 1 */ + /* items MAY BE NULL! (ie 0 == switches only) */ + + /* chassis info */ + struct node *chassis_next; /* store node in "chassis" list */ + struct chassis_record *chrecord; + + void *user_data; /* users can store data here */ +} ibnd_node_t; + +/** ========================================================================= + * Port + */ +typedef struct port_info { + int lid; + int smlid; + int link_speed_supported; + int link_speed_enabled; + int link_speed_active; + int link_state; + int phys_state; + int link_down_def_state; + int mkey_prot_bits; + int lmc; + int neighbor_mtu; + int smsl; + int init_type; + int vl_capability; + int vl_high_limit; + int vl_arb_high_cap; + int vl_arb_low_cap; + int init_reply; + int mtu_cap; + int vl_stall_count; + int hoq_lifetime; + int oper_vls; + int partition_enforce_in; + int partition_enforce_out; + int filter_raw_in; + int filter_raw_out; + int mkey_violations; + int pkey_violations; + int qkey_violations; + int guid_capabilities; + int client_rereg; + int subnet_timeout; + int response_time_val; + int local_phys_error; + int overrun_error; + int max_credit_hint; + uint32_t link_round_trip; + int local_port; + int link_width_supported; + int link_width_enabled; + int link_width_active; + int diag_code; + int mkey_lease; + uint32_t capability_mask; + uint64_t mkey; + uint64_t gid_prefix; +} ibnd_port_info_t; + +typedef struct port { + struct port *htnext; + uint64_t guid; + int portnum; + int ext_portnum; /* optional (!= 0) external port num */ + ibnd_node_t *node; + struct port 
*remoteport; /* null if SMA, or does not exist */ + ibnd_port_info_t info; + void *user_data; /* users can store data here */ +} ibnd_port_t; + + +/** ========================================================================= + * Chassis data + */ +typedef struct chassis_record { + struct chassis_record *next; + unsigned char chassisnum; + unsigned char chassistype; + unsigned char anafanum; + unsigned char slotnum; + unsigned char chassisslot; +} ibnd_chassis_record_t; + +#define SPINES_MAX_NUM 12 +#define LINES_MAX_NUM 36 + +typedef struct chassis_list { + struct chassis_list *next; + uint64_t chassisguid; + int chassisnum; + int chassistype; + + /* generic grouping by SystemImageGUID */ + int nodecount; + ibnd_node_t *nodes; + + /* specific to voltaire type nodes */ + ibnd_node_t *spinenode[SPINES_MAX_NUM + 1]; + ibnd_node_t *linenode[LINES_MAX_NUM + 1]; +} ibnd_chassis_list_t; + +/** ========================================================================= + * Fabric + * Main fabric object which is returned and represents the data discovered + */ +typedef struct ib_fabric { + /* the node which you requested to start on + * "from" parameter in ibnd_discover_fabric + */ + ibnd_node_t *from_node; + + /* list of all nodes in the system */ + ibnd_node_t *nodes; + + /* NULL terminated lists of node types */ + ibnd_node_t *switches; + ibnd_node_t *ch_adapters; + ibnd_node_t *routers; + + /* list of all chassis found in the fabric */ + ibnd_chassis_list_t *chassis; + + /* the following are for internal use */ + void *ibmad_port; + ibnd_node_t *nodestbl[HTSZ]; + ibnd_port_t *portstbl[HTSZ]; + int maxhops_discovered; + ibnd_node_t *nodesdist[MAXHOPS+1]; + ibnd_chassis_list_t *first_chassis; + ibnd_chassis_list_t *current_chassis; +} ibnd_fabric_t; + + +/** ========================================================================= + * Initialization (fabric operations) + */ +void ibnd_debug(int i); +void ibnd_show_progress(int i); + +ibnd_fabric_t *ibnd_discover_fabric(char 
*dev_name, int dev_port, + int timeout_ms, ib_portid_t *from, int hops); + /** + * dev_name: (required) local device name to use to access the fabric + * dev_port: (required) local device port to use to access the fabric + * timeout_ms: (required) gives the timeout for a _SINGLE_ query on + * the fabric. So if there are multiple nodes not + * responding this may result in a lengthy delay. + * from: (optional) specify the node to start scanning from. + * If NULL start from the node we are running on. + * hops: (optional) Specify how much of the fabric to traverse. + * negative value == scan entire fabric + */ +void ibnd_destroy_fabric(ibnd_fabric_t *fabric); + +/** ========================================================================= + * Node operations + */ +typedef void (*ibnd_iter_func_t)(ibnd_node_t *node, void *user_data); + +ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid); +ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str); +ibnd_node_t *ibnd_update_node(ibnd_node_t *node); + +void ibnd_iter_nodes(ibnd_fabric_t *fabric, + ibnd_iter_func_t func, + void *user_data); +void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, + ibnd_iter_func_t func, + ibnd_node_type_t node_type, + void *user_data); + +/** ========================================================================= + * Str convert functions + */ +char *ibnd_linkwidth_str(int link_width); +char *ibnd_linkspeed_str(int link_speed); +char *ibnd_linkstate_str(int link_state); +char *ibnd_physstate_str(int phys_state); +const char *ibnd_node_type_str(ibnd_node_t *node); +const char *ibnd_node_type_str_short(ibnd_node_t *node); + +/** ========================================================================= + * Chassis queries + */ +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum); +char *ibnd_get_chassis_type(ibnd_node_t *node); +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size); + +int ibnd_is_xsigo_guid(uint64_t
guid); +int ibnd_is_xsigo_tca(uint64_t guid); +int ibnd_is_xsigo_hca(uint64_t guid); + +#endif /* _IBNETDISC_H_ */ diff --git a/libibnetdisc/libibnetdisc.spec.in b/libibnetdisc/libibnetdisc.spec.in new file mode 100644 index 0000000..015cd24 --- /dev/null +++ b/libibnetdisc/libibnetdisc.spec.in @@ -0,0 +1,94 @@ + +%define RELEASE @RELEASE@ +%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE} + +%if %{?_with_test_utils:1}%{!?_with_test_utils:0} +%define _enable_test_utils --enable-test-utils +%endif +%if %{?_without_test_utils:1}%{!?_without_test_utils:0} +%define _disable_test_utils --disable-test-utils +%endif + +Summary: OpenFabrics Alliance InfiniBand MAD library +Name: libibnetdisc +Version: @VERSION@ +Release: %rel%{?dist} +License: GPLv2 or BSD +Group: System Environment/Libraries +BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) +Source: http://www.openfabrics.org/downloads/management/@TARBALL@ +Url: http://openfabrics.org/ +BuildRequires: opensm-libs, libtool, libibcommon, libibumad +Requires(post): /sbin/ldconfig +Requires(postun): /sbin/ldconfig + +%description +libibnetdisc provides a higher level C interface to scanning an IB fabric. + +%package devel +Summary: Development files for the libibnetdisc library +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release}, opensm-devel, libibcommon-devel, libibumad-devel +Requires(post): /sbin/ldconfig +Requires(postun): /sbin/ldconfig + +%description devel +Development files for the libibnetdisc library.
+ +%package static +Summary: Static version of the libibnetdisc library +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description static +Static version of the libibnetdisc library + +%if %{?_with_test_utils:1}%{!?_with_test_utils:0} +%package utils +Summary: Debug utilities built against libibnetdisc +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description utils +Debug utilities built against libibnetdisc + +%files utils +%defattr(-,root,root) +%{_sbindir}/* +%endif + +%prep +%setup -q + +%build +%configure \ + %{?_enable_test_utils} \ + %{?_disable_test_utils} +make + +%install +make DESTDIR=${RPM_BUILD_ROOT} install +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/*.la + +%clean +rm -rf $RPM_BUILD_ROOT + +%post -p /sbin/ldconfig +%postun -p /sbin/ldconfig +%post devel -p /sbin/ldconfig +%postun devel -p /sbin/ldconfig + +%files +%defattr(-,root,root) +%{_libdir}/libibnetdisc*.so.* +%doc AUTHORS COPYING ChangeLog + +%files devel +%defattr(-,root,root) +%{_libdir}/libibnetdisc.so +%{_includedir}/infiniband/*.h + +%files static +%defattr(-,root,root) +%{_libdir}/libibnetdisc.a diff --git a/libibnetdisc/libibnetdisc.ver b/libibnetdisc/libibnetdisc.ver new file mode 100644 index 0000000..a0a5f3c --- /dev/null +++ b/libibnetdisc/libibnetdisc.ver @@ -0,0 +1,9 @@ +# In this file we track the current API version +# of the IB net discover interface (and libraries) +# The version is built of the following +# three numbers: +# API_REV:RUNNING_REV:AGE +# API_REV - advance on any added API +# RUNNING_REV - advance on any change to the vendor files +# AGE - number of backward versions the API still supports +LIBVERSION=1:0:0 diff --git a/libibnetdisc/man/ibnd_debug.3 b/libibnetdisc/man/ibnd_debug.3 new file mode 100644 index 0000000..a4076fc --- /dev/null +++ b/libibnetdisc/man/ibnd_debug.3 @@ -0,0 +1,2 @@ +.\".TH IBND_DEBUG 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's
Manual" +.so man3/ibnd_discover_fabric.3 diff --git a/libibnetdisc/man/ibnd_destroy_fabric.3 b/libibnetdisc/man/ibnd_destroy_fabric.3 new file mode 100644 index 0000000..8fe20ae --- /dev/null +++ b/libibnetdisc/man/ibnd_destroy_fabric.3 @@ -0,0 +1,2 @@ +.\".TH IBND_DESTROY_FABRIC 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_discover_fabric.3 diff --git a/libibnetdisc/man/ibnd_discover_fabric.3 b/libibnetdisc/man/ibnd_discover_fabric.3 new file mode 100644 index 0000000..0db23f4 --- /dev/null +++ b/libibnetdisc/man/ibnd_discover_fabric.3 @@ -0,0 +1,43 @@ +.TH IBND_DISCOVER_FABRIC 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_discover_fabric, ibnd_destroy_fabric, ibnd_debug \- initialize ibnetdiscover library. +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, ib_portid_t *from, int hops)" +.BI "void ibnd_destroy_fabric(ibnd_fabric_t *fabric)" +.BI "void ibnd_debug(int i)" + +.SH "DESCRIPTION" +.B ibnd_discover_fabric() +Discover the fabric connected to the port specified by dev_name and dev_port, using the specified timeout. The "from" and "hops" parameters are optional and allow one to scan part of a fabric by specifying a node "from" and a number of hops away from that node to scan, "hops". This gives the user a "sub-fabric" which is "centered" anywhere they choose. + +.B ibnd_destroy_fabric() +Free all memory and resources associated with the fabric. + +.B ibnd_debug() +Set the debug level to be printed as library operations take place. + +.SH "RETURN VALUE" +.B ibnd_discover_fabric() +Return NULL on failure, otherwise a valid ibnd_fabric_t object. + +.B ibnd_destroy_fabric(), ibnd_debug() +NONE + +.SH "EXAMPLES" + +.B Discover the entire fabric connected to device "mthca0", port 1. + + ibnd_discover_fabric("mthca0", 1, 100, NULL, 0); + +.B Discover only a single node and those nodes connected to it.
+ + str2drpath(&(port_id.drpath), from, 0, 0); + + ibnd_discover_fabric("mthca0", 1, 100, &port_id, 1); + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/libibnetdisc/man/ibnd_find_node_dr.3 b/libibnetdisc/man/ibnd_find_node_dr.3 new file mode 100644 index 0000000..612e501 --- /dev/null +++ b/libibnetdisc/man/ibnd_find_node_dr.3 @@ -0,0 +1,2 @@ +.\".TH IBND_FIND_NODE_DR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_find_node_guid.3 diff --git a/libibnetdisc/man/ibnd_find_node_guid.3 b/libibnetdisc/man/ibnd_find_node_guid.3 new file mode 100644 index 0000000..676b528 --- /dev/null +++ b/libibnetdisc/man/ibnd_find_node_guid.3 @@ -0,0 +1,25 @@ +.TH IBND_FIND_NODE_GUID 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_find_node_guid, ibnd_find_node_dr \- given a fabric object, find the node object within it that matches the guid or directed route specified. + +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid)" +.BI "ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str)" + +.SH "DESCRIPTION" +.B ibnd_find_node_guid() +Given a fabric object and a guid, return the ibnd_node_t object with that node guid. +.B ibnd_find_node_dr() +Given a fabric object and a directed route, return the ibnd_node_t object with +that directed route. + +.SH "RETURN VALUE" +.B ibnd_find_node_guid(), ibnd_find_node_dr() +Return NULL on failure, otherwise a valid ibnd_node_t object. + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/libibnetdisc/man/ibnd_iter_nodes.3 b/libibnetdisc/man/ibnd_iter_nodes.3 new file mode 100644 index 0000000..7199dfb --- /dev/null +++ b/libibnetdisc/man/ibnd_iter_nodes.3 @@ -0,0 +1,24 @@ +.TH IBND_ITER_NODES 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_iter_nodes, ibnd_iter_nodes_type \- given a fabric object and a function, iterate over the nodes in the fabric.
+ +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "void ibnd_iter_nodes(ibnd_fabric_t *fabric, ibnd_iter_func_t func, void *user_data)" +.BI "void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, ibnd_iter_func_t func, ibnd_node_type_t type, void *user_data)" + +.SH "DESCRIPTION" +.B ibnd_iter_nodes() +Iterate through all the nodes in the fabric and call "func" on each of them. +.B ibnd_iter_nodes_type() +The same as ibnd_iter_nodes, but limits the iteration to the nodes of the specified type. + +.SH "RETURN VALUE" +.B ibnd_iter_nodes(), ibnd_iter_nodes_type() +NONE + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/libibnetdisc/man/ibnd_iter_nodes_type.3 b/libibnetdisc/man/ibnd_iter_nodes_type.3 new file mode 100644 index 0000000..878547c --- /dev/null +++ b/libibnetdisc/man/ibnd_iter_nodes_type.3 @@ -0,0 +1,2 @@ +.\".TH IBND_ITER_NODES_TYPE 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_iter_nodes.3 diff --git a/libibnetdisc/man/ibnd_linkspeed_str.3 b/libibnetdisc/man/ibnd_linkspeed_str.3 new file mode 100644 index 0000000..128cd3e --- /dev/null +++ b/libibnetdisc/man/ibnd_linkspeed_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_LINKSPEED_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/libibnetdisc/man/ibnd_linkstate_str.3 b/libibnetdisc/man/ibnd_linkstate_str.3 new file mode 100644 index 0000000..2fa9189 --- /dev/null +++ b/libibnetdisc/man/ibnd_linkstate_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_LINKSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/libibnetdisc/man/ibnd_linkwidth_str.3 b/libibnetdisc/man/ibnd_linkwidth_str.3 new file mode 100644 index 0000000..2cd4f0a --- /dev/null +++ b/libibnetdisc/man/ibnd_linkwidth_str.3 @@ -0,0 +1,26 @@ +.TH IBND_LINKWIDTH_STR 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_linkwidth_str, ibnd_linkspeed_str, ibnd_linkstate_str, ibnd_physstate_str, ibnd_node_type_str \- pretty string functions.
+ +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI +.BI "char *ibnd_linkwidth_str(int link_width)" +.BI "char *ibnd_linkspeed_str(int link_speed)" +.BI "char *ibnd_linkstate_str(int link_state)" +.BI "char *ibnd_physstate_str(int phys_state)" +.BI "const char *ibnd_node_type_str(ibnd_node_t *node)" +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" + +.SH "DESCRIPTION" +Return human-readable strings for the values given. + +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" +Returns a shorter, abbreviated version of the string. + + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/libibnetdisc/man/ibnd_node_type_str.3 b/libibnetdisc/man/ibnd_node_type_str.3 new file mode 100644 index 0000000..77dbf07 --- /dev/null +++ b/libibnetdisc/man/ibnd_node_type_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_NODE_TYPE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/libibnetdisc/man/ibnd_node_type_str_short.3 b/libibnetdisc/man/ibnd_node_type_str_short.3 new file mode 100644 index 0000000..62feb6e --- /dev/null +++ b/libibnetdisc/man/ibnd_node_type_str_short.3 @@ -0,0 +1,2 @@ +.\".TH IBND_NODE_TYPE_STR_SHORT 3 "Aug 05, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/libibnetdisc/man/ibnd_physstate_str.3 b/libibnetdisc/man/ibnd_physstate_str.3 new file mode 100644 index 0000000..aeeaeb7 --- /dev/null +++ b/libibnetdisc/man/ibnd_physstate_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_PHYSSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/libibnetdisc/man/ibnd_update_node.3 b/libibnetdisc/man/ibnd_update_node.3 new file mode 100644 index 0000000..d3aa206 --- /dev/null +++ b/libibnetdisc/man/ibnd_update_node.3 @@ -0,0 +1,21 @@ +.TH IBND_UPDATE_NODE 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_update_node \- Update the node specified with new data from the fabric.
+ +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_node_t *ibnd_update_node(ibnd_node_t *node)" + +.SH "DESCRIPTION" +.B ibnd_update_node() +Update the node info, port info, and node description of the node specified. + +.SH "RETURN VALUE" +.B ibnd_update_node() +Return NULL on failure, otherwise a valid ibnd_node_t object which is part of the fabric object. + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/libibnetdisc/src/chassis.c b/libibnetdisc/src/chassis.c new file mode 100644 index 0000000..5f9c073 --- /dev/null +++ b/libibnetdisc/src/chassis.c @@ -0,0 +1,820 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +/*========================================================*/ +/* FABRIC SCANNER SPECIFIC DATA */ +/*========================================================*/ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include + +#include +#include + +#include "chassis.h" + +static char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; +static char *ChassisSlotTypeStr[4] = { "", "Line", "Spine", "SRBD" }; + +char *ibnd_get_chassis_type(ibnd_node_t *node) +{ + /* Currently, only if Voltaire chassis */ + if (node->info.vendid != VTR_VENDOR_ID) + return (NULL); + if (!node->chrecord) + return (NULL); + if (node->chrecord->chassistype == UNRESOLVED_CT + || node->chrecord->chassistype > ISR2004_CT) + return (NULL); + return ChassisTypeStr[node->chrecord->chassistype]; +} + +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size) +{ + /* Currently, only if Voltaire chassis */ + if (node->info.vendid != VTR_VENDOR_ID) + return (NULL); + if (!node->chrecord) + return (NULL); + if (node->chrecord->chassisslot == UNRESOLVED_CS + || node->chrecord->chassisslot > SRBD_CS) + return (NULL); + if (!str) + return (NULL); + snprintf(str, size, "%s %d Chip %d", + ChassisSlotTypeStr[node->chrecord->chassisslot], + node->chrecord->slotnum, + node->chrecord->anafanum); + return (str); +} + +static ibnd_chassis_list_t *find_chassisnum(ibnd_fabric_t *fabric, unsigned char chassisnum) +{ + ibnd_chassis_list_t *current; + + for (current = fabric->first_chassis; current; current = current->next) { + if (current->chassisnum == chassisnum) + return current; + } + + return NULL; +} + +static uint64_t topspin_chassisguid(uint64_t guid) +{ + /* Byte 
3 in system image GUID is chassis type, and */ + /* Byte 4 is location ID (slot) so just mask off byte 4 */ + return guid & 0xffffffff00ffffffULL; +} + +int ibnd_is_xsigo_guid(uint64_t guid) +{ + if ((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) + return 1; + else + return 0; +} + +static int is_xsigo_leafone(uint64_t guid) +{ + if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) + return 1; + else + return 0; +} + +int ibnd_is_xsigo_hca(uint64_t guid) +{ + /* NodeType 2 is HCA */ + if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) + return 1; + else + return 0; +} + +int ibnd_is_xsigo_tca(uint64_t guid) +{ + /* NodeType 3 is TCA */ + if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) + return 1; + else + return 0; +} + +static int is_xsigo_ca(uint64_t guid) +{ + if (ibnd_is_xsigo_hca(guid) || ibnd_is_xsigo_tca(guid)) + return 1; + else + return 0; +} + +static int is_xsigo_switch(uint64_t guid) +{ + if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) + return 1; + else + return 0; +} + +static uint64_t xsigo_chassisguid(ibnd_node_t *node) +{ + if (!is_xsigo_ca(node->info.sysimgguid)) { + /* Byte 3 is NodeType and byte 4 is PortType */ + /* If NodeType is 1 (switch), PortType is masked */ + if (is_xsigo_switch(node->info.sysimgguid)) + return node->info.sysimgguid & 0xffffffff00ffffffULL; + else + return node->info.sysimgguid; + } else { + if (!node->ports || !node->ports[1]) + return (0); + + /* Is there a peer port ? 
*/ + if (!node->ports[1]->remoteport) + return node->info.sysimgguid; + + /* If peer port is Leaf 1, use its chassis GUID */ + if (is_xsigo_leafone(node->ports[1]->remoteport->node->info.sysimgguid)) + return node->ports[1]->remoteport->node->info.sysimgguid & + 0xffffffff00ffffffULL; + else + return node->info.sysimgguid; + } +} + +static uint64_t get_chassisguid(ibnd_node_t *node) +{ + if (node->info.vendid == TS_VENDOR_ID || node->info.vendid == SS_VENDOR_ID) + return topspin_chassisguid(node->info.sysimgguid); + else if (node->info.vendid == XS_VENDOR_ID || ibnd_is_xsigo_guid(node->info.sysimgguid)) + return xsigo_chassisguid(node); + else + return node->info.sysimgguid; +} + +static ibnd_chassis_list_t *find_chassisguid(ibnd_node_t *node) +{ + ibnd_chassis_list_t *current; + uint64_t chguid; + + chguid = get_chassisguid(node); + for (current = node->fabric->first_chassis; current; current = current->next) { + if (current->chassisguid == chguid) + return current; + } + + return NULL; +} + +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum) +{ + ibnd_chassis_list_t *chassis; + + chassis = find_chassisnum(fabric, chassisnum); + if (chassis) + return chassis->chassisguid; + else + return 0; +} + +static int is_router(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_IB_FC_ROUTER || + node->info.devid == VTR_DEVID_IB_IP_ROUTER); +} + +static int is_spine_9096(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SFB4 || + node->info.devid == VTR_DEVID_SFB4_DDR); +} + +static int is_spine_9288(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SFB12 || + node->info.devid == VTR_DEVID_SFB12_DDR); +} + +static int is_spine_2004(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SFB2004); +} + +static int is_spine_2012(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SFB2012); +} + +static int is_spine(ibnd_node_t *node) +{ + return (is_spine_9096(node) || is_spine_9288(node) || + 
is_spine_2004(node) || is_spine_2012(node)); +} + +static int is_line_24(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SLB24 || + node->info.devid == VTR_DEVID_SLB24_DDR); +} + +static int is_line_8(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SLB8); +} + +static int is_line_2024(ibnd_node_t *node) +{ + return (node->info.devid == VTR_DEVID_SLB2024); +} + +static int is_line(ibnd_node_t *node) +{ + return (is_line_24(node) || is_line_8(node) || is_line_2024(node)); +} + +int is_chassis_switch(ibnd_node_t *node) +{ + return (is_spine(node) || is_line(node)); +} + +/* these structs help find Line (Anafa) slot number while using spine portnum */ +int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; +int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; +int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; +int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; + +/* IPR FCR modules connectivity while using sFB4 port as reference */ +int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; + +/* these structs help find Spine (Anafa) slot number while using spine portnum */ +int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ + +static void get_sfb_slot(ibnd_node_t 
*node, ibnd_port_t *lineport) +{ + ibnd_chassis_record_t *ch = node->chrecord; + + ch->chassisslot = SPINE_CS; + if (is_spine_9096(node)) { + ch->chassistype = ISR9096_CT; + ch->slotnum = spine4_slot_2_slb[lineport->portnum]; + ch->anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; + } else if (is_spine_9288(node)) { + ch->chassistype = ISR9288_CT; + ch->slotnum = spine12_slot_2_slb[lineport->portnum]; + ch->anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; + } else if (is_spine_2012(node)) { + ch->chassistype = ISR2012_CT; + ch->slotnum = spine12_slot_2_slb[lineport->portnum]; + ch->anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; + } else if (is_spine_2004(node)) { + ch->chassistype = ISR2004_CT; + ch->slotnum = spine4_slot_2_slb[lineport->portnum]; + ch->anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + node->info.nodeguid); + } +} + +static void get_router_slot(ibnd_node_t *node, ibnd_port_t *spineport) +{ + ibnd_chassis_record_t *ch = node->chrecord; + int guessnum = 0; + + if (!ch) { + if (!(node->chrecord = calloc(1, sizeof(ibnd_chassis_record_t)))) + IBPANIC("out of mem"); + ch = node->chrecord; + } + + ch->chassisslot = SRBD_CS; + if (is_spine_9096(spineport->node)) { + ch->chassistype = ISR9096_CT; + ch->slotnum = line_slot_2_sfb4[spineport->portnum]; + ch->anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; + } else if (is_spine_9288(spineport->node)) { + ch->chassistype = ISR9288_CT; + ch->slotnum = line_slot_2_sfb12[spineport->portnum]; + /* this is a smart guess based on nodeguids order on sFB-12 module */ + guessnum = spineport->node->info.nodeguid % 4; + /* module 1 <--> remote anafa 3 */ + /* module 2 <--> remote anafa 2 */ + /* module 3 <--> remote anafa 1 */ + ch->anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 
3 : 2)); + } else if (is_spine_2012(spineport->node)) { + ch->chassistype = ISR2012_CT; + ch->slotnum = line_slot_2_sfb12[spineport->portnum]; + /* this is a smart guess based on nodeguids order on sFB-12 module */ + guessnum = spineport->node->info.nodeguid % 4; + // module 1 <--> remote anafa 3 + // module 2 <--> remote anafa 2 + // module 3 <--> remote anafa 1 + ch->anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 3 : 2)); + } else if (is_spine_2004(spineport->node)) { + ch->chassistype = ISR2004_CT; + ch->slotnum = line_slot_2_sfb4[spineport->portnum]; + ch->anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + spineport->node->info.nodeguid); + } +} + +static void get_slb_slot(ibnd_chassis_record_t *ch, ibnd_port_t *spineport) +{ + ch->chassisslot = LINE_CS; + if (is_spine_9096(spineport->node)) { + ch->chassistype = ISR9096_CT; + ch->slotnum = line_slot_2_sfb4[spineport->portnum]; + ch->anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; + } else if (is_spine_9288(spineport->node)) { + ch->chassistype = ISR9288_CT; + ch->slotnum = line_slot_2_sfb12[spineport->portnum]; + ch->anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; + } else if (is_spine_2012(spineport->node)) { + ch->chassistype = ISR2012_CT; + ch->slotnum = line_slot_2_sfb12[spineport->portnum]; + ch->anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; + } else if (is_spine_2004(spineport->node)) { + ch->chassistype = ISR2004_CT; + ch->slotnum = line_slot_2_sfb4[spineport->portnum]; + ch->anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + spineport->node->info.nodeguid); + } +} + +/* forward declare this */ +static void voltaire_portmap(ibnd_port_t *port); +/* + This function called for every Voltaire node in fabric + It could be optimized so, but time overhead is very small + and its only diag.util +*/ +static void 
fill_voltaire_chassis_record(ibnd_node_t *node) +{ + int p = 0; + ibnd_port_t *port; + ibnd_node_t *remnode = 0; + ibnd_chassis_record_t *ch = 0; + + if (node->chrecord) /* somehow this node has already been passed */ + return; + + if (!(node->chrecord = calloc(1, sizeof(ibnd_chassis_record_t)))) + IBPANIC("out of mem"); + + ch = node->chrecord; + + /* node is router only in case of using unique lid */ + /* (which is lid of chassis router port) */ + /* in such case node->ports is actually a requested port... */ + if (is_router(node)) { + /* find the remote node */ + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport && is_spine(port->remoteport->node)) + get_router_slot(node, port->remoteport); + } + } else if (is_spine(node)) { + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || !port->remoteport) + continue; + remnode = port->remoteport->node; + if (remnode->info.type != IBND_SWITCH_NODE) { + if (!remnode->chrecord) + get_router_slot(remnode, port); + continue; + } + if (!ch->chassistype) + /* we assume here that remoteport belongs to line */ + get_sfb_slot(node, port->remoteport); + + /* we could break here, but need to find if more routers connected */ + } + + } else if (is_line(node)) { + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || port->portnum > 12 || !port->remoteport) + continue; + /* we assume here that remoteport belongs to spine */ + get_slb_slot(ch, port->remoteport); + break; + } + } + + /* for each port of this node, map external ports */ + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port) + continue; + voltaire_portmap(port); + } + + return; +} + +static int get_line_index(ibnd_node_t *node) +{ + int retval = 3 * (node->chrecord->slotnum - 1) + node->chrecord->anafanum; + + if (retval > LINES_MAX_NUM || retval < 1) + IBPANIC("Internal error"); + return retval; +} + +static int get_spine_index(ibnd_node_t *node)
+{ + int retval; + + if (is_spine_9288(node) || is_spine_2012(node)) + retval = 3 * (node->chrecord->slotnum - 1) + node->chrecord->anafanum; + else + retval = node->chrecord->slotnum; + + if (retval > SPINES_MAX_NUM || retval < 1) + IBPANIC("Internal error"); + return retval; +} + +static void insert_line_router(ibnd_node_t *node, ibnd_chassis_list_t *chassislist) +{ + int i = get_line_index(node); + + if (chassislist->linenode[i]) + return; /* already filled slot */ + + chassislist->linenode[i] = node; + node->chrecord->chassisnum = chassislist->chassisnum; +} + +static void insert_spine(ibnd_node_t *node, ibnd_chassis_list_t *chassislist) +{ + int i = get_spine_index(node); + + if (chassislist->spinenode[i]) + return; /* already filled slot */ + + chassislist->spinenode[i] = node; + node->chrecord->chassisnum = chassislist->chassisnum; +} + +static void pass_on_lines_catch_spines(ibnd_chassis_list_t *chassislist) +{ + ibnd_node_t *node, *remnode; + ibnd_port_t *port; + int i, p; + + for (i = 1; i <= LINES_MAX_NUM; i++) { + node = chassislist->linenode[i]; + + if (!(node && is_line(node))) + continue; /* empty slot or router */ + + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || port->portnum > 12 || !port->remoteport) + continue; + + remnode = port->remoteport->node; + + if (!remnode->chrecord) + continue; /* some error - spine not initialized ? FIXME */ + insert_spine(remnode, chassislist); + } + } +} + +static void pass_on_spines_catch_lines(ibnd_chassis_list_t *chassislist) +{ + ibnd_node_t *node, *remnode; + ibnd_port_t *port; + int i, p; + + for (i = 1; i <= SPINES_MAX_NUM; i++) { + node = chassislist->spinenode[i]; + if (!node) + continue; /* empty slot */ + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || !port->remoteport) + continue; + remnode = port->remoteport->node; + + if (!remnode->chrecord) + continue; /* some error - line/router not initialized ? 
FIXME */ + insert_line_router(remnode, chassislist); + } + } +} + +/* + Stupid interpolation algorithm... + But nothing to do - have to be compliant with VoltaireSM/NMS +*/ +static void pass_on_spines_interpolate_chguid(ibnd_chassis_list_t *chassislist) +{ + ibnd_node_t *node; + int i; + + for (i = 1; i <= SPINES_MAX_NUM; i++) { + node = chassislist->spinenode[i]; + if (!node) + continue; /* skip the empty slots */ + + /* take first guid minus one to be consistent with SM */ + chassislist->chassisguid = node->info.nodeguid - 1; + break; + } +} + +/* + This function fills chassislist structure with all nodes + in that chassis + chassislist structure = structure of one standalone chassis +*/ +static void build_chassis(ibnd_node_t *node, ibnd_chassis_list_t *chassislist) +{ + int p = 0; + ibnd_node_t *remnode = 0; + ibnd_port_t *port = 0; + + /* we get here with node = chassis_spine */ + chassislist->chassistype = node->chrecord->chassistype; + insert_spine(node, chassislist); + + /* loop: pass on all ports of node */ + for (p = 1; p <= node->info.numports; p++ ) { + port = node->ports[p]; + if (!port || !port->remoteport) + continue; + remnode = port->remoteport->node; + + if (!remnode->chrecord) + continue; /* some error - line or router not initialized ? 
FIXME */ + + insert_line_router(remnode, chassislist); + } + + pass_on_lines_catch_spines(chassislist); + /* this pass needed for to catch routers, since routers connected only */ + /* to spines in slot 1 or 4 and we could miss them first time */ + pass_on_spines_catch_lines(chassislist); + + /* additional 2 passes needed for to overcome a problem of pure "in-chassis" */ + /* connectivity - extra pass to ensure that all related chips/modules */ + /* inserted into the chassislist */ + pass_on_lines_catch_spines(chassislist); + pass_on_spines_catch_lines(chassislist); + pass_on_spines_interpolate_chguid(chassislist); +} + +/*========================================================*/ +/* INTERNAL TO EXTERNAL PORT MAPPING */ +/*========================================================*/ + +/* +Description : On ISR9288/9096 external ports indexing + is not matching the internal ( anafa ) port + indexes. Use this MAP to translate the data you get from + the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.) 
+ + +Module : sLB-24 + anafa 1 anafa 2 +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 +int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 +int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 +------------------------------------------------ + +Module : sLB-8 + anafa 1 anafa 2 +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 + +-----------> + anafa 1 anafa 2 +ext port | - - 5 - - 6 | - - 7 - - 8 +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 +ext port | - - 1 - - 2 | - - 3 - - 4 +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 +------------------------------------------------ + +Module : sLB-2024 + +ext port | 13 14 15 16 17 18 19 20 21 22 23 24 +A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 +ext port | 1 2 3 4 5 6 7 8 9 10 11 12 +A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 +--------------------------------------------------- + +*/ + +int int2ext_map_slb24[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } + }; +int int2ext_map_slb8[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } + }; +int int2ext_map_slb2024[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } + }; +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ + +/* map internal ports to external ports if appropriate */ +static void +voltaire_portmap(ibnd_port_t *port) +{ + ibnd_chassis_record_t *ch = port->node->chrecord; + int portnum = port->portnum; + int chipnum = 0; + 
ibnd_node_t *node = port->node; + + if (!ch || !is_line(node) || (portnum < 13 || portnum > 24)) { + port->ext_portnum = 0; + return; + } + + if (ch->anafanum < 1 || ch->anafanum > 2) { + port->ext_portnum = 0; + return; + } + + chipnum = ch->anafanum - 1; + + if (is_line_24(node)) + port->ext_portnum = int2ext_map_slb24[chipnum][portnum]; + else if (is_line_2024(node)) + port->ext_portnum = int2ext_map_slb2024[chipnum][portnum]; + else + port->ext_portnum = int2ext_map_slb8[chipnum][portnum]; +} + +static void add_chassislist(ibnd_fabric_t *fabric) +{ + ibnd_chassis_list_t *prev = fabric->current_chassis; + + if (!(fabric->current_chassis = calloc(1, sizeof(ibnd_chassis_list_t)))) + IBPANIC("out of mem"); + if (fabric->first_chassis == NULL) + fabric->first_chassis = fabric->current_chassis; + else + prev->next = fabric->current_chassis; /* append after the old tail */ +} + +static void +add_node_to_chassis(ibnd_chassis_list_t *chassis, ibnd_node_t *node) +{ + /* Push the node onto the head of this chassis's node list: */ + /* the new node points at the old head (NULL for an empty */ + /* list) and then becomes the head itself. */ + node->chassis_next = chassis->nodes; + chassis->nodes = node; +} + +/* + Main grouping function + Algorithm: + 1. pass on every Voltaire node + 2. catch spine chip for every Voltaire node + 2.1 build/interpolate chassis around this chip + 2.2 go to 1. + 3. pass on non Voltaire nodes (SystemImageGUID based grouping) + 4. now group non Voltaire nodes by SystemImageGUID +*/ +ibnd_chassis_list_t *group_nodes(ibnd_fabric_t *fabric) +{ + ibnd_node_t *node; + int dist; + int chassisnum = 0; + ibnd_chassis_list_t *chassis; + + fabric->first_chassis = NULL; + fabric->current_chassis = NULL; + + /* first pass on switches and build for every Voltaire node */ + /* an appropriate chassis record (slotnum and position) */ + /* according to internal connectivity */ + /* not very efficient but clear code so...
*/ + for (dist = 0; dist <= fabric->maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->info.vendid == VTR_VENDOR_ID) + fill_voltaire_chassis_record(node); + } + } + + /* separate every Voltaire chassis from each other and build linked list of them */ + /* algorithm: catch spine and find all surrounding nodes */ + for (dist = 0; dist <= fabric->maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->info.vendid != VTR_VENDOR_ID) + continue; + if (!node->chrecord || node->chrecord->chassisnum || !is_spine(node)) + continue; + add_chassislist(fabric); + fabric->current_chassis->chassisnum = ++chassisnum; + build_chassis(node, fabric->current_chassis); + } + } + + /* now make pass on nodes for chassis which are not Voltaire */ + /* grouped by common SystemImageGUID */ + for (dist = 0; dist <= fabric->maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->info.vendid == VTR_VENDOR_ID) + continue; + if (node->info.sysimgguid) { + chassis = find_chassisguid(node); + if (chassis) + chassis->nodecount++; + else { + /* Possible new chassis */ + add_chassislist(fabric); + fabric->current_chassis->chassisguid = get_chassisguid(node); + fabric->current_chassis->nodecount = 1; + } + } + } + } + + /* now, make another pass to see which nodes are part of chassis */ + /* (defined as chassis->nodecount > 1) */ + for (dist = 0; dist <= MAXHOPS; ) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->info.vendid == VTR_VENDOR_ID) + continue; + if (node->info.sysimgguid) { + chassis = find_chassisguid(node); + if (chassis && chassis->nodecount > 1) { + if (!chassis->chassisnum) + chassis->chassisnum = ++chassisnum; + if (!node->chrecord) { + if (!(node->chrecord = + calloc(1, + sizeof(ibnd_chassis_record_t)))) + IBPANIC("out of mem"); + node->chrecord->chassisnum = chassis->chassisnum; + 
add_node_to_chassis(chassis, node); + } + } + } + } + if (dist == fabric->maxhops_discovered) + dist = MAXHOPS; /* skip to CAs */ + else + dist++; + } + + return (fabric->first_chassis); +} diff --git a/libibnetdisc/src/chassis.h b/libibnetdisc/src/chassis.h new file mode 100644 index 0000000..ea271d0 --- /dev/null +++ b/libibnetdisc/src/chassis.h @@ -0,0 +1,82 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef _CHASSIS_H_ +#define _CHASSIS_H_ + +#include + +/*========================================================*/ +/* CHASSIS RECOGNITION SPECIFIC DATA */ +/*========================================================*/ + +/* Device IDs */ +#define VTR_DEVID_IB_FC_ROUTER 0x5a00 +#define VTR_DEVID_IB_IP_ROUTER 0x5a01 +#define VTR_DEVID_ISR9600_SPINE 0x5a02 +#define VTR_DEVID_ISR9600_LEAF 0x5a03 +#define VTR_DEVID_HCA1 0x5a04 +#define VTR_DEVID_HCA2 0x5a44 +#define VTR_DEVID_HCA3 0x6278 +#define VTR_DEVID_SW_6IB4 0x5a05 +#define VTR_DEVID_ISR9024 0x5a06 +#define VTR_DEVID_ISR9288 0x5a07 +#define VTR_DEVID_SLB24 0x5a09 +#define VTR_DEVID_SFB12 0x5a08 +#define VTR_DEVID_SFB4 0x5a0b +#define VTR_DEVID_ISR9024_12 0x5a0c +#define VTR_DEVID_SLB8 0x5a0d +#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 +#define VTR_DEVID_ISR9024_DDR 0x5a31 +#define VTR_DEVID_SFB12_DDR 0x5a32 +#define VTR_DEVID_SFB4_DDR 0x5a33 +#define VTR_DEVID_SLB24_DDR 0x5a34 +#define VTR_DEVID_SFB2012 0x5a37 +#define VTR_DEVID_SLB2024 0x5a38 +#define VTR_DEVID_ISR2012 0x5a39 +#define VTR_DEVID_SFB2004 0x5a40 +#define VTR_DEVID_ISR2004 0x5a41 + +/* Vendor IDs (for chassis based systems) */ +#define VTR_VENDOR_ID 0x8f1 /* Voltaire */ +#define TS_VENDOR_ID 0x5ad /* Cisco */ +#define SS_VENDOR_ID 0x66a /* InfiniCon */ +#define XS_VENDOR_ID 0x1397 /* Xsigo */ + +enum ibnd_chassis_type { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; +enum ibnd_chassis_slot_type { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; + +ibnd_chassis_list_t *group_nodes(ibnd_fabric_t *fabric); + +#endif /* _CHASSIS_H_ */ diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c new file mode 100644 index 0000000..3f4901a --- /dev/null +++ b/libibnetdisc/src/ibnetdisc.c @@ -0,0 +1,863 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. 
+ * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include + +#include "chassis.h" + +static int timeout_ms = 2000; +static int show_progress = 0; + +static char *linkwidth_str[] = { + "??", + "1x", + "4x", + "??", + "8x", + "??", + "??", + "??", + "12x" +}; + +static char *linkspeed_str[] = { + "???", + "SDR", + "DDR", + "???", + "QDR" +}; + +static char *linkstate_str[] = { + "No State", + "Down", + "Init", + "Armed", + "Active" +}; + +static char *physstate_str[] = { + "No State", + "Sleep", + "Polling", + "Disabled", + "PortConfigTraining", + "LinkUp", + "LinkErrorRecovery", + "Phy Test" +}; + +char * +ibnd_linkwidth_str(int link_width) +{ + if (link_width > 8) + return linkwidth_str[0]; + else + return linkwidth_str[link_width]; +} + +char * +ibnd_linkspeed_str(int link_speed) +{ + if (link_speed > 4) + return linkspeed_str[0]; + else + return linkspeed_str[link_speed]; +} +char * +ibnd_linkstate_str(int link_state) +{ + if (link_state > 4) + return linkstate_str[0]; + else + return linkstate_str[link_state]; +} + +char * +ibnd_physstate_str(int phys_state) +{ + if (phys_state > 7) + return physstate_str[0]; + else + return physstate_str[phys_state]; +} + +void +decode_port_info(void * rcv_buf, ibnd_port_info_t *pi) +{ + mad_decode_field(rcv_buf, IB_PORT_LID_F, &pi->lid); + mad_decode_field(rcv_buf, IB_PORT_SMLID_F, &pi->smlid); + + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_SUPPORTED_F, &pi->link_speed_supported); + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ENABLED_F, &pi->link_speed_enabled); + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ACTIVE_F, &pi->link_speed_active); + + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PORT_F, &pi->local_port); + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_SUPPORTED_F, &pi->link_width_supported); + mad_decode_field(rcv_buf, 
IB_PORT_LINK_WIDTH_ENABLED_F, &pi->link_width_enabled); + + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ACTIVE_F, &pi->link_width_active); + + mad_decode_field(rcv_buf, IB_PORT_DIAG_F, &pi->diag_code); + mad_decode_field(rcv_buf, IB_PORT_MKEY_LEASE_F, &pi->mkey_lease); + mad_decode_field(rcv_buf, IB_PORT_CAPMASK_F, &pi->capability_mask); + mad_decode_field(rcv_buf, IB_PORT_MKEY_F, &pi->mkey); + mad_decode_field(rcv_buf, IB_PORT_GID_PREFIX_F, &pi->gid_prefix); + + mad_decode_field(rcv_buf, IB_PORT_STATE_F, &pi->link_state); + mad_decode_field(rcv_buf, IB_PORT_PHYS_STATE_F, &pi->phys_state); + + mad_decode_field(rcv_buf, IB_PORT_LINK_DOWN_DEF_F, &pi->link_down_def_state); + mad_decode_field(rcv_buf, IB_PORT_MKEY_PROT_BITS_F, &pi->mkey_prot_bits); + + mad_decode_field(rcv_buf, IB_PORT_LMC_F, &pi->lmc); + mad_decode_field(rcv_buf, IB_PORT_NEIGHBOR_MTU_F, &pi->neighbor_mtu); + mad_decode_field(rcv_buf, IB_PORT_SMSL_F, &pi->smsl); + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_F, &pi->init_type); + + mad_decode_field(rcv_buf, IB_PORT_VL_CAP_F, &pi->vl_capability); + mad_decode_field(rcv_buf, IB_PORT_VL_HIGH_LIMIT_F, &pi->vl_high_limit); + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_HIGH_CAP_F, &pi->vl_arb_high_cap); + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_LOW_CAP_F, &pi->vl_arb_low_cap); + + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_REPLY_F, &pi->init_reply); + mad_decode_field(rcv_buf, IB_PORT_MTU_CAP_F, &pi->mtu_cap); + mad_decode_field(rcv_buf, IB_PORT_VL_STALL_COUNT_F, &pi->vl_stall_count); + mad_decode_field(rcv_buf, IB_PORT_HOQ_LIFE_F, &pi->hoq_lifetime); + mad_decode_field(rcv_buf, IB_PORT_OPER_VLS_F, &pi->oper_vls); + mad_decode_field(rcv_buf, IB_PORT_PART_EN_INB_F, &pi->partition_enforce_in); + mad_decode_field(rcv_buf, IB_PORT_PART_EN_OUTB_F, &pi->partition_enforce_out); + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_INB_F, &pi->filter_raw_in); + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_OUTB_F, &pi->filter_raw_out); + 
mad_decode_field(rcv_buf, IB_PORT_MKEY_VIOL_F, &pi->mkey_violations); + mad_decode_field(rcv_buf, IB_PORT_PKEY_VIOL_F, &pi->pkey_violations); + mad_decode_field(rcv_buf, IB_PORT_QKEY_VIOL_F, &pi->qkey_violations); + + mad_decode_field(rcv_buf, IB_PORT_GUID_CAP_F, &pi->guid_capabilities); + + mad_decode_field(rcv_buf, IB_PORT_CLIENT_REREG_F, &pi->client_rereg); + mad_decode_field(rcv_buf, IB_PORT_SUBN_TIMEOUT_F, &pi->subnet_timeout); + mad_decode_field(rcv_buf, IB_PORT_RESP_TIME_VAL_F, &pi->response_time_val); + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PHYS_ERR_F, &pi->local_phys_error); + mad_decode_field(rcv_buf, IB_PORT_OVERRUN_ERR_F, &pi->overrun_error); + mad_decode_field(rcv_buf, IB_PORT_MAX_CREDIT_HINT_F, &pi->max_credit_hint); + mad_decode_field(rcv_buf, IB_PORT_LINK_ROUND_TRIP_F, &pi->link_round_trip); +} + +static int +get_port_info(ibnd_fabric_t *fabric, ibnd_port_t *port, int portnum, ib_portid_t *portid) +{ + char portinfo[64]; + void *pi = portinfo; + + port->portnum = portnum; + + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout_ms, + fabric->ibmad_port)) + return -1; + + decode_port_info(pi, &port->info); + + IBND_DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s\n", + portid2str(portid), portnum, port->info.lid, port->info.link_state, + port->info.phys_state, ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); + return 1; +} + +static void +decode_node_info(void * rcv_buf, ibnd_node_info_t *ni) +{ + mad_decode_field(rcv_buf, IB_NODE_BASE_VERS_F, &ni->base_ver); + mad_decode_field(rcv_buf, IB_NODE_CLASS_VERS_F, &ni->class_ver); + mad_decode_field(rcv_buf, IB_NODE_TYPE_F, &ni->type); + mad_decode_field(rcv_buf, IB_NODE_NPORTS_F, &ni->numports); + mad_decode_field(rcv_buf, IB_NODE_SYSTEM_GUID_F, &ni->sysimgguid); + mad_decode_field(rcv_buf, IB_NODE_GUID_F, &ni->nodeguid); + mad_decode_field(rcv_buf, IB_NODE_PORT_GUID_F, &ni->nodeportguid); + mad_decode_field(rcv_buf, 
IB_NODE_PARTITION_CAP_F, &ni->partition_cap); + mad_decode_field(rcv_buf, IB_NODE_DEVID_F, &ni->devid); + mad_decode_field(rcv_buf, IB_NODE_REVISION_F, &ni->revision); + mad_decode_field(rcv_buf, IB_NODE_LOCAL_PORT_F, &ni->localport); + mad_decode_field(rcv_buf, IB_NODE_VENDORID_F, &ni->vendid); +} + +/* + * Returns -1 if error. + */ +static int +query_node_info(ibnd_fabric_t *fabric, ibnd_node_t *node, ib_portid_t *portid) +{ + char nodeinfo[64]; + void *ni = nodeinfo; + if (!smp_query_via(ni, portid, IB_ATTR_NODE_INFO, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_node_info(ni, &(node->info)); + return (0); +} + +/* + * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. + */ +static int +query_node(ibnd_fabric_t *fabric, ibnd_node_t *node, ibnd_port_t *port, ib_portid_t *portid) +{ + char portinfo[64]; + void *pi = portinfo; + char switchinfo[64]; + void *si = switchinfo; + void *nd = node->nodedesc; + + if (query_node_info(fabric, node, portid)) + return -1; + + port->portnum = node->info.localport; + port->guid = node->info.nodeportguid; + + if (!smp_query_via(nd, portid, IB_ATTR_NODE_DESC, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_port_info(pi, &port->info); + + if (node->info.type != IBND_SWITCH_NODE) + return 0; + + node->smalid = port->info.lid; + node->smalmc = port->info.lmc; + + /* after we have the sma information find out the real PortInfo for this port */ + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, node->info.localport, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_port_info(pi, &port->info); + + if (!smp_query_via(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout_ms, + fabric->ibmad_port)) + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ + else + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); + + IBND_DEBUG("portid %s: got switch node %" PRIx64 " 
'%s'\n", + portid2str(portid), node->info.nodeguid, node->nodedesc); + return 1; +} + +static int +add_port_to_dpath(ib_dr_path_t *path, int nextport) +{ + if (path->cnt+2 >= sizeof(path->p)) + return -1; + ++path->cnt; + path->p[path->cnt] = nextport; + return path->cnt; +} + +static int +extend_dpath(ibnd_fabric_t *fabric, ib_dr_path_t *path, int nextport) +{ + int rc = add_port_to_dpath(path, nextport); + if ((rc != -1) && (path->cnt > fabric->maxhops_discovered)) + fabric->maxhops_discovered = path->cnt; + return (rc); +} + +static void +dump_endnode(ib_portid_t *path, char *prompt, ibnd_node_t *node, ibnd_port_t *port) +{ + if (!show_progress) + return; + + printf("%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", + portid2str(path), prompt, + ibnd_node_type_str(node), + node->info.nodeguid, node->info.type == IBND_SWITCH_NODE ? 0 : port->portnum, + port->info.lid, port->info.lid + (1 << port->info.lmc) - 1, + node->nodedesc); +} + +static ibnd_node_t * +find_existing_node(ibnd_fabric_t *fabric, ibnd_node_t *new) +{ + int hash = HASHGUID(new->info.nodeguid) % HTSZ; + ibnd_node_t *node; + + for (node = fabric->nodestbl[hash]; node; node = node->htnext) + if (node->info.nodeguid == new->info.nodeguid) + return node; + + return NULL; +} + +ibnd_node_t * +ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid) +{ + int hash = HASHGUID(guid) % HTSZ; + ibnd_node_t *node; + + for (node = fabric->nodestbl[hash]; node; node = node->htnext) + if (node->info.nodeguid == guid) + return node; + + return NULL; +} + +ibnd_node_t * +ibnd_update_node(ibnd_node_t *node) +{ + char portinfo[64]; + void *pi = portinfo; + ibnd_port_info_t port0_info; + char switchinfo[64]; + void *si = switchinfo; + void *nd = node->nodedesc; + int p = 0; + + if (query_node_info(node->fabric, node, &(node->path_portid))) + return (NULL); + + if (!smp_query_via(nd, &(node->path_portid), IB_ATTR_NODE_DESC, 0, timeout_ms, + node->fabric->ibmad_port)) + return (NULL); + + /* update all the 
port info's */ + for (p = 1; p <= node->info.numports; p++) { + get_port_info(node->fabric, node->ports[p], p, &(node->path_portid)); + } + + if (node->info.type != IBND_SWITCH_NODE) + goto done; + + if (!smp_query_via(pi, &(node->path_portid), IB_ATTR_PORT_INFO, 0, timeout_ms, + node->fabric->ibmad_port)) + return (NULL); + decode_port_info(pi, &port0_info); + + node->smalid = port0_info.lid; + node->smalmc = port0_info.lmc; + + if (!smp_query_via(si, &(node->path_portid), IB_ATTR_SWITCH_INFO, 0, timeout_ms, + node->fabric->ibmad_port)) + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ + else + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); + +done: + return (node); +} + +ibnd_node_t * +ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str) +{ + int i = 0; + ibnd_node_t *rc = fabric->from_node; + ib_dr_path_t path; + + if (str2drpath(&path, dr_str, 0, 0) == -1) { + return (NULL); + } + + for (i = 0; i <= path.cnt; i++) { + ibnd_port_t *remote_port = NULL; + if (path.p[i] == 0) + continue; + if (!rc->ports) + return (NULL); + + remote_port = rc->ports[path.p[i]]->remoteport; + if (!remote_port) + return (NULL); + + rc = remote_port->node; + } + + return (rc); +} + +void +add_to_nodeguid_hash(ibnd_node_t *node, ibnd_node_t *hash[]) +{ + int hash_idx = HASHGUID(node->info.nodeguid) % HTSZ; + + node->htnext = hash[hash_idx]; + hash[hash_idx] = node; +} + +void +add_to_portguid_hash(ibnd_port_t *port, ibnd_port_t *hash[]) +{ + int hash_idx = HASHGUID(port->guid) % HTSZ; + + port->htnext = hash[hash_idx]; + hash[hash_idx] = port; +} + +ibnd_port_t * +find_existing_port_fabric(ibnd_fabric_t *fabric, uint64_t guid) +{ + int hash = HASHGUID(guid) % HTSZ; + ibnd_port_t *port; + + for (port = fabric->portstbl[hash]; port; port = port->htnext) + if (port->guid == guid) + return port; + + return NULL; +} + +void +add_to_type_list(ibnd_node_t *node, ibnd_fabric_t *fabric) +{ + switch (node->info.type) { + case IBND_CA_NODE: + node->type_next =
fabric->ch_adapters; + fabric->ch_adapters = node; + break; + case IBND_SWITCH_NODE: + node->type_next = fabric->switches; + fabric->switches = node; + break; + case IBND_ROUTER_NODE: + node->type_next = fabric->routers; + fabric->routers = node; + break; + } +} + +void +add_to_nodedist(ibnd_node_t *node, ibnd_fabric_t *fabric) +{ + int dist = node->dist; + if (node->info.type != IBND_SWITCH_NODE) + dist = MAXHOPS; /* special Ca list */ + + node->dnext = fabric->nodesdist[dist]; + fabric->nodesdist[dist] = node; +} + + +static ibnd_node_t * +create_node(ibnd_fabric_t *fabric, ibnd_node_t *temp, ib_portid_t *path, int dist) +{ + ibnd_node_t *node; + + node = malloc(sizeof(*node)); + if (!node) { + IBPANIC("OOM: node creation failed\n"); + return NULL; + } + + memcpy(node, temp, sizeof(*node)); + node->dist = dist; + node->path_portid = *path; + node->fabric = fabric; + + add_to_nodeguid_hash(node, fabric->nodestbl); + + /* add this to the all nodes list */ + node->next = fabric->nodes; + fabric->nodes = node; + + add_to_type_list(node, fabric); + add_to_nodedist(node, fabric); + + return node; +} + +static ibnd_port_t * +find_existing_port_node(ibnd_node_t *node, ibnd_port_t *port) +{ + if (port->portnum > node->info.numports || node->ports == NULL ) + return (NULL); + + return (node->ports[port->portnum]); +} + +static ibnd_port_t * +add_port_to_node(ibnd_fabric_t *fabric, ibnd_node_t *node, ibnd_port_t *temp) +{ + ibnd_port_t *port; + + port = malloc(sizeof(*port)); + if (!port) + return NULL; + + memcpy(port, temp, sizeof(*port)); + port->node = node; + port->ext_portnum = 0; + + if (node->ports == NULL) { + node->ports = calloc(sizeof(*node->ports), node->info.numports + 1); + if (!node->ports) { + IBND_ERROR("Failed to allocate the ports array\n"); + return (NULL); + } + } + + node->ports[temp->portnum] = port; + + add_to_portguid_hash(port, fabric->portstbl); + return port; +} + +void +link_ports(ibnd_node_t *node, ibnd_port_t *port, ibnd_node_t *remotenode, 
ibnd_port_t *remoteport) +{ + IBND_DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u\n", + node->info.nodeguid, node, port, port->portnum, + remotenode->info.nodeguid, remotenode, remoteport, remoteport->portnum); + if (port->remoteport) + port->remoteport->remoteport = NULL; + if (remoteport->remoteport) + remoteport->remoteport->remoteport = NULL; + port->remoteport = remoteport; + remoteport->remoteport = port; +} + +static int +get_remote_node(ibnd_fabric_t *fabric, ibnd_node_t *node, ibnd_port_t *port, ib_portid_t *path, + int portnum, int dist) +{ + ibnd_node_t node_buf; + ibnd_port_t port_buf; + ibnd_node_t *remotenode, *oldnode; + ibnd_port_t *remoteport, *oldport; + + memset(&node_buf, 0, sizeof(node_buf)); + memset(&port_buf, 0, sizeof(port_buf)); + + IBND_DEBUG("handle node %p port %p:%d dist %d\n", node, port, portnum, dist); + if (port->info.phys_state != 5) /* LinkUp */ + return -1; + + if (extend_dpath(fabric, &path->drpath, portnum) < 0) + return -1; + + if (query_node(fabric, &node_buf, &port_buf, path) < 0) { + IBWARN("NodeInfo on %s failed, skipping port", + portid2str(path)); + path->drpath.cnt--; /* restore path */ + return -1; + } + + oldnode = find_existing_node(fabric, &node_buf); + if (oldnode) + remotenode = oldnode; + else if (!(remotenode = create_node(fabric, &node_buf, path, dist + 1))) + IBPANIC("no memory"); + + oldport = find_existing_port_node(remotenode, &port_buf); + if (oldport) { + remoteport = oldport; + } else if (!(remoteport = add_port_to_node(fabric, remotenode, &port_buf))) + IBPANIC("no memory"); + + dump_endnode(path, oldnode ? 
"known remote" : "new remote", + remotenode, remoteport); + + link_ports(node, port, remotenode, remoteport); + + path->drpath.cnt--; /* restore path */ + return 0; +} + +static void * +ibnd_init_port(char *dev_name, int dev_port) +{ + int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; + + /* Crank up the mad lib */ + return (mad_rpc_open_port(dev_name, dev_port, mgmt_classes, 2)); +} + +ibnd_fabric_t * +ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, + ib_portid_t *from, int hops) +{ + ibnd_fabric_t *fabric = NULL; + ib_portid_t my_portid = {0}; + ibnd_node_t node_buf; + ibnd_port_t port_buf; + ibnd_node_t *node; + ibnd_port_t *port; + int i; + int dist = 0; + ib_portid_t *path; + int max_hops = MAXHOPS-1; /* default find everything */ + + /* if not everything how much? */ + if (hops >= 0) { + max_hops = hops; + } + + /* If not specified start from "my" port */ + if (!from) { + from = &my_portid; + } + + fabric = malloc(sizeof(*fabric)); + + if (!fabric) { + IBPANIC("OOM: failed to malloc ibnd_fabric_t\n"); + return (NULL); + } + + memset(fabric, 0, sizeof(*fabric)); + + fabric->ibmad_port = ibnd_init_port(dev_name, dev_port); + if (!fabric->ibmad_port) { + IBPANIC("OOM: failed to open \"%s\" port %d\n", + dev_name, dev_port); + goto error; + } + + IBND_DEBUG("from %s\n", portid2str(from)); + + memset(&node_buf, 0, sizeof(node_buf)); + memset(&port_buf, 0, sizeof(port_buf)); + + if (query_node(fabric, &node_buf, &port_buf, from) < 0) { + IBWARN("can't reach node %s\n", portid2str(from)); + goto error; + } + + node = create_node(fabric, &node_buf, from, 0); + if (!node) + goto error; + + fabric->from_node = node; + + port = add_port_to_node(fabric, node, &port_buf); + if (!port) + IBPANIC("out of memory"); + + if (node->info.type != IBND_SWITCH_NODE && + get_remote_node(fabric, node, port, from, node->info.localport, 0) < 0) + return fabric; + + for (dist = 0; dist <= max_hops; dist++) { + + for (node = fabric->nodesdist[dist]; node; node 
= node->dnext) { + + path = &node->path_portid; + + IBND_DEBUG("dist %d node %p\n", dist, node); + dump_endnode(path, "processing", node, port); + + for (i = 1; i <= node->info.numports; i++) { + if (i == node->info.localport) + continue; + + if (get_port_info(fabric, &port_buf, i, path) < 0) { + IBWARN("can't reach node %s port %d", portid2str(path), i); + continue; + } + + port = find_existing_port_node(node, &port_buf); + if (port) + continue; + + port = add_port_to_node(fabric, node, &port_buf); + if (!port) + IBPANIC("out of memory"); + + /* If switch, set port GUID to node port GUID */ + if (node->info.type == IBND_SWITCH_NODE) + port->guid = node->info.nodeportguid; + + get_remote_node(fabric, node, port, path, i, dist); + } + } + } + + fabric->chassis = group_nodes(fabric); + + return fabric; +error: + free(fabric); + return (NULL); +} + +static void +destroy_node(ibnd_node_t *node) +{ + int p = 0; + + for (p = 0; p <= node->info.numports; p++) { + free(node->ports[p]); + } + free(node->ports); + + if (node->chrecord) + free(node->chrecord); + free(node); +} + +void +ibnd_destroy_fabric(ibnd_fabric_t *fabric) +{ + int dist = 0; + ibnd_node_t *node = NULL; + ibnd_node_t *next = NULL; + ibnd_chassis_list_t *ch, *ch_next; + + for (dist = 0; dist <= MAXHOPS; dist++) { + node = fabric->nodesdist[dist]; + while (node) { + next = node->dnext; + destroy_node(node); + node = next; + } + } + ch = fabric->first_chassis; + while (ch) { + ch_next = ch->next; + free(ch); + ch = ch_next; + } + if (fabric->ibmad_port) + mad_rpc_close_port(fabric->ibmad_port); + free(fabric); +} + +void +ibnd_debug(int i) +{ + if (i) { + ibdebug++; + madrpc_show_errors(1); + umad_debug(i); + } else { + ibdebug = 0; + madrpc_show_errors(0); + umad_debug(0); + } +} + +void +ibnd_show_progress(int i) +{ + show_progress = i; +} + +const char* +ibnd_node_type_str(ibnd_node_t *node) +{ + switch(node->info.type) { + case IBND_CA_NODE: return "Ca"; + case IBND_SWITCH_NODE: return "Switch"; + case
IBND_ROUTER_NODE: return "Router"; + } + return "??"; +} + +const char* +ibnd_node_type_str_short(ibnd_node_t *node) +{ + switch(node->info.type) { + case IBND_SWITCH_NODE: return "SW"; + case IBND_CA_NODE: return "CA"; + case IBND_ROUTER_NODE: return "RT"; + } + return "??"; +} + + +void +ibnd_iter_nodes(ibnd_fabric_t *fabric, + ibnd_iter_func_t func, + void *user_data) +{ + ibnd_node_t *cur = NULL; + + for (cur = fabric->nodes; cur; cur = cur->next) { + func(cur, user_data); + } +} + + +void +ibnd_iter_nodes_type(ibnd_fabric_t *fabric, + ibnd_iter_func_t func, + ibnd_node_type_t node_type, + void *user_data) +{ + ibnd_node_t *list = NULL; + ibnd_node_t *cur = NULL; + + switch (node_type) { + case IBND_SWITCH_NODE: + list = fabric->switches; + break; + case IBND_CA_NODE: + list = fabric->ch_adapters; + break; + case IBND_ROUTER_NODE: + list = fabric->routers; + break; + default: + IBND_DEBUG("Invalid node_type specified %d\n", node_type); + break; + } + + for (cur = list; cur; cur = cur->type_next) { + func(cur, user_data); + } +} + diff --git a/libibnetdisc/src/libibnetdisc.map b/libibnetdisc/src/libibnetdisc.map new file mode 100644 index 0000000..5e8c315 --- /dev/null +++ b/libibnetdisc/src/libibnetdisc.map @@ -0,0 +1,27 @@ +IBNETDISC_1.0 { + global: + ibnd_debug; + ibnd_show_progress; + ibnd_discover_fabric; + ibnd_cache_fabric; + ibnd_read_fabric; + ibnd_destroy_fabric; + ibnd_find_node_guid; + ibnd_update_node; + ibnd_find_node_dr; + ibnd_linkwidth_str; + ibnd_linkspeed_str; + ibnd_node_type_str; + ibnd_node_type_str_short; + ibnd_is_xsigo_guid; + ibnd_is_xsigo_tca; + ibnd_is_xsigo_hca; + ibnd_get_chassis_guid; + ibnd_get_chassis_type; + ibnd_get_chassis_slot_str; + ibnd_linkstate_str; + ibnd_physstate_str; + ibnd_iter_nodes; + ibnd_iter_nodes_type; + local: *; +}; diff --git a/libibnetdisc/test/iblinkinfotest.c b/libibnetdisc/test/iblinkinfotest.c new file mode 100644 index 0000000..7c52a0b --- /dev/null +++ b/libibnetdisc/test/iblinkinfotest.c @@ -0,0 
+1,395 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 500; + +static int debug = 0; +#define DEBUG(str, args...) 
\ + if (debug) fprintf(stderr, str, ##args) + +static int down_links_only = 0; +static int line_mode = 0; +static int add_sw_settings = 0; +static int print_port_guids = 0; + +static unsigned int +get_max(unsigned int num) +{ + unsigned int v = num; // 32-bit word to find the log base 2 of + unsigned r = 0; // r will be lg(v) + + while (v >>= 1) // unroll for more speed... + { + r++; + } + + return (1 << r); +} + +void +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) +{ + int max_speed = 0; + + int max_width = get_max(port->info.link_width_supported + & port->remoteport->info.link_width_supported); + if ((max_width & port->info.link_width_active) == 0) { + // we are not at the max supported width + // print what we could be at. + snprintf(width_msg, msg_size, "Could be %s", + ibnd_linkwidth_str(max_width)); + } + + max_speed = get_max(port->info.link_speed_supported + & port->remoteport->info.link_speed_supported); + if ((max_speed & port->info.link_speed_active) == 0) { + // we are not at the max supported speed + // print what we could be at. 
+ snprintf(speed_msg, msg_size, "Could be %s", + ibnd_linkspeed_str(max_speed)); + } +} + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + char remote_guid_str[256]; + char remote_str[256]; + char link_str[256]; + char width_msg[256]; + char speed_msg[256]; + char ext_port_str[256]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + width_msg[0] = '\0'; + speed_msg[0] = '\0'; + + if (port->remoteport) { + char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + get_msg(width_msg, speed_msg, 256, port); + if (line_mode) { + if (print_port_guids) { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->guid); + } else { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->node->info.nodeguid); + } + } + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", + remote_guid_str, + port->remoteport->info.lid ? 
+ port->remoteport->info.lid : + port->remoteport->node->smalid, + port->remoteport->portnum, + ext_port_str, + remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + remote_name_buf), + width_msg, + speed_msg + ); + } else { + snprintf(remote_str, 256, + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); + } + + if (add_sw_settings) { + snprintf(link_str, 256, + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state), + port->info.hoq_lifetime, + port->info.vl_stall_count + ); + } else { + snprintf(link_str, 256, + "(%3s %s %6s/%8s)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state) + ); + } + + if (port->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->ext_portnum); + else + ext_port_str[0] = '\0'; + + if (line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf), + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } else { + printf(" %6d %4d[%2s] ==%s==> %s", + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } +} + +void +print_switch(ibnd_node_t *node, void *user_data) +{ + int i = 0; + + if (!line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("Switch 0x%016lx %s:\n", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf)); + } + + for (i = 1; i <= node->info.numports; i++) { + ibnd_port_t *port = node->ports[i]; + if (!port) + continue; + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { + print_port(node, port); + } + } +} + 
+void +usage(void) +{ + fprintf(stderr, + "Usage: %s [-hclp -S -D -C -P ]\n" + " Report link speed and connection for each port of each switch which is active\n" + " -h This help message\n" + " -S output only the node specified by guid\n" + " -D print only node specified by \n" + " -f specify node to start \"from\"\n" + " -n Number of hops to include away from specified node\n" + " -d print only down links\n" + " -l (line mode) print all information for each link on each line\n" + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" + + + " -t timeout for any single fabric query\n" + " -s show errors\n" + " --node-name-map use specified node name map\n" + + " -C use selected Channel Adaptor name for queries\n" + " -P use selected channel adaptor port for queries\n" + " -g print port guids instead of node guids\n" + " --debug print debug messages\n" + , + argv0); + exit(-1); +} + +int +main(int argc, char **argv) +{ + char *ca = 0; + int ca_port = 0; + ibnd_fabric_t *fabric = NULL; + uint64_t guid = 0; + char *dr_path = NULL; + char *from = NULL; + int hops = 0; + ib_portid_t port_id; + + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; + static const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "down-links-only", 0, 0, 'd'}, + { "line-mode", 0, 0, 'l'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "print-port-guids", 0, 0, 'g'}, + { "print-additional", 0, 0, 'p'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "node-name-map", 1, 0, 1}, + { "debug", 0, 0, 2}, + { "from", 1, 0, 'f'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 2: + debug = 1; + ibnd_debug(1); + break; + case 'f': + from = strdup(optarg); + break; + case 
'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 'd': + down_links_only = 1; + break; + case 'l': + line_mode = 1; + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 'g': + print_port_guids = 1; + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + case 'p': + add_sw_settings = 1; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } + + ibnd_destroy_fabric(fabric); + + close_node_name_map(node_name_map); + exit(0); +} diff --git a/libibnetdisc/test/ibnetdisctest.c b/libibnetdisc/test/ibnetdisctest.c new file mode 100644 index 0000000..e4088da --- /dev/null +++ b/libibnetdisc/test/ibnetdisctest.c @@ -0,0 +1,588 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define LIST_CA_NODE (1 << IBND_CA_NODE) +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) + +char *argv0 = "ibnetdiscover"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 2000; +static int dumplevel = 0; + +static int debug = 0; +#define DEBUG(str, args...) 
\ + if (debug) fprintf(stderr, str, ##args) + +char * +node_name(ibnd_node_t *node) +{ + static char buf[256]; + + switch(node->info.type) { + case IBND_CA_NODE: + sprintf(buf, "\"%s", "H"); + break; + case IBND_SWITCH_NODE: + sprintf(buf, "\"%s", "S"); + break; + case IBND_ROUTER_NODE: + sprintf(buf, "\"%s", "R"); + break; + default: + sprintf(buf, "\"%s", "?"); + break; + } + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); + + return buf; +} + +void +list_node(ibnd_node_t *node, void *user_data) +{ + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, + node->nodedesc); + + fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", + ibnd_node_type_str(node), + node->info.nodeguid, node->info.numports, node->info.devid, + node->info.vendid, + nodename); + + free(nodename); +} + +void +list_nodes(ibnd_fabric_t *fabric, int list) +{ + if (list & LIST_CA_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); + } + if (list & LIST_SWITCH_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); + } + if (list & LIST_ROUTER_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); + } +} + +void +out_ids(ibnd_node_t *node, int group, char *chname) +{ + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); + if (node->info.sysimgguid) + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); + if (group + && node->chrecord && node->chrecord->chassisnum) { + fprintf(f, "\t\t# Chassis %d", node->chrecord->chassisnum); + if (chname) + fprintf(f, " (%s)", clean_nodedesc(chname)); + if (ibnd_is_xsigo_tca(node->info.nodeguid) + && node->ports[1] + && node->ports[1]->remoteport) + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); + } + fprintf(f, "\n"); +} + + +uint64_t +out_chassis(ibnd_fabric_t *fabric, int chassisnum) +{ + uint64_t guid; + + fprintf(f, "\nChassis %d", chassisnum); + guid = ibnd_get_chassis_guid(fabric, chassisnum); + if 
(guid) + fprintf(f, " (guid 0x%" PRIx64 ")", guid); + fprintf(f, "\n"); + return guid; +} + +void +out_switch(ibnd_node_t *node, int group, char *chname) +{ + char *str; + char str2[256]; + char *nodename = NULL; + + out_ids(node, group, chname); + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); + if (group) { + str = ibnd_get_chassis_type(node); + if (str) + fprintf(f, "%s ", str); + str = ibnd_get_chassis_slot_str(node, str2, 256); + if (str) + fprintf(f, "%s ", str); + } + + nodename = remap_node_name(node_name_map, node->info.nodeguid, + node->nodedesc); + + fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", + node->info.numports, node_name(node), + nodename, + node->sw_info.smaenhsp0 ? "enhanced" : "base", + node->smalid, node->smalmc); + + free(nodename); +} + +void +out_ca(ibnd_node_t *node, int group, char *chname) +{ + char *node_type; + char *node_type2; + + out_ids(node, group, chname); + switch(node->info.type) { + case IBND_CA_NODE: + node_type = "ca"; + node_type2 = "Ca"; + break; + case IBND_ROUTER_NODE: + node_type = "rt"; + node_type2 = "Rt"; + break; + default: + node_type = "???"; + node_type2 = "???"; + break; + } + + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); + fprintf(f, "%s\t%d %s\t\t# \"%s\"", + node_type2, node->info.numports, node_name(node), + clean_nodedesc(node->nodedesc)); + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) + fprintf(f, " (scp)"); + fprintf(f, "\n"); +} + +#define OUT_BUFFER_SIZE 16 +static char * +out_ext_port(ibnd_port_t *port, int group) +{ + static char mapping[OUT_BUFFER_SIZE]; + + if (group && port->ext_portnum != 0) { + snprintf(mapping, OUT_BUFFER_SIZE, + "[ext %d]", port->ext_portnum); + } + + return (mapping); +} + +void +out_switch_port(ibnd_port_t *port, int group) +{ + char *ext_port_str = NULL; + char *rem_nodename = NULL; + + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, 
port->remoteport); + fprintf(f, "[%d]", port->portnum); + + ext_port_str = out_ext_port(port, group); + if (ext_port_str) + fprintf(f, "%s", ext_port_str); + + rem_nodename = remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + port->remoteport->node->nodedesc); + + ext_port_str = out_ext_port(port->remoteport, group); + fprintf(f, "\t%s[%d]%s", + node_name(port->remoteport->node), + port->remoteport->portnum, + ext_port_str ? ext_port_str : ""); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); + fprintf(f, "\t\t# \"%s\" lid %d %s%s", + rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); + + if (ibnd_is_xsigo_tca(port->remoteport->guid)) + fprintf(f, " slot %d", port->portnum); + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) + fprintf(f, " (scp)"); + fprintf(f, "\n"); + + free(rem_nodename); +} + +void +out_ca_port(ibnd_port_t *port, int group) +{ + char *str = NULL; + char *rem_nodename = NULL; + + fprintf(f, "[%d]", port->portnum); + if (port->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->guid); + fprintf(f, "\t%s[%d]", + node_name(port->remoteport->node), + port->remoteport->portnum); + str = out_ext_port(port->remoteport, group); + if (str) + fprintf(f, "%s", str); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); + + rem_nodename = remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + port->remoteport->node->nodedesc); + + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", + port->info.lid, port->info.lmc, rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active)); + + free(rem_nodename); +} + +int +dump_topology(int group, ibnd_fabric_t *fabric) +{ + ibnd_node_t *node; + ibnd_port_t *port; + int i = 0, dist = 0, p = 0; + time_t t = time(0); + uint64_t chguid; + char *chname = NULL; + + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); + + /* Make pass on switches */ + if (group) { + ibnd_chassis_list_t *ch = NULL; + + /* Chassis based switches first */ + for (ch = fabric->chassis; ch; ch = ch->next) { + int n = 0; + + if (!ch->chassisnum) + continue; + chguid = out_chassis(fabric, ch->chassisnum); + + chname = NULL; +/** + * Hal will this work for Xsigo? + */ + if (ibnd_is_xsigo_guid(chguid)) { + for (node = ch->nodes; node; node = node->chassis_next) { + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } + +#if 0 +/** + * vs. this? 
+ */ + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { + if (!node->chrecord || + !node->chrecord->chassisnum) + continue; + + if (node->chrecord->chassisnum != ch->chassisnum) + continue; + + if (ibnd_is_xsigo_hca(node->nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } +#endif + } + + fprintf(f, "\n# Spine Nodes"); + for (n = 1; n <= (SPINES_MAX_NUM+1); n++) { + if (ch->spinenode[n]) { + out_switch(ch->spinenode[n], group, chname); + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { + port = ch->spinenode[n]->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + fprintf(f, "\n# Line Nodes"); + for (n = 1; n <= (LINES_MAX_NUM+1); n++) { + if (ch->linenode[n]) { + out_switch(ch->linenode[n], group, chname); + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { + port = ch->linenode[n]->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + + fprintf(f, "\n# Chassis Switches"); + for (node = ch->nodes; node; node = node->chassis_next) { + if (node->info.type == IBND_SWITCH_NODE) { + out_switch(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + + fprintf(f, "\n# Chassis CAs"); + for (node = ch->nodes; node; node = node->chassis_next) { + if (node->info.type == IBND_CA_NODE) { + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } + } + + } + + } else { /* !group */ + for (node = fabric->switches; node; node = node->type_next) { + DEBUG("SWITCH: dist %d node %p\n", dist, node); + out_switch(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + + chname = NULL; + 
if (group) { + fprintf(f, "\nNon-Chassis Nodes\n"); + for (node = fabric->switches; node; node = node->type_next) { + DEBUG("SWITCH: dist %d node %p\n", dist, node); + /* Now, skip chassis based switches */ + if (node->chrecord && + node->chrecord->chassisnum) + continue; + out_switch(node, group, chname); + + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + + } + + /* Make pass on CAs */ + for (node = fabric->ch_adapters; node; node = node->type_next) { + DEBUG("CA: dist %d node %p\n", dist, node); + /* Now, skip chassis based CAs */ + if (group && node->chrecord && + node->chrecord->chassisnum) + continue; + out_ca(node, group, chname); + + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } + + /* make pass on routers */ + for (node = fabric->routers; node; node = node->type_next) { + DEBUG("RT: dist %d node %p\n", dist, node); + /* Now, skip chassis based CAs */ + if (group && node->chrecord && + node->chrecord->chassisnum) + continue; + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } + + return i; +} + +void +usage(void) +{ + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " + "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", + argv0); + fprintf(stderr, " --node-name-map specify a node name map file\n"); + exit(-1); +} + +int +main(int argc, char **argv) +{ + int list = 0; + char *ca = 0; + int ca_port = 0; + int group = 0; + int ports_report = 0; + ibnd_fabric_t *fabric = NULL; + + static char const str_opts[] = "C:P:t:devslgHSRpVhu"; + static const struct option long_opts[] = { + { "C", 1, 0, 'C'}, + { "P", 1, 0, 'P'}, + { "debug", 0, 0, 'd'}, + { 
"show", 0, 0, 's'}, + { "list", 0, 0, 'l'}, + { "grouping", 0, 0, 'g'}, + { "Hca_list", 0, 0, 'H'}, + { "Switch_list", 0, 0, 'S'}, + { "Router_list", 0, 0, 'R'}, + { "timeout", 1, 0, 't'}, + { "node-name-map", 1, 0, 1}, + { "ports", 0, 0, 'p'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 'C': + ca = optarg; + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'd': + debug = 1; + ibnd_debug(1); + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 's': + dumplevel = 1; + break; + case 'l': + list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; + break; + case 'g': + group = 1; + break; + case 'S': + list |= LIST_SWITCH_NODE; + break; + case 'H': + list |= LIST_CA_NODE; + break; + case 'R': + list |= LIST_ROUTER_NODE; + break; + case 'p': + ports_report = 1; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + + if (list) + list_nodes(fabric, list); + else + dump_topology(group, fabric); + + ibnd_destroy_fabric(fabric); + close_node_name_map(node_name_map); + exit(0); +} diff --git a/libibnetdisc/test/testleaks.c b/libibnetdisc/test/testleaks.c new file mode 100644 index 0000000..4c10afb --- /dev/null +++ b/libibnetdisc/test/testleaks.c @@ -0,0 +1,261 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. 
All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static int timeout_ms = 500; + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + char remote_guid_str[256]; + char remote_str[256]; + char link_str[256]; + char speed_msg[256]; + char ext_port_str[256]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + speed_msg[0] = '\0'; + + if (port->remoteport) { + char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" (%s)\n", + remote_guid_str, + port->remoteport->info.lid ? + port->remoteport->info.lid : + port->remoteport->node->smalid, + port->remoteport->portnum, + ext_port_str, + port->remoteport->node->nodedesc, + speed_msg + ); + } else { + snprintf(remote_str, 256, + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); + } + + snprintf(link_str, 256, + "(%3s %s %6s/%8s)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state) + ); + + if (port->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->ext_portnum); + else + ext_port_str[0] = '\0'; + + printf(" %6d %4d[%2s] ==%s==> %s", + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); +} + +void +print_switch(ibnd_node_t *node, void *user_data) +{ + int i = 0; + + for (i = 1; i <= node->info.numports; i++) { + ibnd_port_t *port = node->ports[i]; + if (!port) + continue; + if (port->info.link_state == IBND_LINK_DOWN) { + print_port(node, port); + } + } +} + +void +usage(void) 
+{ + fprintf(stderr, + "Usage: %s [-hclp -S -D -C -P ]\n" + " Report link speed and connection for each port of each switch which is active\n" + " -h This help message\n" + " -S output only the node specified by guid\n" + " -D print only node specified by \n" + " -f specify node to start \"from\"\n" + " -n Number of hops to include away from specified node\n" + + " -t timeout for any single fabric query\n" + " -s show errors\n" + + " -C use selected Channel Adaptor name for queries\n" + " -P use selected channel adaptor port for queries\n" + " --debug print debug messages\n" + , + argv0); + exit(-1); +} + +int +main(int argc, char **argv) +{ + char *ca = 0; + int ca_port = 0; + ibnd_fabric_t *fabric = NULL; + uint64_t guid = 0; + char *dr_path = NULL; + char *from = NULL; + int hops = 0; + ib_portid_t port_id; + + static char const str_opts[] = "S:D:n:C:P:t:shuf:"; + static const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "debug", 0, 0, 2}, + { "from", 1, 0, 'f'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 2: + ibnd_debug(1); + break; + case 'f': + from = strdup(optarg); + break; + case 'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + while (1) { + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = 
ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + +#if 0 + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } +#endif + + ibnd_destroy_fabric(fabric); + } + + exit(0); +} -- 1.5.4.5 From weiny2 at llnl.gov Thu Nov 20 16:38:14 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 20 Nov 2008 16:38:14 -0800 Subject: [ofa-general] [PATCH 2/3] Convert iblinkinfo.pl to C and use new ibnetdisc library. Message-ID: <20081120163814.3e7c0c78.weiny2@llnl.gov> From b1c2cc8f96a3d88f2ef341ebee0b550fd5bd2a7b Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Thu, 20 Nov 2008 08:45:00 -0800 Subject: [PATCH] Convert iblinkinfo.pl to C and use new ibnetdisc library.
Signed-off-by: Ira Weiny --- infiniband-diags/Makefile.am | 9 +- infiniband-diags/configure.in | 2 + infiniband-diags/scripts/iblinkinfo.pl | 327 -------------------------- infiniband-diags/src/iblinkinfo.c | 393 ++++++++++++++++++++++++++++++++ 4 files changed, 402 insertions(+), 329 deletions(-) delete mode 100755 infiniband-diags/scripts/iblinkinfo.pl create mode 100644 infiniband-diags/src/iblinkinfo.c diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am index c22ba5e..8f26749 100644 --- a/infiniband-diags/Makefile.am +++ b/infiniband-diags/Makefile.am @@ -10,7 +10,7 @@ endif sbin_PROGRAMS = src/ibaddr src/ibnetdiscover src/ibping src/ibportstate \ src/ibroute src/ibstat src/ibsysstat src/ibtracert \ src/perfquery src/sminfo src/smpdump src/smpquery \ - src/saquery src/vendstat + src/saquery src/vendstat src/iblinkinfo.pl if ENABLE_TEST_UTILS sbin_PROGRAMS += src/ibsendtrap src/mcm_rereg_test @@ -27,7 +27,7 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \ scripts/dump_lfts.sh scripts/dump_mfts.sh \ scripts/set_nodedesc.sh \ scripts/ibqueryerrors.pl scripts/ibswportwatch.pl \ - scripts/iblinkinfo.pl scripts/ibprintswitch.pl \ + scripts/ibprintswitch.pl \ scripts/ibprintca.pl scripts/ibprintrt.pl \ scripts/ibfindnodesusing.pl scripts/ibidsverify.pl \ scripts/check_lft_balance.pl @@ -39,6 +39,11 @@ src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/grouping.c src/ibdiag_common src_ibnetdiscover_CFLAGS = -Wall $(DBGFLAGS) src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) +src_iblinkinfo_pl_SOURCES = src/iblinkinfo.c +src_iblinkinfo_pl_CFLAGS = -Wall $(DBGFLAGS) +src_iblinkinfo_pl_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + src_ibping_SOURCES = src/ibping.c src/ibdiag_common.c src_ibping_CFLAGS = -Wall $(DBGFLAGS) diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in index d227219..46021d6 100644 --- a/infiniband-diags/configure.in +++ b/infiniband-diags/configure.in @@ 
-46,6 +46,8 @@ AC_CHECK_LIB(osmvendor, osmv_query_sa, [], AC_MSG_ERROR([osmv_query_sa() not found. diags require libosmvendor.]), [-lopensm]) AC_CHECK_LIB(opensm, osm_log_init_v2, [], AC_MSG_ERROR([osm_log_init_v2() not found. diags require libopensm.])) +AC_CHECK_LIB(ibnetdisc, ibnd_discover_fabric, [], + AC_MSG_ERROR([ibnd_discover_fabric() not found. diags require libibnetdisc.])) fi dnl Checks for header files. diff --git a/infiniband-diags/scripts/iblinkinfo.pl b/infiniband-diags/scripts/iblinkinfo.pl deleted file mode 100755 index b6b27ce..0000000 --- a/infiniband-diags/scripts/iblinkinfo.pl +++ /dev/null @@ -1,327 +0,0 @@ -#!/usr/bin/perl -# -# Copyright (c) 2006 The Regents of the University of California. -# Copyright (c) 2007-2008 Voltaire, Inc. All rights reserved. -# -# Produced at Lawrence Livermore National Laboratory. -# Written by Ira Weiny . -# -# This software is available to you under a choice of one of two -# licenses. You may choose to be licensed under the terms of the GNU -# General Public License (GPL) Version 2, available from the file -# COPYING in the main directory of this source tree, or the -# OpenIB.org BSD license below: -# -# Redistribution and use in source and binary forms, with or -# without modification, are permitted provided that the following -# conditions are met: -# -# - Redistributions of source code must retain the above -# copyright notice, this list of conditions and the following -# disclaimer. -# -# - Redistributions in binary form must reproduce the above -# copyright notice, this list of conditions and the following -# disclaimer in the documentation and/or other materials -# provided with the distribution. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, -# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF -# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS -# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN -# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN -# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. -# - -use strict; - -use Getopt::Std; -use IBswcountlimits; - -sub usage_and_exit -{ - my $prog = $_[0]; - print -"Usage: $prog [-Rhclp -S -D -C -P ]\n"; - print -" Report link speed and connection for each port of each switch which is active\n"; - print " -h This help message\n"; - print -" -R Recalculate ibnetdiscover information (Default is to reuse ibnetdiscover output)\n"; - print -" -D output only the switch specified by direct route path\n"; - print " -S output only the switch specified by (hex format)\n"; - print " -d print only down links\n"; - print - " -l (line mode) print all information for each link on each line\n"; - print -" -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n"; - print " -c print port capabilities (enabled/supported values)\n"; - print " -C use selected Channel Adaptor name for queries\n"; - print " -P use selected channel adaptor port for queries\n"; - print " -g print port guids instead of node guids\n"; - exit 2; -} - -my $argv0 = `basename $0`; -my $regenerate_map = undef; -my $single_switch = undef; -my $direct_route = undef; -my $line_mode = undef; -my $print_add_switch = undef; -my $print_extended_cap = undef; -my $only_down_links = undef; -my $ca_name = ""; -my $ca_port = ""; -my $print_port_guids = undef; -my $switch_found = "no"; -chomp $argv0; - -if (!getopts("hcpldRS:D:C:P:g")) { usage_and_exit $argv0; } -if (defined $Getopt::Std::opt_h) { usage_and_exit $argv0; } -if (defined $Getopt::Std::opt_D) { $direct_route = $Getopt::Std::opt_D; } -if (defined $Getopt::Std::opt_R) { $regenerate_map = $Getopt::Std::opt_R; } -if (defined $Getopt::Std::opt_S) { - $single_switch = format_guid($Getopt::Std::opt_S); -} -if (defined 
$Getopt::Std::opt_d) { $only_down_links = $Getopt::Std::opt_d; } -if (defined $Getopt::Std::opt_l) { $line_mode = $Getopt::Std::opt_l; } -if (defined $Getopt::Std::opt_p) { $print_add_switch = $Getopt::Std::opt_p; } -if (defined $Getopt::Std::opt_c) { $print_extended_cap = $Getopt::Std::opt_c; } -if (defined $Getopt::Std::opt_C) { $ca_name = $Getopt::Std::opt_C; } -if (defined $Getopt::Std::opt_P) { $ca_port = $Getopt::Std::opt_P; } -if (defined $Getopt::Std::opt_g) { $print_port_guids = $Getopt::Std::opt_g; } - -my $extra_smpquery_params = get_ca_name_port_param_string($ca_name, $ca_port); - -sub main -{ - get_link_ends($regenerate_map, $ca_name, $ca_port); - if (defined($direct_route)) { - # convert DR to guid, then use original single_switch option - $single_switch = convert_dr_to_guid($direct_route); - if (!defined($single_switch) || !is_switch($single_switch)) { - printf("The direct route (%s) does not map to a switch.\n", - $direct_route); - return; - } - } - foreach my $switch (sort (keys(%IBswcountlimits::link_ends))) { - if ($single_switch && $switch ne $single_switch) { - next; - } else { - $switch_found = "yes"; - } - my $switch_prompt = "no"; - my $num_ports = get_num_ports($switch, $ca_name, $ca_port); - if ($num_ports == 0) { - printf("ERROR: switch $switch has 0 ports???\n"); - } - my @output_lines = undef; - my $pkt_lifetime = ""; - my $pkt_life_prompt = ""; - my $port_timeouts = ""; - my $print_switch = "yes"; - if ($only_down_links) { $print_switch = "no"; } - if ($print_add_switch) { - my $data = `smpquery $extra_smpquery_params -G switchinfo $switch`; - if ($data eq "") { - printf("ERROR: failed to get switchinfo for $switch\n"); - } - my @lines = split("\n", $data); - foreach my $line (@lines) { - if ($line =~ /^LifeTime:\.+(.*)/) { $pkt_lifetime = $1; } - } - $pkt_life_prompt = sprintf(" (LT: %2s)", $pkt_lifetime); - } - foreach my $port (1 .. 
$num_ports) { - my $hr = $IBswcountlimits::link_ends{$switch}{$port}; - if ($switch_prompt eq "no" && !$line_mode) { - my $switch_name = ""; - my $tmp_port = $port; - while ($switch_name eq "" && $tmp_port <= $num_ports) { - # the first port is down find switch name with up port - my $hr = $IBswcountlimits::link_ends{$switch}{$tmp_port}; - $switch_name = $hr->{loc_desc}; - $tmp_port++; - } - if ($switch_name eq "") { - printf( - "WARNING: Switch Name not found for $switch\n"); - } - push( - @output_lines, - sprintf( - "Switch %18s %s%s:\n", - $switch, $switch_name, $pkt_life_prompt - ) - ); - $switch_prompt = "yes"; - } - my $data = - `smpquery $extra_smpquery_params -G portinfo $switch $port`; - if ($data eq "") { - printf( - "ERROR: failed to get portinfo for $switch port $port\n"); - } - my @lines = split("\n", $data); - my $speed = ""; - my $speed_sup = ""; - my $speed_enable = ""; - my $width = ""; - my $width_sup = ""; - my $width_enable = ""; - my $state = ""; - my $hoq_life = ""; - my $vl_stall = ""; - my $phy_link_state = ""; - - foreach my $line (@lines) { - if ($line =~ /^LinkSpeedActive:\.+(.*)/) { $speed = $1; } - if ($line =~ /^LinkSpeedEnabled:\.+(.*)/) { - $speed_enable = $1; - } - if ($line =~ /^LinkSpeedSupported:\.+(.*)/) { $speed_sup = $1; } - if ($line =~ /^LinkWidthActive:\.+(.*)/) { $width = $1; } - if ($line =~ /^LinkWidthEnabled:\.+(.*)/) { - $width_enable = $1; - } - if ($line =~ /^LinkWidthSupported:\.+(.*)/) { $width_sup = $1; } - if ($line =~ /^LinkState:\.+(.*)/) { $state = $1; } - if ($line =~ /^HoqLife:\.+(.*)/) { $hoq_life = $1; } - if ($line =~ /^VLStallCount:\.+(.*)/) { $vl_stall = $1; } - if ($line =~ /^PhysLinkState:\.+(.*)/) { $phy_link_state = $1; } - } - my $rem_port = $hr->{rem_port}; - my $rem_lid = $hr->{rem_lid}; - my $rem_speed_sup = ""; - my $rem_speed_enable = ""; - my $rem_width_sup = ""; - my $rem_width_enable = ""; - if ($rem_lid ne "" && $rem_port ne "") { - $data = - `smpquery $extra_smpquery_params portinfo 
$rem_lid $rem_port`; - if ($data eq "") { - printf( - "ERROR: failed to get portinfo for $switch port $port\n" - ); - } - my @lines = split("\n", $data); - foreach my $line (@lines) { - if ($line =~ /^LinkSpeedEnabled:\.+(.*)/) { - $rem_speed_enable = $1; - } - if ($line =~ /^LinkSpeedSupported:\.+(.*)/) { - $rem_speed_sup = $1; - } - if ($line =~ /^LinkWidthEnabled:\.+(.*)/) { - $rem_width_enable = $1; - } - if ($line =~ /^LinkWidthSupported:\.+(.*)/) { - $rem_width_sup = $1; - } - } - } - my $capabilities = ""; - if ($print_extended_cap) { - $capabilities = sprintf("(%3s %s %6s / %8s [%s/%s][%s/%s])", - $width, $speed, $state, $phy_link_state, $width_enable, - $width_sup, $speed_enable, $speed_sup); - } else { - $capabilities = sprintf("(%3s %s %6s / %8s)", - $width, $speed, $state, $phy_link_state); - } - if ($print_add_switch) { - $port_timeouts = - sprintf(" (HOQ:%s VL_Stall:%s)", $hoq_life, $vl_stall); - } - if (!$only_down_links || ($only_down_links && $state eq "Down")) { - my $width_msg = ""; - my $speed_msg = ""; - if ($rem_width_enable ne "" && $rem_width_sup ne "") { - if ( $width_enable =~ /12X/ - && $rem_width_enable =~ /12X/ - && $width !~ /12X/) - { - $width_msg = "Could be 12X"; - } else { - if ( $width_enable =~ /8X/ - && $rem_width_enable =~ /8X/ - && $width !~ /8X/) - { - $width_msg = "Could be 8X"; - } else { - if ( $width_enable =~ /4X/ - && $rem_width_enable =~ /4X/ - && $width !~ /4X/) - { - $width_msg = "Could be 4X"; - } - } - } - } - if ($rem_speed_enable ne "" && $rem_speed_sup ne "") { - if ( $speed_enable =~ /10\.0/ - && $rem_speed_enable =~ /10\.0/ - && $speed !~ /10\.0/) - { - $speed_msg = "Could be 10.0 Gbps"; - } else { - if ( $speed_enable =~ /5\.0/ - && $rem_speed_enable =~ /5\.0/ - && $speed !~ /5\.0/) - { - $speed_msg = "Could be 5.0 Gbps"; - } - } - } - - if ($line_mode) { - my $line_begin = sprintf("%18s \"%30s\"%s", - $switch, $hr->{loc_desc}, $pkt_life_prompt); - my $ext_guid = sprintf("%18s", $hr->{rem_guid}); - if 
($print_port_guids && $hr->{rem_port_guid} ne "") { - $ext_guid = sprintf("0x%016s", $hr->{rem_port_guid}); - } - push( - @output_lines, - sprintf( -"%s %6s %4s[%2s] ==%s%s==> %18s %6s %4s[%2s] \"%s\" ( %s %s)\n", - $line_begin, $hr->{loc_sw_lid}, - $port, $hr->{loc_ext_port}, - $capabilities, $port_timeouts, - $ext_guid, $hr->{rem_lid}, - $hr->{rem_port}, $hr->{rem_ext_port}, - $hr->{rem_desc}, $width_msg, - $speed_msg - ) - ); - } else { - push( - @output_lines, - sprintf( -" %6s %4s[%2s] ==%s%s==> %6s %4s[%2s] \"%s\" ( %s %s)\n", - $hr->{loc_sw_lid}, $port, - $hr->{loc_ext_port}, $capabilities, - $port_timeouts, $hr->{rem_lid}, - $hr->{rem_port}, $hr->{rem_ext_port}, - $hr->{rem_desc}, $width_msg, - $speed_msg - ) - ); - } - $print_switch = "yes"; - } - } - if ($print_switch eq "yes") { - foreach my $line (@output_lines) { print $line; } - } - } - if ($single_switch && $switch_found ne "yes") { - printf("Switch \"%s\" not found.\n", $single_switch); - } -} -main; - diff --git a/infiniband-diags/src/iblinkinfo.c b/infiniband-diags/src/iblinkinfo.c new file mode 100644 index 0000000..1d503bb --- /dev/null +++ b/infiniband-diags/src/iblinkinfo.c @@ -0,0 +1,393 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 500; + +static int down_links_only = 0; +static int line_mode = 0; +static int add_sw_settings = 0; +static int print_port_guids = 0; + +static unsigned int +get_max(unsigned int num) +{ + unsigned int v = num; // 32-bit word to find the log base 2 of + unsigned r = 0; // r will be lg(v) + + while (v >>= 1) // unroll for more speed... + { + r++; + } + + return (1 << r); +} + +void +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) +{ + int max_speed = 0; + + int max_width = get_max(port->info.link_width_supported + & port->remoteport->info.link_width_supported); + if ((max_width & port->info.link_width_active) == 0) { + // we are not at the max supported width + // print what we could be at. 
+ snprintf(width_msg, msg_size, "Could be %s", + ibnd_linkwidth_str(max_width)); + } + + max_speed = get_max(port->info.link_speed_supported + & port->remoteport->info.link_speed_supported); + if ((max_speed & port->info.link_speed_active) == 0) { + // we are not at the max supported speed + // print what we could be at. + snprintf(speed_msg, msg_size, "Could be %s", + ibnd_linkspeed_str(max_speed)); + } +} + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + char remote_guid_str[256]; + char remote_str[256]; + char link_str[256]; + char width_msg[256]; + char speed_msg[256]; + char ext_port_str[256]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + width_msg[0] = '\0'; + speed_msg[0] = '\0'; + + if (port->remoteport) { + char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + get_msg(width_msg, speed_msg, 256, port); + if (line_mode) { + if (print_port_guids) { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->guid); + } else { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->node->info.nodeguid); + } + } + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", + remote_guid_str, + port->remoteport->info.lid ? 
+ port->remoteport->info.lid : + port->remoteport->node->smalid, + port->remoteport->portnum, + ext_port_str, + remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + remote_name_buf), + width_msg, + speed_msg + ); + } else { + snprintf(remote_str, 256, + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); + } + + if (add_sw_settings) { + snprintf(link_str, 256, + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state), + port->info.hoq_lifetime, + port->info.vl_stall_count + ); + } else { + snprintf(link_str, 256, + "(%3s %s %6s/%8s)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state) + ); + } + + if (port->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->ext_portnum); + else + ext_port_str[0] = '\0'; + + if (line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf), + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } else { + printf(" %6d %4d[%2s] ==%s==> %s", + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } +} + +void +print_switch(ibnd_node_t *node, void *user_data) +{ + int i = 0; + + if (!line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("Switch 0x%016lx %s:\n", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf)); + } + + for (i = 1; i <= node->info.numports; i++) { + ibnd_port_t *port = node->ports[i]; + if (!port) + continue; + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { + print_port(node, port); + } + } +} + 
+void +usage(void) +{ + fprintf(stderr, + "Usage: %s [-hclp -S -D -C -P ]\n" + " Report link speed and connection for each port of each switch which is active\n" + " -h This help message\n" + " -S output only the node specified by guid\n" + " -D print only node specified by \n" + " -f specify node to start \"from\"\n" + " -n Number of hops to include away from specified node\n" + " -d print only down links\n" + " -l (line mode) print all information for each link on each line\n" + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" + + + " -t timeout for any single fabric query\n" + " -s show progress during scan\n" + " --node-name-map use specified node name map\n" + + " -C use selected Channel Adaptor name for queries\n" + " -P use selected channel adaptor port for queries\n" + " -g print port guids instead of node guids\n" + " --debug print debug messages\n" + , + argv0); + exit(-1); +} + +int +main(int argc, char **argv) +{ + char *ca = 0; + int ca_port = 0; + ibnd_fabric_t *fabric = NULL; + uint64_t guid = 0; + char *dr_path = NULL; + char *from = NULL; + int hops = 0; + ib_portid_t port_id; + + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; + static const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "down-links-only", 0, 0, 'd'}, + { "line-mode", 0, 0, 'l'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "print-port-guids", 0, 0, 'g'}, + { "print-additional", 0, 0, 'p'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "node-name-map", 1, 0, 1}, + { "debug", 0, 0, 2}, + { "from", 1, 0, 'f'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 2: + ibnd_debug(1); + break; + case 'f': + from = strdup(optarg); + break; + case 
'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 'd': + down_links_only = 1; + break; + case 'l': + line_mode = 1; + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 's': + ibnd_show_progress(1); + break; + case 'g': + print_port_guids = 1; + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + case 'p': + add_sw_settings = 1; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } + + ibnd_destroy_fabric(fabric); + + close_node_name_map(node_name_map); + exit(0); +} -- 1.5.4.5 From ddiss at sgi.com Thu Nov 20 22:10:45 2008 From: ddiss at sgi.com (David Disseldorp) Date: Fri, 21 Nov 2008 17:10:45 +1100 Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion Message-ID: <1227247845-16023-1-git-send-email-ddiss@sgi.com> iSCSI/iSER targets may send PDUs without a prior request from the initiator, RFC 5046 refers to these PDUs as "unexpected". 
NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this
category.

The number of active "unexpected" PDUs an iSER target may have at any time is
governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported.

Currently, when an iSER target sends an "unexpected" PDU, the initiator's recv
buffer consumed by the PDU is not replaced. If more than
initial_post_recv_bufs_num "unexpected" PDUs are received, the receive
queue will run out of receive work requests.

This patch ensures recv buffers consumed by "unexpected" PDUs are replaced
prior to sending the next control-type PDU.

Signed-off-by: David Disseldorp
Signed-off-by: Ken Sandars
---
 drivers/infiniband/ulp/iser/iscsi_iser.h | 3 +
 drivers/infiniband/ulp/iser/iser_initiator.c | 76 ++++++++++++++++++++++++--
 drivers/infiniband/ulp/iser/iser_verbs.c | 1 +
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 81a8262..8611195 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -252,6 +252,9 @@ struct iser_conn {
 	wait_queue_head_t wait; /* waitq for conn/disconn */
 	atomic_t post_recv_buf_count; /* posted rx count */
 	atomic_t post_send_buf_count; /* posted tx count */
+	atomic_t unexpected_pdu_count;/* count of received *
+	                              * unexpected pdus *
+	                              * not yet retired */
 	char name[ISER_OBJECT_NAME_SIZE];
 	struct iser_page_vec *page_vec; /* represents SG to fmr maps*
 	                                 * maps serialized as tx is*/
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index cdd2831..9f8cffb 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -274,8 +274,10 @@ int iser_conn_set_full_featured_mode(struct iscsi_conn *conn)
 	struct iscsi_iser_conn *iser_conn = conn->dd_data;
 
 	int i;
-	/* no need to keep it in a var, we are after login so if this should
-	 * be negotiated, by now the result should be available here */
+	/*
+	 * FIXME this value should be declared to the target during login with
+	 * the MaxOutstandingUnexpectedPDUs key when supported
+	 */
 	int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS;
 
 	iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num);
@@ -310,6 +312,33 @@ iser_check_xmit(struct iscsi_conn *conn, void *task)
 	return 0;
 }
 
+static inline int
+iser_post_unexpected_recvs(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	int outstanding_unexp_pdus;
+	int err = 0;
+
+	if (atomic_read(&iser_conn->ib_conn->unexpected_pdu_count) == 0)
+		goto out;
+
+	outstanding_unexp_pdus =
+		atomic_xchg(&iser_conn->ib_conn->unexpected_pdu_count, 0);
+
+	while (outstanding_unexp_pdus > 0) {
+		if (iser_post_receive_control(conn) != 0) {
+			iser_err("post_rcv failed\n");
+			err = -ENOMEM;
+			atomic_add(outstanding_unexp_pdus,
+				   &iser_conn->ib_conn->unexpected_pdu_count);
+			goto out;
+		}
+		outstanding_unexp_pdus--;
+	}
+
+out:
+	return err;
+}
 
 /**
  * iser_send_command - send command PDU
@@ -372,6 +401,7 @@ int iser_send_command(struct iscsi_conn *conn,
 	iser_reg_single(iser_conn->ib_conn->device,
 			send_dto->regd[0], DMA_TO_DEVICE);
 
+	/* post recv buffer for SCSI response */
 	if (iser_post_receive_control(conn) != 0) {
 		iser_err("post_recv failed!\n");
 		err = -ENOMEM;
@@ -380,6 +410,12 @@ int iser_send_command(struct iscsi_conn *conn,
 
 	iser_task->status = ISER_TASK_STATUS_STARTED;
 
+	/*
+	 * post recv bufs for those consumed by unexpected pdus from target
+	 * errors are ignored, as retry occurs on next send
+	 */
+	iser_post_unexpected_recvs(conn);
+
 	err = iser_post_send(&iser_task->desc);
 	if (!err)
 		return 0;
@@ -478,6 +514,7 @@ int iser_send_control(struct iscsi_conn *conn,
 	int err = 0;
 	struct iser_regd_buf *regd_buf;
 	struct iser_device *device;
+	unsigned char opcode;
 
 	if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) {
 		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
@@ -512,12 +549,24 @@ int iser_send_control(struct iscsi_conn *conn,
 				data_seg_len);
 	}
 
-	if (iser_post_receive_control(conn) != 0) {
-		iser_err("post_rcv_buff failed!\n");
-		err = -ENOMEM;
-		goto send_control_error;
+	opcode = task->hdr->opcode & ISCSI_OPCODE_MASK;
+
+	/* post recv buffer for response if one is expected */
+	if (!((opcode == ISCSI_OP_NOOP_OUT)
+	      && (task->hdr->itt == RESERVED_ITT))) {
+		if (iser_post_receive_control(conn) != 0) {
+			iser_err("post_rcv_buff failed!\n");
+			err = -ENOMEM;
+			goto send_control_error;
+		}
 	}
 
+	/*
+	 * post recv bufs for those consumed by unexpected pdus from target
+	 * errors are ignored, as retry occurs on next send
+	 */
+	iser_post_unexpected_recvs(conn);
+
 	err = iser_post_send(mdesc);
 	if (!err)
 		return 0;
@@ -586,6 +635,21 @@ void iser_rcv_completion(struct iser_desc *rx_desc,
 	 * parallel to the execution of iser_conn_term. So the code that waits *
 	 * for the posted rx bufs refcount to become zero handles everything */
 	atomic_dec(&conn->ib_conn->post_recv_buf_count);
+
+	/*
+	 * if an unexpected PDU was received then the recv wr consumed must
+	 * be replaced, this is done in the next send of a control-type PDU
+	 */
+	if ((opcode == ISCSI_OP_NOOP_IN)
+	    && (hdr->itt == RESERVED_ITT)) {
+		/* nop-in with itt = 0xffffffff */
+		atomic_inc(&conn->ib_conn->unexpected_pdu_count);
+	}
+	else if (opcode == ISCSI_OP_ASYNC_EVENT) {
+		/* asynchronous message */
+		atomic_inc(&conn->ib_conn->unexpected_pdu_count);
+	}
+	/* a reject PDU consumes the recv buf posted for the response */
 }
 
 void iser_snd_completion(struct iser_desc *tx_desc)
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 26ff621..6dc6b17 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -498,6 +498,7 @@ void iser_conn_init(struct iser_conn *ib_conn)
 	init_waitqueue_head(&ib_conn->wait);
 	atomic_set(&ib_conn->post_recv_buf_count, 0);
 	atomic_set(&ib_conn->post_send_buf_count, 0);
+	atomic_set(&ib_conn->unexpected_pdu_count, 0);
 	atomic_set(&ib_conn->refcount, 1);
 	INIT_LIST_HEAD(&ib_conn->conn_list);
 	spin_lock_init(&ib_conn->lock);
-- 
1.5.4.5

From jackm at dev.mellanox.co.il Thu Nov 20 23:02:01 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Fri, 21 Nov 2008 09:02:01 +0200
Subject: [ofa-general] Re: Race condition in userspace libraries with create/destroy qp
In-Reply-To:
References: <200811201211.46527.jackm@dev.mellanox.co.il>
Message-ID: <200811210902.03040.jackm@dev.mellanox.co.il>

On Friday 21 November 2008 00:50, Roland Dreier wrote:
> > 2. Create a mutex for this purpose, and use it to force the create and destroy qp operations
> >    to be atomic WRT the ibv_cmd_xxx_qp operations and the store/clear qp operations.
>
> This looks like the best solution.
>
> I wonder if we should just add this synchronization in libibverbs rather
> than individual drivers?  I notice that libcxgb3 seems to have the same
> bug AFAICS.  But maybe it's better to just keep the simple rule that
> driver libraries are responsible for locking their own data structures.
>

Thanks for responding so quickly!

I prefer to keep the rule that low-level driver libraries are responsible.
It's not clear that all low-level drivers necessarily have this issue.

BTW, I notice that there is a ctx->qp_table_mutex (used only in file
libmlx4/src/qp.c). What if I steal that and move its use upwards into
procedures mlx4_create_qp/mlx4_destroy_qp? (a bit cheesy, but it saves
creating yet another mutex in the mlx4 user context).
- Jack

From sashak at voltaire.com Fri Nov 21 01:28:37 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 21 Nov 2008 11:28:37 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_trap_rcv.c disable the port with the least hop count
In-Reply-To: <49251926.9090509@gmail.com>
References: <49251926.9090509@gmail.com>
Message-ID: <20081121092837.GA6965@sashak.voltaire.com>

On 10:00 Thu 20 Nov , Eli Dorfman wrote:
> disable the port with the least hop count.
> this will address the case of inter switch link where the
> most remote port (from opensm) is sending traps.
> in that case we would like to disable the nearest switch port (from opensm).
>
> Signed-off-by: Eli Dorfman

Applied. Thanks.

Sasha

From sashak at voltaire.com Fri Nov 21 01:29:16 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 21 Nov 2008 11:29:16 +0200
Subject: [ofa-general] Re: [PATCH v2] opensm/osm_state_mgr.c: bug fix in unicast cache
In-Reply-To: <492520CF.4080001@dev.mellanox.co.il>
References: <492520CF.4080001@dev.mellanox.co.il>
Message-ID: <20081121092916.GB6965@sashak.voltaire.com>

On 10:33 Thu 20 Nov , Yevgeny Kliteynik wrote:
> Hi Sasha,
>
> When there are errors during initialization and new
> heavy sweep is forced, unicast cache might hold a
> snapshot of the previous routing, and since there
> might be no *topology* changes, unicast cache will
> apply that cached routing, which might be wrong.
>
> This patch invalidates cache explicitly if there
> were initialization errors in addition to few other
> cases.
>
> V2: don't invalidate cache when
> opt.force_heavy_sweep is on.
>
> This fix addresses bug #1398.
>
> Signed-off-by: Yevgeny Kliteynik

Applied. Thanks.
Sasha

From sashak at voltaire.com Fri Nov 21 01:45:14 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 21 Nov 2008 11:45:14 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_trap_rcv.c disable the port with the least hop count
In-Reply-To: <49251926.9090509@gmail.com>
References: <49251926.9090509@gmail.com>
Message-ID: <20081121094514.GC6965@sashak.voltaire.com>

Hi Eli,

On 10:00 Thu 20 Nov , Eli Dorfman wrote:
> disable the port with the least hop count.
> this will address the case of inter switch link where the
> most remote port (from opensm) is sending traps.
> in that case we would like to disable the nearest switch port (from opensm).
>
> Signed-off-by: Eli Dorfman

I applied the patch. However, I have a question.

> ---
>  opensm/opensm/osm_trap_rcv.c | 4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c
> index 07c5183..d1dfbd4 100644
> --- a/opensm/opensm/osm_trap_rcv.c
> +++ b/opensm/opensm/osm_trap_rcv.c
> @@ -239,8 +239,8 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
>  	ib_port_info_t *pi = (ib_port_info_t *)payload;
>  	int ret;
>
> -	/* in case of endport - disable switch's peer port */
> -	if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH)
> +	/* select the nearest port to master opensm */
> +	if (p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count)
>  		p = p->p_remote_physp;

Is it possible that this noisy port is a switch external port that is "the
nearest" to the OpenSM node and has no remote port (due to an unstable link)?
We saw such cases in practice, and OpenSM handles them in a light sweep (see
the __osm_state_mgr_get_remote_port_info() calls in the
__osm_state_mgr_light_sweep_start() function).

With the endport-only check this is impossible IMO, but I don't see why it
cannot happen with switch ports. Right?
If so then maybe the code should look like:

	if (p->p_remote_physp &&
	    p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count)
		p = p->p_remote_physp;

Sasha

>
>  	/* If trap 131, might want to disable peer port if available */
> --
> 1.5.5
>

From vlad at lists.openfabrics.org Fri Nov 21 03:26:33 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Fri, 21 Nov 2008 03:26:33 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081121-0200 daily build status
Message-ID: <20081121112633.D12CAE60AE8@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From hal.rosenstock at gmail.com Fri Nov 21 04:25:23 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Fri, 21 Nov 2008 07:25:23 -0500
Subject: ***SPAM*** Re: [ofa-general] [PATCH 0/3] ibnetdiscover library "libibnetdisc"
In-Reply-To: <20081120163809.26a3c499.weiny2@llnl.gov>
References: <20081120163809.26a3c499.weiny2@llnl.gov>
Message-ID:

Hi Ira,

On Thu, Nov 20, 2008 at 7:38 PM, Ira Weiny wrote:
> The following 3 patches implement "libibnetdisc" which provides the
> functionality of ibnetdiscover in a C library.
>
> I mentioned this to Sasha at the last Sonoma conference and posted the bulk of
> this code to the list a few months ago. This library is still providing the 85%
> performance speed up of iblinkinfo.pl on our clusters.
>
> This new series is heavily tested and, for our hardware, preserves the
> functionality of ibnetdiscover. Since I don't have a Xsigo box to test on I
> can only verify that it compiles correctly.

Have you also verified this with QLogic/Silverstorm and Cisco chassis
switches? They were supported too.
-- Hal

> Ira
>

From fenkes at de.ibm.com Fri Nov 21 07:37:14 2008
From: fenkes at de.ibm.com (Joachim Fenkes)
Date: Fri, 21 Nov 2008 16:37:14 +0100
Subject: [ofa-general] [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock
In-Reply-To: <48499C11.7030504@gmail.com>
References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com>
Message-ID: <200811211637.15300.fenkes@de.ibm.com>

From: Michael Ellerman

shca_list_lock is taken from softirq context in ehca_poll_eqs, so we need to
lock IRQ safe elsewhere.

Signed-off-by: Michael Ellerman
Acked-by: Joachim Fenkes
---
 drivers/infiniband/hw/ehca/ehca_main.c | 17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index bb02a86..021c454 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -717,6 +717,7 @@ static int __devinit ehca_probe(struct of_device *dev,
 	const u64 *handle;
 	struct ib_pd *ibpd;
 	int ret, i, eq_size;
+	u64 flags;
 
 	handle = of_get_property(dev->node, "ibm,hca-handle", NULL);
 	if (!handle) {
@@ -830,9 +831,9 @@ static int __devinit ehca_probe(struct of_device *dev,
 		ehca_err(&shca->ib_device,
 			 "Cannot create device attributes ret=%d", ret);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_add(&shca->shca_list, &shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return 0;
 
@@ -878,6 +879,7 @@ probe1:
 static int __devexit ehca_remove(struct of_device *dev)
 {
 	struct ehca_shca *shca = dev->dev.driver_data;
+	u64 flags;
 	int ret;
 
 	sysfs_remove_group(&dev->dev.kobj, &ehca_dev_attr_grp);
@@ -915,9 +917,9 @@ static int __devexit ehca_remove(struct of_device *dev)
 
 	ib_dealloc_device(&shca->ib_device);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_del(&shca->shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return ret;
 }
@@ -975,6 +977,7 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 			     unsigned long action, void *data)
 {
 	static unsigned long ehca_dmem_warn_time;
+	unsigned long flags;
 
 	switch (action) {
 	case MEM_CANCEL_OFFLINE:
@@ -985,12 +988,12 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
 		/* only ok if no hca is attached to the lpar */
-		spin_lock(&shca_list_lock);
+		spin_lock_irqsave(&shca_list_lock, flags);
 		if (list_empty(&shca_list)) {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			return NOTIFY_OK;
 		} else {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			if (printk_timed_ratelimit(&ehca_dmem_warn_time,
 						   30 * 1000))
 				ehca_gen_err("DMEM operations are not allowed"
-- 
1.5.5

From fenkes at de.ibm.com Fri Nov 21 08:18:16 2008
From: fenkes at de.ibm.com (Joachim Fenkes)
Date: Fri, 21 Nov 2008 17:18:16 +0100
Subject: [ofa-general] [PATCH] IB/ehca: Fix locking for shca_list_lock
In-Reply-To: <1227283347.3599.8.camel@johannes.berg>
References: <200806061835.43802.fenkes@de.ibm.com> <200811211637.15300.fenkes@de.ibm.com> <1227283347.3599.8.camel@johannes.berg>
Message-ID: <200811211718.17489.fenkes@de.ibm.com>

shca_list_lock is taken from softirq context in ehca_poll_eqs, so we need to
lock IRQ safe elsewhere.
Signed-off-by: Michael Ellerman Signed-off-by: Joachim Fenkes --- On Friday 21 November 2008 17:02, Johannes Berg wrote: > On Fri, 2008-11-21 at 16:37 +0100, Joachim Fenkes wrote: > > > + u64 flags; > > > - spin_lock(&shca_list_lock); > > + spin_lock_irqsave(&shca_list_lock, flags); > > That's wrong and I think will give a warning on all machines where > u64 != unsigned long. Might not particularly matter in this case. Doesn't matter for a ppc64 only driver, but you're right nonetheless. Thanks. > Also, generally it seems wrong to say "fix lockdep failure" when the > patch really fixes a bug that lockdep happened to find. Whatever -- changed. Here's the updated patch. Regards, Joachim drivers/infiniband/hw/ehca/ehca_main.c | 17 ++++++++++------- 1 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index bb02a86..169aa1a 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -717,6 +717,7 @@ static int __devinit ehca_probe(struct of_device *dev, const u64 *handle; struct ib_pd *ibpd; int ret, i, eq_size; + unsigned long flags; handle = of_get_property(dev->node, "ibm,hca-handle", NULL); if (!handle) { @@ -830,9 +831,9 @@ static int __devinit ehca_probe(struct of_device *dev, ehca_err(&shca->ib_device, "Cannot create device attributes ret=%d", ret); - spin_lock(&shca_list_lock); + spin_lock_irqsave(&shca_list_lock, flags); list_add(&shca->shca_list, &shca_list); - spin_unlock(&shca_list_lock); + spin_unlock_irqrestore(&shca_list_lock, flags); return 0; @@ -878,6 +879,7 @@ probe1: static int __devexit ehca_remove(struct of_device *dev) { struct ehca_shca *shca = dev->dev.driver_data; + unsigned long flags; int ret; sysfs_remove_group(&dev->dev.kobj, &ehca_dev_attr_grp); @@ -915,9 +917,9 @@ static int __devexit ehca_remove(struct of_device *dev) ib_dealloc_device(&shca->ib_device); - spin_lock(&shca_list_lock); + 
spin_lock_irqsave(&shca_list_lock, flags); list_del(&shca->shca_list); - spin_unlock(&shca_list_lock); + spin_unlock_irqrestore(&shca_list_lock, flags); return ret; } @@ -975,6 +977,7 @@ static int ehca_mem_notifier(struct notifier_block *nb, unsigned long action, void *data) { static unsigned long ehca_dmem_warn_time; + unsigned long flags; switch (action) { case MEM_CANCEL_OFFLINE: @@ -985,12 +988,12 @@ static int ehca_mem_notifier(struct notifier_block *nb, case MEM_GOING_ONLINE: case MEM_GOING_OFFLINE: /* only ok if no hca is attached to the lpar */ - spin_lock(&shca_list_lock); + spin_lock_irqsave(&shca_list_lock, flags); if (list_empty(&shca_list)) { - spin_unlock(&shca_list_lock); + spin_unlock_irqrestore(&shca_list_lock, flags); return NOTIFY_OK; } else { - spin_unlock(&shca_list_lock); + spin_unlock_irqrestore(&shca_list_lock, flags); if (printk_timed_ratelimit(&ehca_dmem_warn_time, 30 * 1000)) ehca_gen_err("DMEM operations are not allowed" -- 1.5.5 From johannes at sipsolutions.net Fri Nov 21 08:02:27 2008 From: johannes at sipsolutions.net (Johannes Berg) Date: Fri, 21 Nov 2008 17:02:27 +0100 Subject: [ofa-general] Re: [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock In-Reply-To: <200811211637.15300.fenkes@de.ibm.com> References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811211637.15300.fenkes@de.ibm.com> Message-ID: <1227283347.3599.8.camel@johannes.berg> On Fri, 2008-11-21 at 16:37 +0100, Joachim Fenkes wrote: > + u64 flags; > - spin_lock(&shca_list_lock); > + spin_lock_irqsave(&shca_list_lock, flags); That's wrong and I think will give a warning on all machines where u64 != unsigned long. Might not particularly matter in this case. Also, generally it seems wrong to say "fix lockdep failure" when the patch really fixes a bug that lockdep happened to find. johannes -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL:

From rdreier at cisco.com Fri Nov 21 10:28:03 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 21 Nov 2008 10:28:03 -0800
Subject: [ofa-general] Re: [PATCH] IB/ehca: Fix locking for shca_list_lock
In-Reply-To: <200811211718.17489.fenkes@de.ibm.com> (Joachim Fenkes's message of "Fri, 21 Nov 2008 17:18:16 +0100")
References: <200806061835.43802.fenkes@de.ibm.com> <200811211637.15300.fenkes@de.ibm.com> <1227283347.3599.8.camel@johannes.berg> <200811211718.17489.fenkes@de.ibm.com>
Message-ID:

Looks good... I'll add this for 2.6.29, since as far as I can tell
this bug has been there approximately forever already.

From rdreier at cisco.com Fri Nov 21 10:44:01 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 21 Nov 2008 10:44:01 -0800
Subject: [ofa-general] Re: Race condition in userspace libraries with create/destroy qp
In-Reply-To: <200811210902.03040.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Fri, 21 Nov 2008 09:02:01 +0200")
References: <200811201211.46527.jackm@dev.mellanox.co.il> <200811210902.03040.jackm@dev.mellanox.co.il>
Message-ID:

 > I prefer to keep the rule that low-level driver libraries are responsible.
 > It's not clear that all low-level drivers necessarily have this issue.

Yes, makes sense to me.

 > BTW, I notice that there is a ctx->qp_table_mutex (used only in file
 > libmlx4/src/qp.c). What if I steal that and move its use upwards into
 > procedures mlx4_create_qp/mlx4_destroy_qp? (a bit cheesy, but it saves
 > creating yet another mutex in the mlx4 user context).

Actually I don't think it's cheesy at all -- expanding the region where
qp_table_mutex is held to avoid this bug makes perfect sense to me and
seems like a clean solution.

 - R.
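For readers following the thread, here is a rough user-space sketch of what "expanding the region where qp_table_mutex is held" could look like. It is not the libmlx4 code. It assumes the race is of the kind where a QP number released by a destroy can be handed out again to a concurrent create before the destroying thread has cleared its table slot. Everything except the name qp_table_mutex is hypothetical: struct ctx, fake_alloc_qpn() and fake_free_qpn() merely stand in for the real context and the kernel calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64

struct qp {
	int qpn;
};

/* Hypothetical user context: only the mutex name is taken from the thread. */
struct ctx {
	pthread_mutex_t qp_table_mutex;
	struct qp *qp_table[TABLE_SIZE];
	unsigned char qpn_in_use[TABLE_SIZE];	/* stands in for the kernel's QPN allocator */
};

static void ctx_init(struct ctx *c)
{
	memset(c, 0, sizeof(*c));
	pthread_mutex_init(&c->qp_table_mutex, NULL);
}

/* Stand-in for the "create" call into the kernel: hands out the lowest free QPN. */
static int fake_alloc_qpn(struct ctx *c)
{
	int i;

	for (i = 0; i < TABLE_SIZE; i++) {
		if (!c->qpn_in_use[i]) {
			c->qpn_in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Stand-in for the "destroy" call into the kernel: the QPN may be reused at once. */
static void fake_free_qpn(struct ctx *c, int qpn)
{
	c->qpn_in_use[qpn] = 0;
}

/* The mutex covers both the allocation and the table store. */
static int sketch_create_qp(struct ctx *c, struct qp *qp)
{
	int qpn;

	pthread_mutex_lock(&c->qp_table_mutex);
	qpn = fake_alloc_qpn(c);
	if (qpn >= 0) {
		qp->qpn = qpn;
		c->qp_table[qpn] = qp;
	}
	pthread_mutex_unlock(&c->qp_table_mutex);
	return qpn < 0 ? -1 : 0;
}

/* The mutex covers both the release and the table clear, so a concurrent
 * create cannot store into the slot between the two steps. */
static void sketch_destroy_qp(struct ctx *c, struct qp *qp)
{
	pthread_mutex_lock(&c->qp_table_mutex);
	assert(qp->qpn >= 0 && qp->qpn < TABLE_SIZE);
	assert(c->qp_table[qp->qpn] == qp);
	fake_free_qpn(c, qp->qpn);
	c->qp_table[qp->qpn] = NULL;
	pthread_mutex_unlock(&c->qp_table_mutex);
}
```

With the lock taken only around the table update, a second thread could receive the just-freed number and store its QP in the slot, only to have the first thread clear it afterwards. Holding the mutex across both steps closes that window at the cost of serializing create and destroy within one context.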
From sashak at voltaire.com Fri Nov 21 11:24:28 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 21 Nov 2008 21:24:28 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing
In-Reply-To: <492550E3.90805@dev.mellanox.co.il>
References: <492550E3.90805@dev.mellanox.co.il>
Message-ID: <20081121192428.GB8310@sashak.voltaire.com>

Hi Yevgeny,

On 13:58 Thu 20 Nov , Yevgeny Kliteynik wrote:
>
> Function osm_switch_get_port_by_lid() was using the switch's
> LFT, so this LFT might not be updated to recent routing.

I guess this could happen only with the 'subnet_initialization_error'
flag up (a failed LinFwdTbl set will trigger this flag).

> I think that this was also relevant before the LFT simplification.

Yes, logically it should be so, but...

> One immediate outcome of this bug is opensm.fdbs file - when it
> is dumped from the switch LFT (and not from lft_buf),

Why is this bug triggered only now?

> it sometimes
> doesn't match the lst file.

What does this "sometimes" mean?

I think the case should be investigated more deeply. With such a patch
we are just trying to hide a possible issue.

As far as I understand, opensm.fdbs (and the other routing dumps) are
generated only after all LinFwdTbl responses have arrived; when some of
them fail, the 'subnet_initialization_error' flag is up and OpenSM will
resweep. If so, why is 'opensm.fdbs' broken? It is not immediately clear
to me.
Sasha

>
> Signed-off-by: Yevgeny Kliteynik
> ---
>  opensm/include/opensm/osm_switch.h |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
> index caa0bc5..f06931c 100644
> --- a/opensm/include/opensm/osm_switch.h
> +++ b/opensm/include/opensm/osm_switch.h
> @@ -411,7 +411,11 @@ osm_switch_get_port_by_lid(IN const osm_switch_t * const p_sw,
>  {
>  	if (lid_ho == 0 || lid_ho > IB_LID_UCAST_END_HO)
>  		return OSM_NO_PATH;
> -	return p_sw->lft[lid_ho];
> +
> +	if (p_sw->lft_buf)
> +		return p_sw->lft_buf[lid_ho];
> +	else
> +		return p_sw->lft[lid_ho];
>  }
>  /*
>  * PARAMETERS
> --
> 1.5.1.4
>
>

From chien.tin.tung at intel.com Fri Nov 21 12:50:38 2008
From: chien.tin.tung at intel.com (Chien Tung)
Date: Fri, 21 Nov 2008 14:50:38 -0600
Subject: [ofa-general] [PATCH 01/10] RDMA/nes: Cleanup cqp_request list usage
Message-ID: <20081121205038.GA5976@ctung-MOBL>

From: Faisal Latif

RDMA/nes: Cleanup cqp_request list usage

Use nes_free_cqp_request() from commit
1ff66e8c1faee7c2711b84b9c89e1c5fcd767839.

Change some continue statements to break in nes_cm_timer_tick().
send_entry used to be a list processed in a loop, hence the continue.
Now that it is a single item, break is the semantically correct
statement.

Signed-off-by: Faisal Latif
Signed-off-by: Chien Tung
--

Roland,

This patch series is a continuation of the nes_cm rework/bugfix. Most of
the patches deal with resource management and shutdown issues. They have
been tested with Intel MPI/DAPL and proved to scale much better than the
current code base. Please consider them for 2.6.28.
Regards, Chien diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 2caf9da..2a1d6c7 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -519,7 +519,7 @@ static void nes_cm_timer_tick(unsigned long pass) do { send_entry = cm_node->send_entry; if (!send_entry) - continue; + break; if (time_after(send_entry->timetosend, jiffies)) { if (cm_node->state != NES_CM_STATE_TSA) { if ((nexttimeout > @@ -528,18 +528,18 @@ static void nes_cm_timer_tick(unsigned long pass) nexttimeout = send_entry->timetosend; settimer = 1; - continue; + break; } } else { free_retrans_entry(cm_node); - continue; + break; } } if ((cm_node->state == NES_CM_STATE_TSA) || (cm_node->state == NES_CM_STATE_CLOSED)) { free_retrans_entry(cm_node); - continue; + break; } if (!send_entry->retranscount || @@ -557,7 +557,7 @@ static void nes_cm_timer_tick(unsigned long pass) NES_CM_EVENT_ABORTED); spin_lock_irqsave(&cm_node->retrans_list_lock, flags); - continue; + break; } atomic_inc(&send_entry->skb->users); cm_packets_retrans++; @@ -583,7 +583,7 @@ static void nes_cm_timer_tick(unsigned long pass) send_entry->retrycount--; nexttimeout = jiffies + NES_SHORT_TIME; settimer = 1; - continue; + break; } else { cm_packets_sent++; } diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index d36c9a0..4fdb724 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1695,13 +1695,8 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, /* use 4k pbl */ nes_debug(NES_DBG_CQ, "pbl_entries=%u, use a 4k PBL\n", pbl_entries); if (nesadapter->free_4kpbl == 0) { - if (cqp_request->dynamic) { - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - kfree(cqp_request); - } else { - list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - } + spin_unlock_irqrestore(&nesadapter->pbl_lock, 
flags); + nes_free_cqp_request(nesdev, cqp_request); if (!context) pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size, mem, nescq->hw_cq.cq_pbase); @@ -1717,13 +1712,8 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, /* use 256 byte pbl */ nes_debug(NES_DBG_CQ, "pbl_entries=%u, use a 256 byte PBL\n", pbl_entries); if (nesadapter->free_256pbl == 0) { - if (cqp_request->dynamic) { - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - kfree(cqp_request); - } else { - list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - } + spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); + nes_free_cqp_request(nesdev, cqp_request); if (!context) pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size, mem, nescq->hw_cq.cq_pbase); @@ -1928,13 +1918,8 @@ static int nes_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd, /* Two level PBL */ if ((pbl_count+1) > nesadapter->free_4kpbl) { nes_debug(NES_DBG_MR, "Out of 4KB Pbls for two level request.\n"); - if (cqp_request->dynamic) { - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - kfree(cqp_request); - } else { - list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - } + spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); + nes_free_cqp_request(nesdev, cqp_request); return -ENOMEM; } else { nesadapter->free_4kpbl -= pbl_count+1; @@ -1942,13 +1927,8 @@ static int nes_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd, } else if (residual_page_count > 32) { if (pbl_count > nesadapter->free_4kpbl) { nes_debug(NES_DBG_MR, "Out of 4KB Pbls.\n"); - if (cqp_request->dynamic) { - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - kfree(cqp_request); - } else { - list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - } + spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); + 
nes_free_cqp_request(nesdev, cqp_request); return -ENOMEM; } else { nesadapter->free_4kpbl -= pbl_count; @@ -1956,13 +1936,8 @@ static int nes_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd, } else { if (pbl_count > nesadapter->free_256pbl) { nes_debug(NES_DBG_MR, "Out of 256B Pbls.\n"); - if (cqp_request->dynamic) { - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - kfree(cqp_request); - } else { - list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); - spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); - } + spin_unlock_irqrestore(&nesadapter->pbl_lock, flags); + nes_free_cqp_request(nesdev, cqp_request); return -ENOMEM; } else { nesadapter->free_256pbl -= pbl_count; From chien.tin.tung at intel.com Fri Nov 21 12:50:41 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:41 -0600 Subject: [ofa-general] [PATCH 02/10] RDMA/nes: Lock down connected_nodes list while processing it Message-ID: <20081121205041.GA828@ctung-MOBL> From: Faisal Latif RDMA/nes: Lock down connected_nodes list while processing it While processing connected_nodes list, we would release the lock when we need to send reset to remote partner. That created a window where the list can be modified. Change this into a two step process. Place nodes that need processing on a local list then process the local list. 
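
As an illustration only, the two-step approach described above can be sketched in user-space C. This is not the driver code: the names struct node, struct core, timer_tick() and slow_work() are hypothetical, a pthread mutex stands in for ht_lock, and plain pointers stand in for the kernel's list_head. The point is that the shared list is walked only while the lock is held, and the work that requires dropping the lock is done afterwards on a private list, with a reference held on each node.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	int needs_work;			/* e.g. pending receive or retransmit entry */
	int refcnt;
	int processed;
	struct node *next;		/* link on the shared list, protected by the lock */
	struct node *timer_next;	/* link on the caller's private list */
};

struct core {
	pthread_mutex_t lock;		/* stands in for ht_lock */
	struct node *connected;		/* stands in for connected_nodes */
};

static void core_init(struct core *core, struct node *head)
{
	pthread_mutex_init(&core->lock, NULL);
	core->connected = head;
}

/* Work that must not run under the lock (for example, sending a packet). */
static void slow_work(struct node *n)
{
	n->processed++;
}

/* Returns the number of nodes that were processed. */
static int timer_tick(struct core *core)
{
	struct node *n, *local = NULL;
	int handled = 0;

	/* Step 1: under the lock, pick the nodes that need attention and
	 * take a reference so they stay alive once the lock is dropped. */
	pthread_mutex_lock(&core->lock);
	for (n = core->connected; n; n = n->next) {
		if (!n->needs_work)
			continue;
		n->refcnt++;
		n->timer_next = local;
		local = n;
	}
	pthread_mutex_unlock(&core->lock);

	/* Step 2: process the private list; the shared list is never
	 * walked without the lock. */
	while (local) {
		n = local;
		local = n->timer_next;
		n->timer_next = NULL;

		slow_work(n);

		pthread_mutex_lock(&core->lock);
		assert(n->refcnt > 0);
		n->refcnt--;
		pthread_mutex_unlock(&core->lock);
		handled++;
	}
	return handled;
}
```

The extra per-node link (timer_next here, timer_entry and reset_entry in the patch) is what lets a node sit on the shared list and on a private list at the same time.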
Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 2a1d6c7..257d994 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -459,13 +459,23 @@ static void nes_cm_timer_tick(unsigned long pass) int ret = NETDEV_TX_OK; enum nes_cm_node_state last_state; + struct list_head timer_list; + INIT_LIST_HEAD(&timer_list); spin_lock_irqsave(&cm_core->ht_lock, flags); list_for_each_safe(list_node, list_core_temp, - &cm_core->connected_nodes) { + &cm_core->connected_nodes) { cm_node = container_of(list_node, struct nes_cm_node, list); - add_ref_cm_node(cm_node); - spin_unlock_irqrestore(&cm_core->ht_lock, flags); + if (!list_empty(&cm_node->recv_list) || (cm_node->send_entry)) { + add_ref_cm_node(cm_node); + list_add(&cm_node->timer_entry, &timer_list); + } + } + spin_unlock_irqrestore(&cm_core->ht_lock, flags); + + list_for_each_safe(list_node, list_core_temp, &timer_list) { + cm_node = container_of(list_node, struct nes_cm_node, + timer_entry); spin_lock_irqsave(&cm_node->recv_list_lock, flags); list_for_each_safe(list_core, list_node_temp, &cm_node->recv_list) { @@ -615,14 +625,12 @@ static void nes_cm_timer_tick(unsigned long pass) spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags); rem_ref_cm_node(cm_node->cm_core, cm_node); - spin_lock_irqsave(&cm_core->ht_lock, flags); if (ret != NETDEV_TX_OK) { nes_debug(NES_DBG_CM, "rexmit failed for cm_node=%p\n", cm_node); break; } } - spin_unlock_irqrestore(&cm_core->ht_lock, flags); if (settimer) { if (!timer_pending(&cm_core->tcp_timer)) { @@ -925,28 +933,36 @@ static int mini_cm_dec_refcnt_listen(struct nes_cm_core *cm_core, struct list_head *list_pos = NULL; struct list_head *list_temp = NULL; struct nes_cm_node *cm_node = NULL; + struct list_head reset_list; nes_debug(NES_DBG_CM, "attempting listener= %p free_nodes= %d, " "refcnt=%d\n", listener, free_hanging_nodes, 
atomic_read(&listener->ref_count)); /* free non-accelerated child nodes for this listener */ + INIT_LIST_HEAD(&reset_list); if (free_hanging_nodes) { spin_lock_irqsave(&cm_core->ht_lock, flags); list_for_each_safe(list_pos, list_temp, - &g_cm_core->connected_nodes) { + &g_cm_core->connected_nodes) { cm_node = container_of(list_pos, struct nes_cm_node, list); if ((cm_node->listener == listener) && - (!cm_node->accelerated)) { - cleanup_retrans_entry(cm_node); - spin_unlock_irqrestore(&cm_core->ht_lock, - flags); - send_reset(cm_node, NULL); - spin_lock_irqsave(&cm_core->ht_lock, flags); + (!cm_node->accelerated)) { + add_ref_cm_node(cm_node); + list_add(&cm_node->reset_entry, &reset_list); } } spin_unlock_irqrestore(&cm_core->ht_lock, flags); } + + list_for_each_safe(list_pos, list_temp, &reset_list) { + cm_node = container_of(list_pos, struct nes_cm_node, + reset_entry); + cleanup_retrans_entry(cm_node); + send_reset(cm_node, NULL); + rem_ref_cm_node(cm_node->cm_core, cm_node); + } + spin_lock_irqsave(&cm_core->listen_list_lock, flags); if (!atomic_dec_return(&listener->ref_count)) { list_del(&listener->list); diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index 367b3d2..282a9cb 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -292,6 +292,8 @@ struct nes_cm_node { int apbvt_set; int accept_pend; int freed; + struct list_head timer_entry; + struct list_head reset_entry; struct nes_qp *nesqp; }; From chien.tin.tung at intel.com Fri Nov 21 12:50:44 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:44 -0600 Subject: [ofa-general] [PATCH 03/10] RDMA/nes: Remove tx_free_list Message-ID: <20081121205044.GA7424@ctung-MOBL> From: Faisal Latif RDMA/nes: Remove tx_free_list There is no lock protecting tx_free_list thus causing a system crash when skb_dequeue() is called and the list is empty. 
Since it did not give any performance boost under heavy load, remove it
to simplify the code.

Change the get_free_pkt() call to allocate a MAX_CM_BUFFER-sized skb for
connection establishment/teardown as well as MPA request/response.

Signed-off-by: Faisal Latif
Signed-off-by: Chien Tung
--
diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 257d994..fe07797 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -94,7 +94,7 @@ static int mini_cm_set(struct nes_cm_core *, u32, u32);
 static struct sk_buff *form_cm_frame(struct sk_buff *, struct nes_cm_node *, void *, u32, void *, u32, u8);
-static struct sk_buff *get_free_pkt(struct nes_cm_node *cm_node);
+static struct sk_buff *get_free_pkt(u32);
 static int add_ref_cm_node(struct nes_cm_node *);
 static int rem_ref_cm_node(struct nes_cm_core *, struct nes_cm_node *);
@@ -356,7 +356,6 @@ static void print_core(struct nes_cm_core *core)
 	nes_debug(NES_DBG_CM, "State : %u \n", core->state);
-	nes_debug(NES_DBG_CM, "Tx Free cnt : %u \n", skb_queue_len(&core->tx_free_list));
 	nes_debug(NES_DBG_CM, "Listen Nodes : %u \n", atomic_read(&core->listen_node_cnt));
 	nes_debug(NES_DBG_CM, "Active Nodes : %u \n", atomic_read(&core->node_cnt));
@@ -691,7 +690,7 @@ static int send_syn(struct nes_cm_node *cm_node, u32 sendack,
 	optionssize += 1;
 
 	if (!skb)
-		skb = get_free_pkt(cm_node);
+		skb = get_free_pkt(MAX_CM_BUFFER);
 	if (!skb) {
 		nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n");
 		return -1;
@@ -716,7 +715,7 @@ static int send_reset(struct nes_cm_node *cm_node, struct sk_buff *skb)
 	int flags = SET_RST | SET_ACK;
 
 	if (!skb)
-		skb = get_free_pkt(cm_node);
+		skb = get_free_pkt(MAX_CM_BUFFER);
 	if (!skb) {
 		nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n");
 		return -1;
@@ -737,7 +736,7 @@ static int send_ack(struct nes_cm_node *cm_node, struct sk_buff *skb)
 	int ret;
 
 	if (!skb)
-		skb = get_free_pkt(cm_node);
+		skb = get_free_pkt(MAX_CM_BUFFER);
 	if (!skb) {
 		nes_debug(NES_DBG_CM,
"Failed to get a Free pkt\n"); @@ -760,7 +759,7 @@ static int send_fin(struct nes_cm_node *cm_node, struct sk_buff *skb) /* if we didn't get a frame get one */ if (!skb) - skb = get_free_pkt(cm_node); + skb = get_free_pkt(MAX_CM_BUFFER); if (!skb) { nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n"); @@ -777,40 +776,9 @@ static int send_fin(struct nes_cm_node *cm_node, struct sk_buff *skb) /** * get_free_pkt */ -static struct sk_buff *get_free_pkt(struct nes_cm_node *cm_node) -{ - struct sk_buff *skb, *new_skb; - - /* check to see if we need to repopulate the free tx pkt queue */ - if (skb_queue_len(&cm_node->cm_core->tx_free_list) < NES_CM_FREE_PKT_LO_WATERMARK) { - while (skb_queue_len(&cm_node->cm_core->tx_free_list) < - cm_node->cm_core->free_tx_pkt_max) { - /* replace the frame we took, we won't get it back */ - new_skb = dev_alloc_skb(cm_node->cm_core->mtu); - BUG_ON(!new_skb); - /* add a replacement frame to the free tx list head */ - skb_queue_head(&cm_node->cm_core->tx_free_list, new_skb); - } - } - - skb = skb_dequeue(&cm_node->cm_core->tx_free_list); - - return skb; -} - - -/** - * make_hashkey - generate hash key from node tuple - */ -static inline int make_hashkey(u16 loc_port, nes_addr_t loc_addr, u16 rem_port, - nes_addr_t rem_addr) +static struct sk_buff *get_free_pkt(u32 pktsize) { - u32 hashkey = 0; - - hashkey = loc_addr + rem_addr + loc_port + rem_port; - hashkey = (hashkey % NES_CM_HASHTABLE_SIZE); - - return hashkey; + return dev_alloc_skb(pktsize); } @@ -821,13 +789,9 @@ static struct nes_cm_node *find_node(struct nes_cm_core *cm_core, u16 rem_port, nes_addr_t rem_addr, u16 loc_port, nes_addr_t loc_addr) { unsigned long flags; - u32 hashkey; struct list_head *hte; struct nes_cm_node *cm_node; - /* make a hash index key for this packet */ - hashkey = make_hashkey(loc_port, loc_addr, rem_port, rem_addr); - /* get a handle on the hte */ hte = &cm_core->connected_nodes; @@ -895,7 +859,6 @@ static struct nes_cm_listener *find_listener(struct 
nes_cm_core *cm_core, static int add_hte_node(struct nes_cm_core *cm_core, struct nes_cm_node *cm_node) { unsigned long flags; - u32 hashkey; struct list_head *hte; if (!cm_node || !cm_core) @@ -904,11 +867,6 @@ static int add_hte_node(struct nes_cm_core *cm_core, struct nes_cm_node *cm_node nes_debug(NES_DBG_CM, "Adding Node %p to Active Connection HT\n", cm_node); - /* first, make an index into our hash table */ - hashkey = make_hashkey(cm_node->loc_port, cm_node->loc_addr, - cm_node->rem_port, cm_node->rem_addr); - cm_node->hashkey = hashkey; - spin_lock_irqsave(&cm_core->ht_lock, flags); /* get a handle on the hash table element (list head for this slot) */ @@ -2151,10 +2109,7 @@ static void mini_cm_recv_pkt(struct nes_cm_core *cm_core, */ static struct nes_cm_core *nes_cm_alloc_core(void) { - int i; - struct nes_cm_core *cm_core; - struct sk_buff *skb = NULL; /* setup the CM core */ /* alloc top level core control structure */ @@ -2172,19 +2127,6 @@ static struct nes_cm_core *nes_cm_alloc_core(void) atomic_set(&cm_core->events_posted, 0); - /* init the packet lists */ - skb_queue_head_init(&cm_core->tx_free_list); - - for (i = 0; i < NES_CM_DEFAULT_FRAME_CNT; i++) { - skb = dev_alloc_skb(cm_core->mtu); - if (!skb) { - kfree(cm_core); - return NULL; - } - /* add 'raw' skb to free frame list */ - skb_queue_head(&cm_core->tx_free_list, skb); - } - cm_core->api = &nes_cm_api; spin_lock_init(&cm_core->ht_lock); diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index 282a9cb..89d80fc 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -161,6 +161,8 @@ struct nes_timer_entry { #define NES_CM_DEF_SEQ2 0x18ed5740 #define NES_CM_DEF_LOCAL_ID2 0xb807 +#define MAX_CM_BUFFER 512 + typedef u32 nes_addr_t; @@ -254,8 +256,6 @@ struct nes_cm_listener { /* per connection node and node state information */ struct nes_cm_node { - u32 hashkey; - nes_addr_t loc_addr, rem_addr; u16 loc_port, rem_port; @@ 
-352,7 +352,6 @@ struct nes_cm_core { u32 mtu; u32 free_tx_pkt_max; u32 rx_pkt_posted; - struct sk_buff_head tx_free_list; atomic_t ht_node_cnt; struct list_head connected_nodes; /* struct list_head hashtable[NES_CM_HASHTABLE_SIZE]; */ From chien.tin.tung at intel.com Fri Nov 21 12:50:46 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:46 -0600 Subject: [ofa-general] [PATCH 04/10] RDMA/nes: Avoid race condition between MPA request and reset event to rdma_cm Message-ID: <20081121205046.GA5428@ctung-MOBL> From: Faisal Latif RDMA/nes: Avoid race condition between MPA request and reset event to rdma_cm In passive open after indicating MPA request to rdma_cm, an incoming RST would fire a reset event to rdma_cm causing it to crash since the current state is not connected. The solution is to wait for nes_accept() or nes_reject() before firing the reset event. If nes_accept() or nes_reject() is already done, then the reset event will be fired when RST is processed. Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index fe07797..01fd309 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1318,6 +1318,7 @@ static void handle_rst_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, { int reset = 0; /* whether to send reset in case of err.. */ + int passive_state; atomic_inc(&cm_resets_recvd); nes_debug(NES_DBG_CM, "Received Reset, cm_node = %p, state = %u." 
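
The hand-off described above can be illustrated with a small user-space sketch. It is an approximation, not the driver code: it uses C11 atomics in place of the kernel's atomic_t, the three constants mirror the values the patch adds to nes_cm.h without the NES_ prefix, and struct conn, conn_arrive(), on_rst() and on_accept_done() are hypothetical names. Both the RST path and the accept path increment the same counter, and whichever of the two arrives second is the one that delivers the reset event.

```c
#include <assert.h>
#include <stdatomic.h>

enum {
	PASSIVE_STATE_INDICATED = 0,	/* MPA request indicated to the upper layer */
	DO_NOT_SEND_RESET_EVENT = 1,	/* first arrival: the other side has not run yet */
	SEND_RESET_EVENT        = 2	/* second arrival: safe to deliver the reset */
};

struct conn {
	atomic_int passive_state;
	int reset_events;		/* counts delivered reset events */
};

/* Called when the MPA request is indicated upward. */
static void conn_indicate_mpa_request(struct conn *c)
{
	atomic_store(&c->passive_state, PASSIVE_STATE_INDICATED);
	c->reset_events = 0;
}

/* Returns nonzero if this caller is the second to arrive and therefore
 * must deliver the reset event. */
static int conn_arrive(struct conn *c)
{
	/* fetch_add returns the old value; adding 1 gives the new value,
	 * which is what the kernel's atomic_add_return() reports. */
	int state = atomic_fetch_add(&c->passive_state, 1) + 1;

	assert(state == DO_NOT_SEND_RESET_EVENT || state == SEND_RESET_EVENT);
	return state == SEND_RESET_EVENT;
}

/* An incoming RST was processed. */
static void on_rst(struct conn *c)
{
	if (conn_arrive(c))
		c->reset_events++;
}

/* The accept path finished. */
static void on_accept_done(struct conn *c)
{
	if (conn_arrive(c))
		c->reset_events++;
}
```

If only the accept path runs, the counter stops at 1 and no reset event is generated. Because the increment is atomic, exactly one of the two callers can observe the value 2, so the event cannot be delivered twice or before the upper layer has answered the MPA request.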
" refcnt=%d\n", cm_node, cm_node->state, @@ -1331,7 +1332,14 @@ static void handle_rst_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, cm_node->listener, cm_node->state); active_open_err(cm_node, skb, reset); break; - /* For PASSIVE open states, remove the cm_node event */ + case NES_CM_STATE_MPAREQ_RCVD: + passive_state = atomic_add_return(1, &cm_node->passive_state); + if (passive_state == NES_SEND_RESET_EVENT) + create_event(cm_node, NES_CM_EVENT_RESET); + cleanup_retrans_entry(cm_node); + cm_node->state = NES_CM_STATE_CLOSED; + dev_kfree_skb_any(skb); + break; case NES_CM_STATE_ESTABLISHED: case NES_CM_STATE_SYN_RCVD: case NES_CM_STATE_LISTENING: @@ -1339,7 +1347,14 @@ static void handle_rst_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, passive_open_err(cm_node, skb, reset); break; case NES_CM_STATE_TSA: + active_open_err(cm_node, skb, reset); + break; + case NES_CM_STATE_CLOSED: + cleanup_retrans_entry(cm_node); + drop_packet(skb); + break; default: + drop_packet(skb); break; } } @@ -1368,6 +1383,9 @@ static void handle_rcv_mpa(struct nes_cm_node *cm_node, struct sk_buff *skb, dev_kfree_skb_any(skb); if (type == NES_CM_EVENT_CONNECTED) cm_node->state = NES_CM_STATE_TSA; + else + atomic_set(&cm_node->passive_state, + NES_PASSIVE_STATE_INDICATED); create_event(cm_node, type); } @@ -1944,6 +1962,7 @@ static int mini_cm_reject(struct nes_cm_core *cm_core, struct ietf_mpa_frame *mpa_frame, struct nes_cm_node *cm_node) { int ret = 0; + int passive_state; nes_debug(NES_DBG_CM, "%s cm_node=%p type=%d state=%d\n", __func__, cm_node, cm_node->tcp_cntxt.client, cm_node->state); @@ -1951,9 +1970,13 @@ static int mini_cm_reject(struct nes_cm_core *cm_core, if (cm_node->tcp_cntxt.client) return ret; cleanup_retrans_entry(cm_node); - cm_node->state = NES_CM_STATE_CLOSED; - ret = send_reset(cm_node, NULL); + passive_state = atomic_add_return(1, &cm_node->passive_state); + cm_node->state = NES_CM_STATE_CLOSED; + if (passive_state == NES_SEND_RESET_EVENT) + 
rem_ref_cm_node(cm_core, cm_node); + else + ret = send_reset(cm_node, NULL); return ret; } @@ -2355,7 +2378,6 @@ static int nes_cm_disconn_true(struct nes_qp *nesqp) atomic_inc(&cm_disconnects); cm_event.event = IW_CM_EVENT_DISCONNECT; if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET) { - issued_disconnect_reset = 1; cm_event.status = IW_CM_EVENT_STATUS_RESET; nes_debug(NES_DBG_CM, "Generating a CM " "Disconnect Event (status reset) for " @@ -2505,6 +2527,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) struct nes_v4_quad nes_quad; u32 crc_value; int ret; + int passive_state; ibqp = nes_get_qp(cm_id->device, conn_param->qpn); if (!ibqp) @@ -2672,8 +2695,6 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) conn_param->private_data_len + sizeof(struct ietf_mpa_frame)); - attr.qp_state = IB_QPS_RTS; - nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE, NULL); /* notify OF layer that accept event was successfull */ cm_id->add_ref(cm_id); @@ -2686,6 +2707,8 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) cm_event.private_data = NULL; cm_event.private_data_len = 0; ret = cm_id->event_handler(cm_id, &cm_event); + attr.qp_state = IB_QPS_RTS; + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE, NULL); if (cm_node->loopbackpartner) { cm_node->loopbackpartner->mpa_frame_size = nesqp->private_data_len; @@ -2698,6 +2721,9 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) printk(KERN_ERR "%s[%u] OFA CM event_handler returned, " "ret=%d\n", __func__, __LINE__, ret); + passive_state = atomic_add_return(1, &cm_node->passive_state); + if (passive_state == NES_SEND_RESET_EVENT) + create_event(cm_node, NES_CM_EVENT_RESET); return 0; } @@ -3180,6 +3206,18 @@ static void cm_event_reset(struct nes_cm_event *event) cm_event.private_data_len = 0; ret = cm_id->event_handler(cm_id, &cm_event); + cm_id->add_ref(cm_id); + atomic_inc(&cm_closes); + cm_event.event = 
IW_CM_EVENT_CLOSE;
+	cm_event.status = IW_CM_EVENT_STATUS_OK;
+	cm_event.provider_data = cm_id->provider_data;
+	cm_event.local_addr = cm_id->local_addr;
+	cm_event.remote_addr = cm_id->remote_addr;
+	cm_event.private_data = NULL;
+	cm_event.private_data_len = 0;
+	nes_debug(NES_DBG_CM, "NODE %p Generating CLOSE\n", event->cm_node);
+	ret = cm_id->event_handler(cm_id, &cm_event);
+
 	nes_debug(NES_DBG_CM, "OFA CM event_handler returned, ret=%d\n", ret);
diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h
index 89d80fc..6f01095 100644
--- a/drivers/infiniband/hw/nes/nes_cm.h
+++ b/drivers/infiniband/hw/nes/nes_cm.h
@@ -76,6 +76,10 @@ enum nes_timer_type {
 	NES_TIMER_TYPE_CLOSE,
 };
 
+#define NES_PASSIVE_STATE_INDICATED	0
+#define NES_DO_NOT_SEND_RESET_EVENT	1
+#define NES_SEND_RESET_EVENT		2
+
 #define MAX_NES_IFS 4
 
 #define SET_ACK 1
@@ -295,6 +299,7 @@ struct nes_cm_node {
 	struct list_head timer_entry;
 	struct list_head reset_entry;
 	struct nes_qp *nesqp;
+	atomic_t passive_state;
 };
 
 /* structure for client or CM to fill when making CM api calls. */

From chien.tin.tung at intel.com Fri Nov 21 12:50:49 2008
From: chien.tin.tung at intel.com (Chien Tung)
Date: Fri, 21 Nov 2008 14:50:49 -0600
Subject: [ofa-general] [PATCH 05/10] RDMA/nes: Forward packets for a new connection with stale APBVT entry
Message-ID: <20081121205049.GA6388@ctung-MOBL>

From: Faisal Latif

RDMA/nes: Forward packets for a new connection with stale APBVT entry

Under heavy traffic, there is a small window in which an APBVT entry has
not yet been removed and a new connection is established. Packets for
the new connection are dropped until the APBVT entry is removed. This
patch will forward the packets instead of dropping them.
Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 01fd309..fd2dba7 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -86,7 +86,7 @@ static int mini_cm_accept(struct nes_cm_core *, struct ietf_mpa_frame *, struct nes_cm_node *); static int mini_cm_reject(struct nes_cm_core *, struct ietf_mpa_frame *, struct nes_cm_node *); -static void mini_cm_recv_pkt(struct nes_cm_core *, struct nes_vnic *, +static int mini_cm_recv_pkt(struct nes_cm_core *, struct nes_vnic *, struct sk_buff *); static int mini_cm_dealloc_core(struct nes_cm_core *); static int mini_cm_get(struct nes_cm_core *); @@ -2034,7 +2034,7 @@ static int mini_cm_close(struct nes_cm_core *cm_core, struct nes_cm_node *cm_nod * recv_pkt - recv an ETHERNET packet, and process it through CM * node state machine */ -static void mini_cm_recv_pkt(struct nes_cm_core *cm_core, +static int mini_cm_recv_pkt(struct nes_cm_core *cm_core, struct nes_vnic *nesvnic, struct sk_buff *skb) { struct nes_cm_node *cm_node = NULL; @@ -2042,23 +2042,16 @@ static void mini_cm_recv_pkt(struct nes_cm_core *cm_core, struct iphdr *iph; struct tcphdr *tcph; struct nes_cm_info nfo; + int skb_handled = 1; if (!skb) - return; + return 0; if (skb->len < sizeof(struct iphdr) + sizeof(struct tcphdr)) { - dev_kfree_skb_any(skb); - return; + return 0; } iph = (struct iphdr *)skb->data; tcph = (struct tcphdr *)(skb->data + sizeof(struct iphdr)); - skb_reset_network_header(skb); - skb_set_transport_header(skb, sizeof(*tcph)); - if (!tcph) { - dev_kfree_skb_any(skb); - return; - } - skb->len = ntohs(iph->tot_len); nfo.loc_addr = ntohl(iph->daddr); nfo.loc_port = ntohs(tcph->dest); @@ -2079,23 +2072,21 @@ static void mini_cm_recv_pkt(struct nes_cm_core *cm_core, /* Only type of packet accepted are for */ /* the PASSIVE open (syn only) */ if ((!tcph->syn) || (tcph->ack)) { - cm_packets_dropped++; + skb_handled = 0; 
break; } listener = find_listener(cm_core, nfo.loc_addr, nfo.loc_port, NES_CM_LISTENER_ACTIVE_STATE); - if (listener) { - nfo.cm_id = listener->cm_id; - nfo.conn_type = listener->conn_type; - } else { - nes_debug(NES_DBG_CM, "Unable to find listener " - "for the pkt\n"); - cm_packets_dropped++; - dev_kfree_skb_any(skb); + if (!listener) { + nfo.cm_id = NULL; + nfo.conn_type = 0; + nes_debug(NES_DBG_CM, "Unable to find listener for the pkt\n"); + skb_handled = 0; break; } - + nfo.cm_id = listener->cm_id; + nfo.conn_type = listener->conn_type; cm_node = make_cm_node(cm_core, nesvnic, &nfo, listener); if (!cm_node) { @@ -2121,9 +2112,13 @@ static void mini_cm_recv_pkt(struct nes_cm_core *cm_core, dev_kfree_skb_any(skb); break; } + skb_reset_network_header(skb); + skb_set_transport_header(skb, sizeof(*tcph)); + skb->len = ntohs(iph->tot_len); process_packet(cm_node, skb, cm_core); rem_ref_cm_node(cm_core, cm_node); } while (0); + return skb_handled; } @@ -2927,15 +2922,16 @@ int nes_destroy_listen(struct iw_cm_id *cm_id) */ int nes_cm_recv(struct sk_buff *skb, struct net_device *netdevice) { + int rc = 0; cm_packets_received++; if ((g_cm_core) && (g_cm_core->api)) { - g_cm_core->api->recv_pkt(g_cm_core, netdev_priv(netdevice), skb); + rc = g_cm_core->api->recv_pkt(g_cm_core, netdev_priv(netdevice), skb); } else { nes_debug(NES_DBG_CM, "Unable to process packet for CM," " cm is not setup properly.\n"); } - return 0; + return rc; } diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index 6f01095..fafa350 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -396,7 +396,7 @@ struct nes_cm_ops { struct nes_cm_node *); int (*reject)(struct nes_cm_core *, struct ietf_mpa_frame *, struct nes_cm_node *); - void (*recv_pkt)(struct nes_cm_core *, struct nes_vnic *, + int (*recv_pkt)(struct nes_cm_core *, struct nes_vnic *, struct sk_buff *); int (*destroy_cm_core)(struct nes_cm_core *); int (*get)(struct 
nes_cm_core *); diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 7c49cc8..8f70ff2 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -2700,27 +2700,33 @@ void nes_nic_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq) pkt_type, (pkt_type & NES_PKT_TYPE_APBVT_MASK)); */ if ((pkt_type & NES_PKT_TYPE_APBVT_MASK) == NES_PKT_TYPE_APBVT_BITS) { - nes_cm_recv(rx_skb, nesvnic->netdev); + if (nes_cm_recv(rx_skb, nesvnic->netdev)) + rx_skb = NULL; + } + if (rx_skb == NULL) + goto skip_rx_indicate0; + + + if ((cqe_misc & NES_NIC_CQE_TAG_VALID) && + (nesvnic->vlan_grp != NULL)) { + vlan_tag = (u16)(le32_to_cpu( + cq->cq_vbase[head].cqe_words[NES_NIC_CQE_TAG_PKT_TYPE_IDX]) + >> 16); + nes_debug(NES_DBG_CQ, "%s: Reporting stripped VLAN packet. Tag = 0x%04X\n", + nesvnic->netdev->name, vlan_tag); + if (nes_use_lro) + lro_vlan_hwaccel_receive_skb(&nesvnic->lro_mgr, rx_skb, + nesvnic->vlan_grp, vlan_tag, NULL); + else + nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); } else { - if ((cqe_misc & NES_NIC_CQE_TAG_VALID) && (nesvnic->vlan_grp != NULL)) { - vlan_tag = (u16)(le32_to_cpu( - cq->cq_vbase[head].cqe_words[NES_NIC_CQE_TAG_PKT_TYPE_IDX]) - >> 16); - nes_debug(NES_DBG_CQ, "%s: Reporting stripped VLAN packet. 
Tag = 0x%04X\n", - nesvnic->netdev->name, vlan_tag); - if (nes_use_lro) - lro_vlan_hwaccel_receive_skb(&nesvnic->lro_mgr, rx_skb, - nesvnic->vlan_grp, vlan_tag, NULL); - else - nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); - } else { - if (nes_use_lro) - lro_receive_skb(&nesvnic->lro_mgr, rx_skb, NULL); - else - nes_netif_rx(rx_skb); - } + if (nes_use_lro) + lro_receive_skb(&nesvnic->lro_mgr, rx_skb, NULL); + else + nes_netif_rx(rx_skb); } +skip_rx_indicate0: nesvnic->netdev->last_rx = jiffies; /* nesvnic->netstats.rx_packets++; */ /* nesvnic->netstats.rx_bytes += rx_pkt_size; */ From chien.tin.tung at intel.com Fri Nov 21 12:50:52 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:52 -0600 Subject: [ofa-general] [PATCH 06/10] RDMA/nes: Fix TCP compliance test failures Message-ID: <20081121205052.GA6468@ctung-MOBL> From: Faisal Latif RDMA/nes: Fix TCP compliance test failures From ANVL testing, we are not handling all cm_node states during connection establishment. Add missing state handlers. 
Fixed sequence number Send reset in handle_tcp_options() Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index fd2dba7..cc10da1 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1466,7 +1466,7 @@ static void handle_syn_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, int optionsize; optionsize = (tcph->doff << 2) - sizeof(struct tcphdr); - skb_pull(skb, tcph->doff << 2); + skb_trim(skb, 0); inc_sequence = ntohl(tcph->seq); switch (cm_node->state) { @@ -1499,6 +1499,10 @@ static void handle_syn_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, cm_node->state = NES_CM_STATE_SYN_RCVD; send_syn(cm_node, 1, skb); break; + case NES_CM_STATE_CLOSED: + cleanup_retrans_entry(cm_node); + send_reset(cm_node, skb); + break; case NES_CM_STATE_TSA: case NES_CM_STATE_ESTABLISHED: case NES_CM_STATE_FIN_WAIT1: @@ -1507,7 +1511,6 @@ static void handle_syn_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, case NES_CM_STATE_LAST_ACK: case NES_CM_STATE_CLOSING: case NES_CM_STATE_UNKNOWN: - case NES_CM_STATE_CLOSED: default: drop_packet(skb); break; @@ -1523,7 +1526,7 @@ static void handle_synack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, int optionsize; optionsize = (tcph->doff << 2) - sizeof(struct tcphdr); - skb_pull(skb, tcph->doff << 2); + skb_trim(skb, 0); inc_sequence = ntohl(tcph->seq); switch (cm_node->state) { case NES_CM_STATE_SYN_SENT: @@ -1547,6 +1550,12 @@ static void handle_synack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, /* passive open, so should not be here */ passive_open_err(cm_node, skb, 1); break; + case NES_CM_STATE_LISTENING: + case NES_CM_STATE_CLOSED: + cm_node->tcp_cntxt.loc_seq_num = ntohl(tcph->ack_seq); + cleanup_retrans_entry(cm_node); + send_reset(cm_node, skb); + break; case NES_CM_STATE_ESTABLISHED: case NES_CM_STATE_FIN_WAIT1: case NES_CM_STATE_FIN_WAIT2: @@ -1554,7 +1563,6 @@ 
static void handle_synack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, case NES_CM_STATE_TSA: case NES_CM_STATE_CLOSING: case NES_CM_STATE_UNKNOWN: - case NES_CM_STATE_CLOSED: case NES_CM_STATE_MPAREQ_SENT: default: drop_packet(skb); @@ -1569,6 +1577,13 @@ static void handle_ack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, u32 inc_sequence; u32 rem_seq_ack; u32 rem_seq; + int ret; + int optionsize; + u32 temp_seq = cm_node->tcp_cntxt.loc_seq_num; + + optionsize = (tcph->doff << 2) - sizeof(struct tcphdr); + cm_node->tcp_cntxt.loc_seq_num = ntohl(tcph->ack_seq); + if (check_seq(cm_node, tcph, skb)) return; @@ -1581,7 +1596,18 @@ static void handle_ack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, switch (cm_node->state) { case NES_CM_STATE_SYN_RCVD: /* Passive OPEN */ + ret = handle_tcp_options(cm_node, tcph, skb, optionsize, 1); + if (ret) + break; cm_node->tcp_cntxt.rem_ack_num = ntohl(tcph->ack_seq); + cm_node->tcp_cntxt.loc_seq_num = temp_seq; + if (cm_node->tcp_cntxt.rem_ack_num != + cm_node->tcp_cntxt.loc_seq_num) { + nes_debug(NES_DBG_CM, "rem_ack_num != loc_seq_num\n"); + cleanup_retrans_entry(cm_node); + send_reset(cm_node, skb); + return; + } cm_node->state = NES_CM_STATE_ESTABLISHED; if (datasize) { cm_node->tcp_cntxt.rcv_nxt = inc_sequence + datasize; @@ -1613,11 +1639,15 @@ static void handle_ack_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, dev_kfree_skb_any(skb); } break; + case NES_CM_STATE_LISTENING: + case NES_CM_STATE_CLOSED: + cleanup_retrans_entry(cm_node); + send_reset(cm_node, skb); + break; case NES_CM_STATE_FIN_WAIT1: case NES_CM_STATE_SYN_SENT: case NES_CM_STATE_FIN_WAIT2: case NES_CM_STATE_TSA: - case NES_CM_STATE_CLOSED: case NES_CM_STATE_MPAREQ_RCVD: case NES_CM_STATE_LAST_ACK: case NES_CM_STATE_CLOSING: @@ -1640,9 +1670,9 @@ static int handle_tcp_options(struct nes_cm_node *cm_node, struct tcphdr *tcph, nes_debug(NES_DBG_CM, "%s: Node %p, Sending RESET\n", __func__, cm_node); if (passive) - 
passive_open_err(cm_node, skb, 0); + passive_open_err(cm_node, skb, 1); else - active_open_err(cm_node, skb, 0); + active_open_err(cm_node, skb, 1); return 1; } } From chien.tin.tung at intel.com Fri Nov 21 12:50:55 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:55 -0600 Subject: [ofa-general] [PATCH 07/10] RDMA/nes: Check cqp_avail_reqs is empty after locking the list Message-ID: <20081121205055.GA4888@ctung-MOBL> From: Faisal Latif RDMA/nes: Check cqp_avail_reqs is empty after locking the list Between the first empty list check and locking the list, the list can change. Check it again after it is locked to make sure the list is not empty. Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index fb8cbd7..5611a73 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -540,11 +540,14 @@ struct nes_cqp_request *nes_get_cqp_request(struct nes_device *nesdev) if (!list_empty(&nesdev->cqp_avail_reqs)) { spin_lock_irqsave(&nesdev->cqp.lock, flags); - cqp_request = list_entry(nesdev->cqp_avail_reqs.next, + if (!list_empty(&nesdev->cqp_avail_reqs)) { + cqp_request = list_entry(nesdev->cqp_avail_reqs.next, struct nes_cqp_request, list); - list_del_init(&cqp_request->list); + list_del_init(&cqp_request->list); + } spin_unlock_irqrestore(&nesdev->cqp.lock, flags); - } else { + } + if (cqp_request == NULL) { cqp_request = kzalloc(sizeof(struct nes_cqp_request), GFP_KERNEL); if (cqp_request) { cqp_request->dynamic = 1; From chien.tin.tung at intel.com Fri Nov 21 12:50:58 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:50:58 -0600 Subject: [ofa-general] [PATCH 08/10] RDMA/nes: Change accept_pend_cnt to atomic Message-ID: <20081121205058.GA8184@ctung-MOBL> From: Faisal Latif RDMA/nes: Change accept_pend_cnt to atomic There is a race condition on accept_pend_cnt. 
Change it to atomic. Signed-off-by: Faisal Latif Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index cc10da1..0025a7e 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -976,7 +976,7 @@ static inline int mini_cm_accelerated(struct nes_cm_core *cm_core, u32 was_timer_set; cm_node->accelerated = 1; - if (cm_node->accept_pend) { + if (atomic_dec_and_test(&cm_node->accept_pend)) { BUG_ON(!cm_node->listener); atomic_dec(&cm_node->listener->pend_accepts_cnt); BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0); @@ -1091,7 +1091,7 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, atomic_inc(&cm_core->node_cnt); cm_node->conn_type = cm_info->conn_type; cm_node->apbvt_set = 0; - cm_node->accept_pend = 0; + atomic_set(&cm_node->accept_pend, 0); cm_node->nesvnic = nesvnic; /* get some device handles, for arp lookup */ @@ -1156,7 +1156,7 @@ static int rem_ref_cm_node(struct nes_cm_core *cm_core, spin_unlock_irqrestore(&cm_node->cm_core->ht_lock, flags); /* if the node is destroyed before connection was accelerated */ - if (!cm_node->accelerated && cm_node->accept_pend) { + if (!cm_node->accelerated && atomic_read(&cm_node->accept_pend)) { BUG_ON(!cm_node->listener); atomic_dec(&cm_node->listener->pend_accepts_cnt); BUG_ON(atomic_read(&cm_node->listener->pend_accepts_cnt) < 0); @@ -1477,25 +1477,25 @@ static void handle_syn_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, break; case NES_CM_STATE_LISTENING: /* Passive OPEN */ - cm_node->accept_pend = 1; - atomic_inc(&cm_node->listener->pend_accepts_cnt); if (atomic_read(&cm_node->listener->pend_accepts_cnt) > cm_node->listener->backlog) { nes_debug(NES_DBG_CM, "drop syn due to backlog " "pressure \n"); cm_backlog_drops++; - passive_open_err(cm_node, skb, 0); + rem_ref_cm_node(cm_node->cm_core, cm_node); + dev_kfree_skb_any(skb); break; } ret = handle_tcp_options(cm_node, tcph, skb, 
optionsize, 1); if (ret) { - passive_open_err(cm_node, skb, 0); - /* drop pkt */ break; } cm_node->tcp_cntxt.rcv_nxt = inc_sequence + 1; BUG_ON(cm_node->send_entry); + atomic_set(&cm_node->accept_pend, 1); + atomic_inc(&cm_node->listener->pend_accepts_cnt); + cm_node->state = NES_CM_STATE_SYN_RCVD; send_syn(cm_node, 1, skb); break; diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index fafa350..7600365 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -294,7 +294,7 @@ struct nes_cm_node { enum nes_cm_conn_type conn_type; struct nes_vnic *nesvnic; int apbvt_set; - int accept_pend; + atomic_t accept_pend; int freed; struct list_head timer_entry; struct list_head reset_entry; From chien.tin.tung at intel.com Fri Nov 21 12:51:01 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:51:01 -0600 Subject: [ofa-general] [PATCH 09/10] RDMA/nes: Cleanup warnings Message-ID: <20081121205101.GA1492@ctung-MOBL> From: Chien Tung RDMA/nes: Cleanup warnings Wrapped NES_DEBUG and assert macros with do while (0) to avoid ambiguous else. No one is using sk_buff * returned from form_cm_frame, take it out. drop_packet() should not be incrementing reset counter on receiving a FIN. Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 1595dc7..13a5bb1 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -137,14 +137,18 @@ #ifdef CONFIG_INFINIBAND_NES_DEBUG #define nes_debug(level, fmt, args...) \ +do { \ if (level & nes_debug_level) \ - printk(KERN_ERR PFX "%s[%u]: " fmt, __func__, __LINE__, ##args) - -#define assert(expr) \ -if (!(expr)) { \ - printk(KERN_ERR PFX "Assertion failed! 
%s, %s, %s, line %d\n", \ - #expr, __FILE__, __func__, __LINE__); \ -} + printk(KERN_ERR PFX "%s[%u]: " fmt, __func__, __LINE__, ##args); \ +} while (0) + +#define assert(expr) \ +do { \ + if (!(expr)) { \ + printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n", \ + #expr, __FILE__, __func__, __LINE__); \ + } \ +} while (0) #define NES_EVENT_TIMEOUT 1200000 #else diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 0025a7e..24855ec 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -92,7 +92,7 @@ static int mini_cm_dealloc_core(struct nes_cm_core *); static int mini_cm_get(struct nes_cm_core *); static int mini_cm_set(struct nes_cm_core *, u32, u32); -static struct sk_buff *form_cm_frame(struct sk_buff *, struct nes_cm_node *, +static void form_cm_frame(struct sk_buff *, struct nes_cm_node *, void *, u32, void *, u32, u8); static struct sk_buff *get_free_pkt(u32); static int add_ref_cm_node(struct nes_cm_node *); @@ -251,7 +251,7 @@ static int parse_mpa(struct nes_cm_node *cm_node, u8 *buffer, u32 len) * form_cm_frame - get a free packet and build empty frame Use * node info to build. 
*/ -static struct sk_buff *form_cm_frame(struct sk_buff *skb, +static void form_cm_frame(struct sk_buff *skb, struct nes_cm_node *cm_node, void *options, u32 optionsize, void *data, u32 datasize, u8 flags) { @@ -339,7 +339,6 @@ static struct sk_buff *form_cm_frame(struct sk_buff *skb, skb_shinfo(skb)->nr_frags = 0; cm_packets_created++; - return skb; } @@ -380,8 +379,6 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb, int ret = 0; u32 was_timer_set; - if (!cm_node) - return -EINVAL; new_send = kzalloc(sizeof(*new_send), GFP_ATOMIC); if (!new_send) return -1; @@ -1280,7 +1277,6 @@ static void drop_packet(struct sk_buff *skb) static void handle_fin_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb, struct tcphdr *tcph) { - atomic_inc(&cm_resets_recvd); nes_debug(NES_DBG_CM, "Received FIN, cm_node = %p, state = %u. " "refcnt=%d\n", cm_node, cm_node->state, atomic_read(&cm_node->ref_count)); From chien.tin.tung at intel.com Fri Nov 21 12:51:04 2008 From: chien.tin.tung at intel.com (Chien Tung) Date: Fri, 21 Nov 2008 14:51:04 -0600 Subject: [ofa-general] [PATCH 10/10] RDMA/nes: Add loopback check to make_cm_node() Message-ID: <20081121205104.GA3720@ctung-MOBL> From: Chien Tung RDMA/nes: Add loopback check to make_cm_node() Check for loopback connection in make_cm_node() Signed-off-by: Chien Tung -- diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 24855ec..9cbea51 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1097,7 +1097,10 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, cm_node->loopbackpartner = NULL; /* get the mac addr for the remote node */ - arpindex = nes_arp_table(nesdev, cm_node->rem_addr, NULL, NES_ARP_RESOLVE); + if (ipv4_is_loopback(htonl(cm_node->rem_addr))) + arpindex = nes_arp_table(nesdev, ntohl(nesvnic->local_ipaddr), NULL, NES_ARP_RESOLVE); + else + arpindex = nes_arp_table(nesdev, cm_node->rem_addr, NULL, 
NES_ARP_RESOLVE); if (arpindex < 0) { arpindex = nes_addr_resolve_neigh(nesvnic, cm_info->rem_addr); if (arpindex < 0) { From jackm at dev.mellanox.co.il Fri Nov 21 13:11:23 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Fri, 21 Nov 2008 23:11:23 +0200 Subject: [ofa-general] Re: Race condition in userspace libraries with create/destroy qp In-Reply-To: References: <200811201211.46527.jackm@dev.mellanox.co.il> <200811210902.03040.jackm@dev.mellanox.co.il> Message-ID: <200811212311.23665.jackm@dev.mellanox.co.il> On Friday 21 November 2008 20:44, Roland Dreier wrote: > expanding the region where > qp_table_mutex is held to avoid this bug makes perfect sense to me and > seems like a clean solution. > > - R. I'll send a patch on Sunday. From jackm at dev.mellanox.co.il Sat Nov 22 01:53:34 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sat, 22 Nov 2008 11:53:34 +0200 Subject: [ofa-general] [PATCH 0 of 2] Fix race condition in userspace libraries in create/destroy qp Message-ID: <200811221153.36089.jackm@dev.mellanox.co.il> The two patches in this series fix a race condition between create_qp and destroy_qp which results in a newly-created QP not being found by xxx_find_qp during CQ polling. The low-level create_qp and destroy_qp functions are not atomic with respect to each other. If one thread is destroying a QP while another is creating a QP, there is a race window. The destroying thread can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared it from the userspace store (xxx_clear_qp). If the other thread creates a QP during this break, it gets the same QP base number and overwrites the destroyed QP's entry with xxx_store_qp(). When destroy_qp then deletes the QP number from the userspace store, it deletes the newly-created QP number, resulting in that QP not being found in poll_cq. This patch series fixes Bugzilla 1389 for the libmlx4 and libmthca libraries. 
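The fix described above can be sketched, under stated assumptions, as a toy model in which one mutex covers both the QP number allocation and the userspace table update. Everything named toy_* here is hypothetical and only stands in for the kernel command and the xxx_store_qp()/xxx_clear_qp() pair; it is not the libmlx4 or libmthca API.

```c
/* Illustrative sketch only: toy_* names are hypothetical, not library API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_NUM_QPS 8

struct toy_ctx {
    pthread_mutex_t qp_table_mutex;      /* covers in_use[] and table[] together */
    int             in_use[TOY_NUM_QPS]; /* stands in for the kernel QPN allocator */
    void           *table[TOY_NUM_QPS];  /* userspace QPN -> QP lookup table */
};

static void toy_ctx_init(struct toy_ctx *ctx)
{
    pthread_mutex_init(&ctx->qp_table_mutex, NULL);
    for (int i = 0; i < TOY_NUM_QPS; i++) {
        ctx->in_use[i] = 0;
        ctx->table[i] = NULL;
    }
}

/* Allocate the lowest free QPN and publish it in the table under one lock. */
static int toy_create_qp(struct toy_ctx *ctx, void *qp)
{
    int qpn = -1;

    pthread_mutex_lock(&ctx->qp_table_mutex);
    for (int i = 0; i < TOY_NUM_QPS; i++) {
        if (!ctx->in_use[i]) {
            qpn = i;
            break;
        }
    }
    if (qpn >= 0) {
        ctx->in_use[qpn] = 1;  /* "kernel" create */
        ctx->table[qpn] = qp;  /* userspace store */
    }
    pthread_mutex_unlock(&ctx->qp_table_mutex);
    return qpn;
}

/* Release the QPN and clear its table entry before anyone can reuse it. */
static void toy_destroy_qp(struct toy_ctx *ctx, int qpn)
{
    pthread_mutex_lock(&ctx->qp_table_mutex);
    assert(qpn >= 0 && qpn < TOY_NUM_QPS && ctx->in_use[qpn]);
    ctx->in_use[qpn] = 0;      /* "kernel" destroy: the QPN becomes reusable */
    ctx->table[qpn] = NULL;    /* userspace clear, still under the same lock */
    pthread_mutex_unlock(&ctx->qp_table_mutex);
}

/* Lookup; the toy takes the mutex so the sketch stays free of data races. */
static void *toy_find_qp(struct toy_ctx *ctx, int qpn)
{
    void *qp;

    pthread_mutex_lock(&ctx->qp_table_mutex);
    qp = ctx->table[qpn];
    pthread_mutex_unlock(&ctx->qp_table_mutex);
    return qp;
}
```

Because the release and the clear happen inside one critical section, a creator that reuses the same number cannot have its fresh table entry wiped by a destroyer that was preempted halfway through.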
- Jack From jackm at dev.mellanox.co.il Sat Nov 22 01:53:48 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sat, 22 Nov 2008 11:53:48 +0200 Subject: [ofa-general] [PATCH 1 of 2] libmlx4: Fix race condition in create/destroy QP Message-ID: <200811221153.49156.jackm@dev.mellanox.co.il> Index: libmlx4/src/qp.c =================================================================== --- libmlx4.orig/src/qp.c 2008-11-20 11:46:58.000000000 +0200 +++ libmlx4/src/qp.c 2008-11-22 09:44:13.000000000 +0200 @@ -667,37 +667,25 @@ struct mlx4_qp *mlx4_find_qp(struct mlx4 int mlx4_store_qp(struct mlx4_context *ctx, uint32_t qpn, struct mlx4_qp *qp) { int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift; - int ret = 0; - - pthread_mutex_lock(&ctx->qp_table_mutex); if (!ctx->qp_table[tind].refcnt) { ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, sizeof (struct mlx4_qp *)); - if (!ctx->qp_table[tind].table) { - ret = -1; - goto out; - } + if (!ctx->qp_table[tind].table) + return -1; } ++ctx->qp_table[tind].refcnt; ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp; - -out: - pthread_mutex_unlock(&ctx->qp_table_mutex); - return ret; + return 0; } void mlx4_clear_qp(struct mlx4_context *ctx, uint32_t qpn) { int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift; - pthread_mutex_lock(&ctx->qp_table_mutex); - if (!--ctx->qp_table[tind].refcnt) free(ctx->qp_table[tind].table); else ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL; - - pthread_mutex_unlock(&ctx->qp_table_mutex); } Index: libmlx4/src/verbs.c =================================================================== --- libmlx4.orig/src/verbs.c 2008-11-20 11:46:58.000000000 +0200 +++ libmlx4/src/verbs.c 2008-11-22 11:05:44.000000000 +0200 @@ -452,6 +452,8 @@ struct ibv_qp *mlx4_create_qp(struct ibv cmd.sq_no_prefetch = 0; /* OK for ABI 2: just a reserved field */ memset(cmd.reserved, 0, sizeof cmd.reserved); + pthread_mutex_lock(&to_mctx(pd->context)->qp_table_mutex); + ret = 
ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, &resp, sizeof resp); if (ret) @@ -460,6 +462,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp); if (ret) goto err_destroy; + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr; qp->rq.max_gs = attr->cap.max_recv_sge; @@ -477,6 +480,7 @@ err_destroy: ibv_cmd_destroy_qp(&qp->ibv_qp); err_rq_db: + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); if (!attr->srq) mlx4_free_db(to_mctx(pd->context), MLX4_DB_TYPE_RQ, qp->db); @@ -580,9 +584,12 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp) struct mlx4_qp *qp = to_mqp(ibqp); int ret; + pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex); ret = ibv_cmd_destroy_qp(ibqp); - if (ret) + if (ret) { + pthread_mutex_unlock(&to_mctx(ibqp->context)->qp_table_mutex); return ret; + } mlx4_lock_cqs(ibqp); @@ -594,6 +601,7 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp) mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num); mlx4_unlock_cqs(ibqp); + pthread_mutex_unlock(&to_mctx(ibqp->context)->qp_table_mutex); if (!ibqp->srq) mlx4_free_db(to_mctx(ibqp->context), MLX4_DB_TYPE_RQ, qp->db); From jackm at dev.mellanox.co.il Sat Nov 22 01:54:01 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sat, 22 Nov 2008 11:54:01 +0200 Subject: [ofa-general] [PATCH 2 of 2] libmthca: Fix race condition in create/destroy QP Message-ID: <200811221154.02427.jackm@dev.mellanox.co.il> Index: libmthca/src/verbs.c =================================================================== --- libmthca.orig/src/verbs.c 2008-11-22 10:33:08.000000000 +0200 +++ libmthca/src/verbs.c 2008-11-22 10:58:01.258153000 +0200 @@ -566,6 +566,7 @@ struct ibv_qp *mthca_create_qp(struct ib cmd.sq_db_index = cmd.rq_db_index = 0; } + pthread_mutex_lock(&to_mctx(pd->context)->qp_table_mutex); ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, &resp, 
sizeof resp); if (ret) @@ -579,6 +580,7 @@ struct ibv_qp *mthca_create_qp(struct ib ret = mthca_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp); if (ret) goto err_destroy; + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); qp->sq.max = attr->cap.max_send_wr; qp->rq.max = attr->cap.max_recv_wr; @@ -592,6 +594,7 @@ err_destroy: ibv_cmd_destroy_qp(&qp->ibv_qp); err_rq_db: + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); if (mthca_is_memfree(pd->context)) mthca_free_db(to_mctx(pd->context)->db_tab, MTHCA_DB_TYPE_RQ, qp->rq.db_index); @@ -686,9 +689,12 @@ int mthca_destroy_qp(struct ibv_qp *qp) { int ret; + pthread_mutex_lock(&to_mctx(qp->context)->qp_table_mutex); ret = ibv_cmd_destroy_qp(qp); - if (ret) + if (ret) { + pthread_mutex_unlock(&to_mctx(qp->context)->qp_table_mutex); return ret; + } mthca_lock_cqs(qp); @@ -700,6 +706,7 @@ int mthca_destroy_qp(struct ibv_qp *qp) mthca_clear_qp(to_mctx(qp->context), qp->qp_num); mthca_unlock_cqs(qp); + pthread_mutex_unlock(&to_mctx(qp->context)->qp_table_mutex); if (mthca_is_memfree(qp->context)) { mthca_free_db(to_mctx(qp->context)->db_tab, MTHCA_DB_TYPE_RQ, Index: libmthca/src/qp.c =================================================================== --- libmthca.orig/src/qp.c 2008-11-22 10:33:08.000000000 +0200 +++ libmthca/src/qp.c 2008-11-22 10:55:33.313592000 +0200 @@ -909,39 +909,27 @@ struct mthca_qp *mthca_find_qp(struct mt int mthca_store_qp(struct mthca_context *ctx, uint32_t qpn, struct mthca_qp *qp) { int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift; - int ret = 0; - - pthread_mutex_lock(&ctx->qp_table_mutex); if (!ctx->qp_table[tind].refcnt) { ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, sizeof (struct mthca_qp *)); - if (!ctx->qp_table[tind].table) { - ret = -1; - goto out; - } + if (!ctx->qp_table[tind].table) + return -1; } ++ctx->qp_table[tind].refcnt; ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp; - -out: - 
pthread_mutex_unlock(&ctx->qp_table_mutex); - return ret; + return 0; } void mthca_clear_qp(struct mthca_context *ctx, uint32_t qpn) { int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift; - pthread_mutex_lock(&ctx->qp_table_mutex); - if (!--ctx->qp_table[tind].refcnt) free(ctx->qp_table[tind].table); else ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL; - - pthread_mutex_unlock(&ctx->qp_table_mutex); } int mthca_free_err_wqe(struct mthca_qp *qp, int is_send, From vlad at lists.openfabrics.org Sat Nov 22 03:22:06 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 22 Nov 2008 03:22:06 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081122-0200 daily build status Message-ID: <20081122112206.1A044E60BD5@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with 
linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Sat Nov 22 03:51:33 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 22 Nov 2008 13:51:33 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa_link_record: prevent potential endless recursion Message-ID: <20081122115133.GI8310@sashak.voltaire.com> This patch eliminates the use of osm_node_get_any_physp_ptr(), which can return an invalid port in case of "port moving". In this case an SA LinkRecord query issued without source and destination LIDs will cause endless recursion and an OpenSM crash. The problem is easily reproducible, for example, when a two-port HCA originally connected to a fabric by one port is quickly reconnected (in less than the OpenSM discovery cycle time) by the other port and then, after the OpenSM sweep has finished, we run 'saquery LinkRecord'. 
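As a rough illustration of the approach described above, the sketch below walks every physical port slot of a node and skips the empty ones, instead of picking a single "any" port and recursing on it. The toy_* types and functions are hypothetical stand-ins chosen for this example and are not OpenSM API; the loop bounds simply mirror the ones in the patch.

```c
/* Illustrative sketch only: toy_* types are hypothetical, not OpenSM API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PHYSP 8

struct toy_physp {
    uint8_t port_num;
};

struct toy_node {
    uint8_t           num_physp;             /* number of slots, including port 0 */
    struct toy_physp *physp[TOY_MAX_PHYSP];  /* NULL where no valid port exists */
};

/* Stand-in for the per-port link handler; here it only counts the visit. */
static void toy_get_physp_link(const struct toy_physp *p, unsigned *visited)
{
    (void)p;
    (*visited)++;
}

/* Visit ports 1..num_physp-1 directly, with no recursion and no "any port". */
static unsigned toy_collect_node_links(const struct toy_node *node)
{
    unsigned visited = 0;

    assert(node->num_physp <= TOY_MAX_PHYSP);
    for (uint8_t port_num = 1; port_num < node->num_physp; port_num++) {
        const struct toy_physp *p = node->physp[port_num];

        if (p) /* a moved or missing port leaves a NULL slot: skip it */
            toy_get_physp_link(p, &visited);
    }
    return visited;
}
```

Since the loop is bounded by the port count and never calls back into the node-level function, a stale port pointer can at worst be skipped; it cannot restart the traversal.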
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_sa_link_record.c | 24 ++++++++++++------------ 1 files changed, 12 insertions(+), 12 deletions(-) diff --git a/opensm/opensm/osm_sa_link_record.c b/opensm/opensm/osm_sa_link_record.c index c48df14..b92845e 100644 --- a/opensm/opensm/osm_sa_link_record.c +++ b/opensm/opensm/osm_sa_link_record.c @@ -342,18 +342,18 @@ __osm_lr_rcv_get_port_links(IN osm_sa_t * sa, p_node = (osm_node_t *)cl_qmap_head(p_node_tbl); while (p_node != (osm_node_t *)cl_qmap_end(p_node_tbl)) { - /* - Get only one port for each node. - After the recursive call, this function will - scan all the ports of this node anyway. - */ - p_src_physp = osm_node_get_any_physp_ptr(p_node); - p_src_port = osm_get_port_by_guid(sa->p_subn, - osm_physp_get_port_guid(p_src_physp)); - __osm_lr_rcv_get_port_links(sa, p_lr, - p_src_port, NULL, - comp_mask, p_list, - p_req_physp); + num_ports = osm_node_get_num_physp(p_node); + for (port_num = 1; port_num < num_ports; + port_num++) { + p_src_physp = + osm_node_get_physp_ptr(p_node, + port_num); + if (p_src_physp) + __osm_lr_rcv_get_physp_link + (sa, p_lr, p_src_physp, + NULL, comp_mask, p_list, + p_req_physp); + } p_node = (osm_node_t *) cl_qmap_next(&p_node-> map_item); } -- 1.6.0.3.517.g759a From michael at ellerman.id.au Fri Nov 21 19:41:08 2008 From: michael at ellerman.id.au (Michael Ellerman) Date: Sat, 22 Nov 2008 14:41:08 +1100 Subject: [ofa-general] Re: [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock In-Reply-To: <1227283347.3599.8.camel@johannes.berg> References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811211637.15300.fenkes@de.ibm.com> <1227283347.3599.8.camel@johannes.berg> Message-ID: <1227325268.10134.2.camel@localhost> On Fri, 2008-11-21 at 17:02 +0100, Johannes Berg wrote: > On Fri, 2008-11-21 at 16:37 +0100, Joachim Fenkes wrote: > > > + u64 flags; > > > - spin_lock(&shca_list_lock); > > + spin_lock_irqsave(&shca_list_lock, flags); > > That's wrong and I 
think will give a warning on all machines where > u64 != unsigned long. Might not particularly matter in this case. Crud, sorry. > Also, generally it seems wrong to say "fix lockdep failure" when the > patch really fixes a bug that lockdep happened to find. True. I guess it should be "fix locking error found with lockdep", to make it clear no one has actually hit the bug. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person From sashak at voltaire.com Sat Nov 22 07:41:48 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 22 Nov 2008 17:41:48 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sw_info_rcv: eliminate osm_node_get_any_physp_ptr() use Message-ID: <20081122154148.GJ8310@sashak.voltaire.com> The function osm_node_get_any_physp_ptr() is dangerous because it uses a potentially outdated local port number from NodeInfo. It is wrongly used in the commented-out functions __osm_si_rcv_get_fwd_tbl() and __osm_si_rcv_get_mcast_fwd_tbl() for direct path determination. In the __osm_ni_rcv_process_switch() function the usage is safe (for the port GUID only), but because the port's DR path may be outdated, the DR path was extracted from the MAD. In order to unify all of this, we will update the DR path of switch port 0 when NodeInfo is received and will use it later in the discovery process (instead of the potentially outdated DR path extracted from the port returned by osm_node_get_any_physp_ptr()). 
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_node_info_rcv.c | 9 +++++---- opensm/opensm/osm_sw_info_rcv.c | 32 +++++++++----------------------- 2 files changed, 14 insertions(+), 27 deletions(-) diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c index 20b16d1..c52c0d5 100644 --- a/opensm/opensm/osm_node_info_rcv.c +++ b/opensm/opensm/osm_node_info_rcv.c @@ -501,15 +501,16 @@ __osm_ni_rcv_process_switch(IN osm_sm_t * sm, { ib_api_status_t status = IB_SUCCESS; osm_madw_context_t context; - osm_dr_path_t dr_path; + osm_dr_path_t *path; ib_smp_t *p_smp; OSM_LOG_ENTER(sm->p_log); p_smp = osm_madw_get_smp_ptr(p_madw); - osm_dr_path_init(&dr_path, - osm_madw_get_bind_handle(p_madw), + /* update DR path of already initialized switch port 0 */ + path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_node, 0)); + osm_dr_path_init(path, osm_madw_get_bind_handle(p_madw), p_smp->hop_count, p_smp->initial_path); context.si_context.node_guid = osm_node_get_node_guid(p_node); @@ -517,7 +518,7 @@ __osm_ni_rcv_process_switch(IN osm_sm_t * sm, context.si_context.light_sweep = FALSE; /* Request a SwitchInfo attribute */ - status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_SWITCH_INFO, + status = osm_req_get(sm, path, IB_MAD_ATTR_SWITCH_INFO, 0, CL_DISP_MSGID_NONE, &context); if (status != IB_SUCCESS) /* continue despite error */ diff --git a/opensm/opensm/osm_sw_info_rcv.c b/opensm/opensm/osm_sw_info_rcv.c index e9973e3..ce86adb 100644 --- a/opensm/opensm/osm_sw_info_rcv.c +++ b/opensm/opensm/osm_sw_info_rcv.c @@ -59,17 +59,13 @@ The plock must be held before calling this function. 
**********************************************************************/ static void -__osm_si_rcv_get_port_info(IN osm_sm_t * sm, - IN osm_switch_t * const p_sw, - IN const osm_madw_t * const p_madw) +__osm_si_rcv_get_port_info(IN osm_sm_t * sm, IN osm_switch_t * const p_sw) { osm_madw_context_t context; uint8_t port_num; osm_physp_t *p_physp; osm_node_t *p_node; uint8_t num_ports; - osm_dr_path_t dr_path; - const ib_smp_t *p_smp; ib_api_status_t status = IB_SUCCESS; OSM_LOG_ENTER(sm->p_log); @@ -77,19 +73,13 @@ __osm_si_rcv_get_port_info(IN osm_sm_t * sm, CL_ASSERT(p_sw); p_node = p_sw->p_node; - p_smp = osm_madw_get_smp_ptr(p_madw); CL_ASSERT(osm_node_get_type(p_node) == IB_NODE_TYPE_SWITCH); /* Request PortInfo attribute for each port on the switch. - Don't trust the port's own DR Path, since it may no longer - be a legitimate path through the subnet. - Build a path from the mad instead, since we know that path works. - The port's DR Path info gets updated when the PortInfo - attribute is received. 
*/ - p_physp = osm_node_get_any_physp_ptr(p_node); + p_physp = osm_node_get_physp_ptr(p_node, 0); context.pi_context.node_guid = osm_node_get_node_guid(p_node); context.pi_context.port_guid = osm_physp_get_port_guid(p_physp); @@ -98,12 +88,10 @@ __osm_si_rcv_get_port_info(IN osm_sm_t * sm, context.pi_context.active_transition = FALSE; num_ports = osm_node_get_num_physp(p_node); - osm_dr_path_init(&dr_path, osm_madw_get_bind_handle(p_madw), - p_smp->hop_count, p_smp->initial_path); for (port_num = 0; port_num < num_ports; port_num++) { - status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_PORT_INFO, - cl_hton32(port_num), + status = osm_req_get(sm, osm_physp_get_dr_path_ptr(p_physp), + IB_MAD_ATTR_PORT_INFO, cl_hton32(port_num), CL_DISP_MSGID_NONE, &context); if (status != IB_SUCCESS) /* continue the loop despite the error */ @@ -138,13 +126,12 @@ __osm_si_rcv_get_fwd_tbl(IN osm_sm_t * sm, IN osm_switch_t * const p_sw) CL_ASSERT(osm_node_get_type(p_node) == IB_NODE_TYPE_SWITCH); - p_physp = osm_node_get_any_physp_ptr(p_node); - context.lft_context.node_guid = osm_node_get_node_guid(p_node); context.lft_context.set_method = FALSE; max_block_id_ho = osm_switch_get_max_block_id_in_use(p_sw); + p_physp = osm_node_get_physp_ptr(p_node, 0); p_dr_path = osm_physp_get_dr_path_ptr(p_physp); for (block_id_ho = 0; block_id_ho <= max_block_id_ho; block_id_ho++) { @@ -197,12 +184,10 @@ __osm_si_rcv_get_mcast_fwd_tbl(IN osm_sm_t * sm, IN osm_switch_t * const p_sw) goto Exit; } - p_physp = osm_node_get_any_physp_ptr(p_node); - p_tbl = osm_switch_get_mcast_tbl_ptr(p_sw); - context.mft_context.node_guid = osm_node_get_node_guid(p_node); context.mft_context.set_method = FALSE; + p_tbl = osm_switch_get_mcast_tbl_ptr(p_sw); max_block_id_ho = osm_mcast_tbl_get_max_block(p_tbl); if (max_block_id_ho > IB_MCAST_MAX_BLOCK_ID) { @@ -221,6 +206,7 @@ __osm_si_rcv_get_mcast_fwd_tbl(IN osm_sm_t * sm, IN osm_switch_t * const p_sw) "Max MFT block = %u, Max position = %u\n", max_block_id_ho, 
max_position); + p_physp = osm_node_get_physp_ptr(p_node, 0); p_dr_path = osm_physp_get_dr_path_ptr(p_physp); for (block_id_ho = 0; block_id_ho <= max_block_id_ho; block_id_ho++) { @@ -331,7 +317,7 @@ __osm_si_rcv_process_new(IN osm_sm_t * sm, /* Get the PortInfo attribute for every port. */ - __osm_si_rcv_get_port_info(sm, p_sw, p_madw); + __osm_si_rcv_get_port_info(sm, p_sw); /* Don't bother retrieving the current unicast and multicast tables @@ -426,7 +412,7 @@ __osm_si_rcv_process_existing(IN osm_sm_t * sm, /* If this is the first discovery - then get the port_info */ if (p_sw->discovery_count == 1) - __osm_si_rcv_get_port_info(sm, p_sw, p_madw); + __osm_si_rcv_get_port_info(sm, p_sw); else OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "Not discovering again through switch:0x%" -- 1.6.0.4.766.g6fc4a From ogerlitz at voltaire.com Sat Nov 22 23:22:48 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Nov 2008 09:22:48 +0200 Subject: [ofa-general] infiniband problem, no NICs In-Reply-To: <4925BD78.4030003@tu-berlin.de> References: <4925BD78.4030003@tu-berlin.de> Message-ID: <492904C8.7000402@voltaire.com> Michael Oevermann wrote: > However, when I directly start a mpi job (without using a scheduler) via: > /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile > /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 > > > I get the error message: > > 0,1,0]: uDAPL on host n01 was unable to find any NICs. Another > transport will be used instead, although this may result in lower > performance. The BTL you are working with uses a library named udapl, and this library relies on the existence of the IPoIB (IP over InfiniBand) NICs (e.g. ib0, ib1). Assuming these NICs are not configured on your system, you can either configure them (modprobe ib_ipoib / ifconfig ib0 x.y.z.w) or use a verbs (native IB access layer) BTL, which does not rely on an operative IPoIB. Or.
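[Editorial sketch, not part of the original message.] The point above is that uDAPL can only find a NIC if an IPoIB interface such as ib0 actually exists. As a rough illustration, assuming a POSIX system, the presence of a named interface can be probed with if_nametoindex(), which returns 0 when no interface has that name. The helper names iface_exists() and count_present_ifaces() are hypothetical and are not part of Open MPI, uDAPL, or OFED.

```c
#include <assert.h>
#include <net/if.h>	/* if_nametoindex() */
#include <stddef.h>

/* Hypothetical helper: 1 if an interface with this name exists, 0 otherwise. */
static int iface_exists(const char *name)
{
	return name != NULL && if_nametoindex(name) != 0;
}

/* Hypothetical helper: count how many of the given interface names exist. */
static size_t count_present_ifaces(const char *const *names, size_t n)
{
	size_t i, found = 0;

	assert(names != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (iface_exists(names[i]))
			found++;
	return found;
}
```

Calling count_present_ifaces() with the names "ib0" and "ib1" and getting 0 back would correspond to the situation described above, where the IPoIB NICs have not been configured yet.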
From sashak at voltaire.com Sun Nov 23 00:34:05 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 10:34:05 +0200 Subject: [ofa-general] [PATCH] opensm: remove osm_node_get_any_dr_part_ptr() function Message-ID: <20081123083405.GD21967@sashak.voltaire.com> The function osm_node_get_any_dr_path_ptr() is dangerous because it uses a potentially outdated local port number from NodeInfo. A port move in combination with a PortInfo Get failure may cause a wrong DR path to be used, so that the subnet will never come up (the issue was simulated with ibsim). This patch removes this function completely and instead uses the DR path of switch port 0, which is always up to date. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_node.h | 39 -------------------------------------- opensm/opensm/osm_mcast_mgr.c | 4 +-- opensm/opensm/osm_state_mgr.c | 2 +- opensm/opensm/osm_ucast_mgr.c | 4 +-- 4 files changed, 3 insertions(+), 46 deletions(-) diff --git a/opensm/include/opensm/osm_node.h b/opensm/include/opensm/osm_node.h index 24e399e..8d90f88 100644 --- a/opensm/include/opensm/osm_node.h +++ b/opensm/include/opensm/osm_node.h @@ -272,45 +272,6 @@ static inline osm_physp_t *osm_node_get_any_physp_ptr(IN const osm_node_t * * Node object *********/ -/****f* OpenSM: Node/osm_node_get_any_path -* NAME -* osm_node_get_any_path -* -* DESCRIPTION -* Returns a pointer to the physical port object at the -* specified local port number. -* -* SYNOPSIS -*/ -static inline osm_dr_path_t *osm_node_get_any_dr_path_ptr(IN const osm_node_t * - const p_node) -{ - CL_ASSERT(p_node); - return (osm_physp_get_dr_path_ptr - (&p_node-> - physp_table[ib_node_info_get_local_port_num - (&p_node->node_info)])); -} - -/* -* PARAMETERS -* p_node -* [in] Pointer to an osm_node_t object. -* -* port_num -* [in] Local port number. -* -* RETURN VALUES -* Returns a pointer to the physical port object at the -* specified local port number. 
-* A return value of zero means the port number was out of range. -* -* NOTES -* -* SEE ALSO -* Node object -*********/ - /****f* OpenSM: Node/osm_node_get_type * NAME * osm_node_get_type diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c index 6d26694..2f9cb5e 100644 --- a/opensm/opensm/osm_mcast_mgr.c +++ b/opensm/opensm/osm_mcast_mgr.c @@ -356,9 +356,7 @@ __osm_mcast_mgr_set_tbl(osm_sm_t * sm, IN osm_switch_t * const p_sw) CL_ASSERT(p_node); - p_path = osm_node_get_any_dr_path_ptr(p_node); - - CL_ASSERT(p_path); + p_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_node, 0)); /* Send multicast forwarding table blocks to the switch diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 9404e24..599af0a 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -135,7 +135,7 @@ static void __osm_state_mgr_get_sw_info(IN cl_map_item_t * const p_object, OSM_LOG_ENTER(sm->p_log); p_node = p_sw->p_node; - p_dr_path = osm_node_get_any_dr_path_ptr(p_node); + p_dr_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_node, 0)); memset(&mad_context, 0, sizeof(mad_context)); diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 175817c..1409e15 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -336,9 +336,7 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, CL_ASSERT(p_node); - p_path = osm_node_get_any_dr_path_ptr(p_node); - - CL_ASSERT(p_path); + p_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_node, 0)); /* Set the top of the unicast forwarding table. -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Nov 23 01:05:57 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 11:05:57 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_state_mgr.c: Add check for valid physical port before using pointer. 
In-Reply-To: <20081118140608.19ac0963.weiny2@llnl.gov> References: <20081104095744.35893d4a.weiny2@llnl.gov> <20081110201333.GM313@sashak.voltaire.com> <20081110131140.52561f42.weiny2@llnl.gov> <20081112185457.GD27271@sashak.voltaire.com> <20081118123000.GO10251@sashak.voltaire.com> <20081118140608.19ac0963.weiny2@llnl.gov> Message-ID: <20081123090557.GF21967@sashak.voltaire.com> Hi Ira, On 14:06 Tue 18 Nov , Ira Weiny wrote: > I am not sure this will fix my bug. > > The stack trace in my bug ended with: > > #0 osm_vendor_get (h_bind=0x0, mad_size=256, p_vw=0x69bbe8) at > > The h_bind was being extracted from the osm_physp_t object. Would this fix > ensure that the h_bind pointer was valid in the osm_physp_t object returned? Not always :(. It will protect against port moving, but may not help in case of PortInfo Get failure (as far as I understand now, this is your case). Finally I just removed osm_node_get_any_physp_ptr() (as well as osm_node_get_any_dr_path_ptr(), which makes a similar assumption about the local port number in NodeInfo) in all places where it was used. I think we can do the same in __osm_state_mgr_get_node_desc() and remove osm_node_get_any_physp_ptr() completely. The patch will follow shortly. Sasha From alekseys at voltaire.com Sun Nov 23 01:16:34 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Sun, 23 Nov 2008 09:16:34 +0000 Subject: [ofa-general] RDMA CM and IPv6 support Message-ID: <1227431794.4180.7.camel@alst60.voltaire.com> Hi, Roland. There was a set of kernel patches written by me and approved by Sean for RDMA CM to support the IPv6 protocol. Is there any reason why they were not applied? I'll be glad to fix them. Here is the reference to this thread. 
http://lists.openfabrics.org/pipermail/general/2008-August/053663.html From ogerlitz at voltaire.com Sun Nov 23 01:23:37 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 23 Nov 2008 11:23:37 +0200 Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion In-Reply-To: <1227247845-16023-1-git-send-email-ddiss@sgi.com> References: <1227247845-16023-1-git-send-email-ddiss@sgi.com> Message-ID: <49292119.9080105@voltaire.com> David Disseldorp wrote: > iSCSI/iSER targets may send PDUs without a prior request from the initiator, RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category. Currently when an iSER target sends an "unexpected" PDU, the initiators recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests. Assuming these target-initiated NOP-Ins are echoed back by the initiator, the current code of iser_send_control would post a receive buffer when sending the NOP-Out, which will account for the buffer consumed by the NOP-In. So we are left with the Asynchronous PDUs, for which your patch indeed seems to fix a hole in the implementation. > > This patch ensures recv buffers consumed by "unexpected" PDUs are replaced prior to sending the next control-type PDU. The practice used by the patch is to account for unexpected receives and to refill the receive buffer queue whenever possible with as many buffers as there were unexpected receives since the last refill attempt. To ease future maintenance and debugging, and for simplicity of the code, I would prefer a patch with zero footprint in the iser_send_xxx functions, something like accounting for --async-- receives and filling in the missing buffers when calling iser_post_receive_control. > @@ -586,6 +635,21 @@ void iser_rcv_completion(struct iser_desc *rx_desc, > * parallel to the execution of iser_conn_term. 
So the code that waits * > * for the posted rx bufs refcount to become zero handles everything */ > atomic_dec(&conn->ib_conn->post_recv_buf_count); > + > + /* > + * if an unexpected PDU was received then the recv wr consumed must > + * be replaced, this is done in the next send of a control-type PDU > + */ > + if ((opcode == ISCSI_OP_NOOP_IN) > + && (hdr->itt == RESERVED_ITT)) { > + /* nop-in with itt = 0xffffffff */ > + atomic_inc(&conn->ib_conn->unexpected_pdu_count); > + } As I wrote above, this seems to be unneeded. Or. From sashak at voltaire.com Sun Nov 23 01:32:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 11:32:08 +0200 Subject: [ofa-general] [PATCH] opensm: remove osm_node_get_any_physp_ptr() function Message-ID: <20081123093208.GG21967@sashak.voltaire.com> The function osm_node_get_any_physp_ptr() is dangerous because it uses a potentially outdated local port number from NodeInfo. Port moves and/or PortInfo Get failures may cause a pointer to a wrong (uninitialized) port to be returned. This patch removes this function completely. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_node.h | 36 ------------------------------------ opensm/opensm/osm_state_mgr.c | 13 +++++++++---- 2 files changed, 9 insertions(+), 40 deletions(-) diff --git a/opensm/include/opensm/osm_node.h b/opensm/include/opensm/osm_node.h index 8d90f88..50b3598 100644 --- a/opensm/include/opensm/osm_node.h +++ b/opensm/include/opensm/osm_node.h @@ -236,42 +236,6 @@ static inline osm_physp_t *osm_node_get_physp_ptr(IN osm_node_t * const p_node, * Node object *********/ -/****f* OpenSM: Node/osm_node_get_any_physp_ptr -* NAME -* osm_node_get_any_physp_ptr -* -* DESCRIPTION -* Returns a pointer to any valid physical port object associated -* with this node. This operation is mostly meaningful for switches, -* in which case all the Physical Ports share the same GUID. 
-* -* SYNOPSIS -*/ -static inline osm_physp_t *osm_node_get_any_physp_ptr(IN const osm_node_t * - const p_node) -{ - CL_ASSERT(p_node); - return ((osm_physp_t *) & p_node-> - physp_table[ib_node_info_get_local_port_num - (&p_node->node_info)]); -} - -/* -* PARAMETERS -* p_node -* [in] Pointer to an osm_node_t object. -* -* RETURN VALUES -* Returns a pointer to any valid physical port object associated -* with this node. This operation is mostly meaningful for switches, -* in which case all the Physical Ports share the same GUID. -* -* NOTES -* -* SEE ALSO -* Node object -*********/ - /****f* OpenSM: Node/osm_node_get_type * NAME * osm_node_get_type diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 599af0a..56212fe 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -524,7 +524,8 @@ static void __osm_state_mgr_get_node_desc(IN cl_map_item_t * const p_object, osm_madw_context_t mad_context; osm_node_t *const p_node = (osm_node_t *) p_object; osm_sm_t *sm = context; - osm_physp_t *p_physp; + osm_physp_t *p_physp = NULL; + unsigned i, num_ports; ib_api_status_t status; OSM_LOG_ENTER(sm->p_log); @@ -541,10 +542,14 @@ static void __osm_state_mgr_get_node_desc(IN cl_map_item_t * const p_object, cl_ntoh64(osm_node_get_node_guid (p_node))); /* get a physp to request from. 
*/ - p_physp = osm_node_get_any_physp_ptr(p_node); - if (!osm_physp_is_valid(p_physp)) { + num_ports = osm_node_get_num_physp(p_node); + for (i = 0; i < num_ports; i++) + if ((p_physp = osm_node_get_physp_ptr(p_node, i))) + break; + + if (!p_physp) { OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331C: " - "Failed to get valid physical port object\n"); + "Failed to find any valid physical port object.\n"); goto exit; } -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Nov 23 03:05:56 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 13:05:56 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft In-Reply-To: <49255C13.5030503@dev.mellanox.co.il> References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> Message-ID: <20081123110556.GH21967@sashak.voltaire.com> Hi Yevgeny, On 14:46 Thu 20 Nov , Yevgeny Kliteynik wrote: > > I can do something like the following patch, but I have > some strange feeling that I'm missing something... I cannot see any errors here. But probably you can use simpler approach - just cleanup all switch's lft_buf separately after ucast_mgr is finished (including wait_for_pending_transactions()). Something like below (if it is fine for you I can just apply this patch). BTW, what about to rename lft_buf to new_lft (to improve readability)? 
Sasha diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 56212fe..c810106 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1001,6 +1001,23 @@ static void __osm_state_mgr_check_tbl_consistency(IN osm_sm_t * sm) OSM_LOG_EXIT(sm->p_log); } +static void cleanup_switch(cl_map_item_t *item, void *log) +{ + osm_switch_t *sw = (osm_switch_t *)item; + + if (!sw->lft_buf) + return; + + if (memcmp(sw->lft, sw->lft_buf, IB_LID_UCAST_END_HO + 1)) + osm_log(log, OSM_LOG_ERROR, "ERR 331D: " + "LFT of switch 0x%016" PRIx64 " is not up to date.\n", + cl_ntoh64(sw->p_node->node_info.node_guid)); + else { + free(sw->lft_buf); + sw->lft_buf = NULL; + } +} + /********************************************************************** **********************************************************************/ int wait_for_pending_transactions(osm_stats_t * stats) @@ -1254,6 +1271,9 @@ _repeat_discovery: if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats)) return; + /* cleanup switch lft buffers */ + cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log); + /* We are done setting all LFTs so clear the ignore existing. * From now on, as long as we are still master, we want to * take into account these lfts. 
*/ diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c index 642dcd1..c446f4f 100644 --- a/opensm/opensm/osm_switch.c +++ b/opensm/opensm/osm_switch.c @@ -114,13 +114,6 @@ osm_switch_init(IN osm_switch_t * const p_sw, /* Initialize the table to OSM_NO_PATH, which is "invalid port" */ memset(p_sw->lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); - p_sw->lft_buf = malloc(IB_LID_UCAST_END_HO + 1); - if (!p_sw->lft_buf) { - status = IB_INSUFFICIENT_MEMORY; - goto Exit; - } - memset(p_sw->lft_buf, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); - p_sw->p_prof = malloc(sizeof(*p_sw->p_prof) * num_ports); if (p_sw->p_prof == NULL) { status = IB_INSUFFICIENT_MEMORY; diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 1409e15..3d47640 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -397,13 +397,6 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, goto Exit; } - if (!p_sw->need_update && - !memcmp(p_sw->lft, p_sw->lft_buf, IB_LID_UCAST_END_HO + 1)) { - free(p_sw->lft_buf); - p_sw->lft_buf = NULL; - goto Exit; - } - for (block_id_ho = 0; osm_switch_get_lft_block(p_sw, block_id_ho, block); block_id_ho++) { From vlad at lists.openfabrics.org Sun Nov 23 03:22:02 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 23 Nov 2008 03:22:02 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081123-0200 daily build status Message-ID: <20081123112202.4115AE60CD2@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with 
linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Sun Nov 23 03:58:20 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 23 Nov 2008 13:58:20 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft In-Reply-To: <20081123110556.GH21967@sashak.voltaire.com> References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> 
<20081123110556.GH21967@sashak.voltaire.com> Message-ID: <4929455C.2080407@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 14:46 Thu 20 Nov , Yevgeny Kliteynik wrote: >> I can do something like the following patch, but I have >> some strange feeling that I'm missing something... > > I cannot see any errors here. But probably you can use simpler approach > - just cleanup all switch's lft_buf separately after ucast_mgr is > finished (including wait_for_pending_transactions()). Something like > below (if it is fine for you I can just apply this patch). In general, looks good. See below. > BTW, what about to rename lft_buf to new_lft (to improve readability)? Sure, why not. > Sasha > > > diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c > index 56212fe..c810106 100644 > --- a/opensm/opensm/osm_state_mgr.c > +++ b/opensm/opensm/osm_state_mgr.c > @@ -1001,6 +1001,23 @@ static void __osm_state_mgr_check_tbl_consistency(IN osm_sm_t * sm) > OSM_LOG_EXIT(sm->p_log); > } > > +static void cleanup_switch(cl_map_item_t *item, void *log) > +{ > + osm_switch_t *sw = (osm_switch_t *)item; > + > + if (!sw->lft_buf) > + return; > + > + if (memcmp(sw->lft, sw->lft_buf, IB_LID_UCAST_END_HO + 1)) Should it turn on the p_subn->subnet_initialization_error flag? 
> + osm_log(log, OSM_LOG_ERROR, "ERR 331D: " > + "LFT of switch 0x%016" PRIx64 " is not up to date.\n", > + cl_ntoh64(sw->p_node->node_info.node_guid)); > + else { > + free(sw->lft_buf); > + sw->lft_buf = NULL; > + } > +} > + > /********************************************************************** > **********************************************************************/ > int wait_for_pending_transactions(osm_stats_t * stats) > @@ -1254,6 +1271,9 @@ _repeat_discovery: > if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats)) > return; > > + /* cleanup switch lft buffers */ > + cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log); > + > /* We are done setting all LFTs so clear the ignore existing. > * From now on, as long as we are still master, we want to > * take into account these lfts. */ > diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c > index 642dcd1..c446f4f 100644 > --- a/opensm/opensm/osm_switch.c > +++ b/opensm/opensm/osm_switch.c > @@ -114,13 +114,6 @@ osm_switch_init(IN osm_switch_t * const p_sw, > /* Initialize the table to OSM_NO_PATH, which is "invalid port" */ > memset(p_sw->lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); > > - p_sw->lft_buf = malloc(IB_LID_UCAST_END_HO + 1); > - if (!p_sw->lft_buf) { > - status = IB_INSUFFICIENT_MEMORY; > - goto Exit; > - } > - memset(p_sw->lft_buf, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); > - This part is relevant even w/o the rest of the patch, right? 
-- Yevgeny > p_sw->p_prof = malloc(sizeof(*p_sw->p_prof) * num_ports); > if (p_sw->p_prof == NULL) { > status = IB_INSUFFICIENT_MEMORY; > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index 1409e15..3d47640 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -397,13 +397,6 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr, > goto Exit; > } > > - if (!p_sw->need_update && > - !memcmp(p_sw->lft, p_sw->lft_buf, IB_LID_UCAST_END_HO + 1)) { > - free(p_sw->lft_buf); > - p_sw->lft_buf = NULL; > - goto Exit; > - } > - > for (block_id_ho = 0; > osm_switch_get_lft_block(p_sw, block_id_ho, block); > block_id_ho++) { > From kliteyn at dev.mellanox.co.il Sun Nov 23 04:03:57 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 23 Nov 2008 14:03:57 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft In-Reply-To: <20081123110556.GH21967@sashak.voltaire.com> References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> <20081123110556.GH21967@sashak.voltaire.com> Message-ID: <492946AD.5090308@dev.mellanox.co.il> Sasha, Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 14:46 Thu 20 Nov , Yevgeny Kliteynik wrote: >> I can do something like the following patch, but I have >> some strange feeling that I'm missing something... > > I cannot see any errors here. But probably you can use simpler approach > - just cleanup all switch's lft_buf separately after ucast_mgr is > finished (including wait_for_pending_transactions()). I've been doing some thinking... Basically, what you're saying is that at the end of each and every heavy sweep you will free ALL the lft_buf arrays, unless there was some error, that should trigger a new heavy sweep anyway. So what's the point of having lft_buf in the first place? 
It was relevant in the beginning of ucast cache implementation, but now after all the lft simplifications, I don't see how it is used. Am I missing something here, or should we just remove all these lft_buf and go back to single ucast_mgr_t.lft_buf? -- Yevgeny From sashak at voltaire.com Sun Nov 23 04:17:38 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 14:17:38 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft In-Reply-To: <4929455C.2080407@dev.mellanox.co.il> References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> <20081123110556.GH21967@sashak.voltaire.com> <4929455C.2080407@dev.mellanox.co.il> Message-ID: <20081123121738.GJ21967@sashak.voltaire.com> On 13:58 Sun 23 Nov , Yevgeny Kliteynik wrote: > Hi Sasha, > > Sasha Khapyorsky wrote: >> Hi Yevgeny, >> On 14:46 Thu 20 Nov , Yevgeny Kliteynik wrote: >>> I can do something like the following patch, but I have >>> some strange feeling that I'm missing something... >> I cannot see any errors here. But probably you can use simpler approach >> - just cleanup all switch's lft_buf separately after ucast_mgr is >> finished (including wait_for_pending_transactions()). Something like >> below (if it is fine for you I can just apply this patch). > > In general, looks good. See below. > >> BTW, what about to rename lft_buf to new_lft (to improve readability)? > > Sure, why not. 
> >> Sasha >> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c >> index 56212fe..c810106 100644 >> --- a/opensm/opensm/osm_state_mgr.c >> +++ b/opensm/opensm/osm_state_mgr.c >> @@ -1001,6 +1001,23 @@ static void >> __osm_state_mgr_check_tbl_consistency(IN osm_sm_t * sm) >> OSM_LOG_EXIT(sm->p_log); >> } >> +static void cleanup_switch(cl_map_item_t *item, void *log) >> +{ >> + osm_switch_t *sw = (osm_switch_t *)item; >> + >> + if (!sw->lft_buf) >> + return; >> + >> + if (memcmp(sw->lft, sw->lft_buf, IB_LID_UCAST_END_HO + 1)) > > Should it turn on the p_subn->subnet_initialization_error flag? Maybe, but I'm not sure - this is more for bug#1401 materials :), basically I would expect subnet_initialization_error flag setup when LFT Set fails. > >> + osm_log(log, OSM_LOG_ERROR, "ERR 331D: " >> + "LFT of switch 0x%016" PRIx64 " is not up to date.\n", >> + cl_ntoh64(sw->p_node->node_info.node_guid)); >> + else { >> + free(sw->lft_buf); >> + sw->lft_buf = NULL; >> + } >> +} >> + >> /********************************************************************** >> **********************************************************************/ >> int wait_for_pending_transactions(osm_stats_t * stats) >> @@ -1254,6 +1271,9 @@ _repeat_discovery: >> if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats)) >> return; >> + /* cleanup switch lft buffers */ >> + cl_qmap_apply_func(&sm->p_subn->sw_guid_tbl, cleanup_switch, sm->p_log); >> + >> /* We are done setting all LFTs so clear the ignore existing. >> * From now on, as long as we are still master, we want to >> * take into account these lfts. 
*/ >> diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c >> index 642dcd1..c446f4f 100644 >> --- a/opensm/opensm/osm_switch.c >> +++ b/opensm/opensm/osm_switch.c >> @@ -114,13 +114,6 @@ osm_switch_init(IN osm_switch_t * const p_sw, >> /* Initialize the table to OSM_NO_PATH, which is "invalid port" */ >> memset(p_sw->lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); >> - p_sw->lft_buf = malloc(IB_LID_UCAST_END_HO + 1); >> - if (!p_sw->lft_buf) { >> - status = IB_INSUFFICIENT_MEMORY; >> - goto Exit; >> - } >> - memset(p_sw->lft_buf, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); >> - > > This part is relevant even w/o the rest of the patch, right? Yes. Sasha > > -- Yevgeny > >> p_sw->p_prof = malloc(sizeof(*p_sw->p_prof) * num_ports); >> if (p_sw->p_prof == NULL) { >> status = IB_INSUFFICIENT_MEMORY; >> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c >> index 1409e15..3d47640 100644 >> --- a/opensm/opensm/osm_ucast_mgr.c >> +++ b/opensm/opensm/osm_ucast_mgr.c >> @@ -397,13 +397,6 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * >> const p_mgr, >> goto Exit; >> } >> - if (!p_sw->need_update && >> - !memcmp(p_sw->lft, p_sw->lft_buf, IB_LID_UCAST_END_HO + 1)) { >> - free(p_sw->lft_buf); >> - p_sw->lft_buf = NULL; >> - goto Exit; >> - } >> - >> for (block_id_ho = 0; >> osm_switch_get_lft_block(p_sw, block_id_ho, block); >> block_id_ho++) { > From kliteyn at dev.mellanox.co.il Sun Nov 23 04:20:37 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 23 Nov 2008 14:20:37 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing In-Reply-To: <20081121192428.GB8310@sashak.voltaire.com> References: <492550E3.90805@dev.mellanox.co.il> <20081121192428.GB8310@sashak.voltaire.com> Message-ID: <49294A95.3060100@dev.mellanox.co.il> Sasha, Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 13:58 Thu 20 Nov , Yevgeny Kliteynik wrote: >> Function osm_switch_get_port_by_lid() was using the switch's >> 
LFT, so this LFT might not be updated to recent routing.
>
> I guess it could be only with 'subnet_initialization_error' flag up
> (failed LinFwdTbl set will trigger this flag).
>> I think that this was also relevant before the LFT simplification.
>
> Yes, logically it should be so, but...
>
>> One immediate outcome of this bug is opensm.fdbs file - when it
>> is dumped from the switch LFT (and not from lft_buf),
>
> Why this bug is triggered only now?

I had sometimes errors in simulations, and after some analysis
I decided that they are timing problems with the tests.
Now that I did some stress testing of ucast cache, I started
to see more of these errors.

>> it sometimes
>> doesn't match the lst file.
>
> What this "sometimes" mean? I think the case should be investigated
> deeper. By such patch we are just trying to hide a possible issue.
>
> As far as I understand opensm.fdbs (and other routing dump) are
> generated only after all LinFwdTbl responses are arrived, when some of
> them failed 'subnet_initialization_error' flag is up and OpenSM will
> resweep. If so why is 'opensm.fdbs' broken? It is not immediately
> clear for me.

I didn't see 'subnet_initialization_error' in such cases.
Anyway, here's what I can do: at the end of each ucast_mgr_process
I'll compare lft and lft_buf (something that the other patch is
doing, the one that frees lft_buf), and if there is a difference,
then we have a problem. If not - then I'll look for the cause
elsewhere.
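The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: struct sketch_switch, SKETCH_LFT_SIZE and count_stale_lfts() are hypothetical stand-ins, not the real osm_switch_t or any existing OpenSM function, and the table size is assumed to correspond to IB_LID_UCAST_END_HO + 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the OpenSM structures, not the real ones. */
#define SKETCH_LFT_SIZE 0xC000	/* assumed equal to IB_LID_UCAST_END_HO + 1 */

struct sketch_switch {
	uint64_t guid;
	const uint8_t *lft;	/* table the SM believes the switch holds */
	const uint8_t *lft_buf;	/* freshly computed table, NULL once released */
};

/*
 * Walk all switches after routing has completed and count those whose
 * lft still differs from lft_buf. A non-zero result would mean the two
 * tables really do diverge at that point.
 */
size_t count_stale_lfts(const struct sketch_switch *sw, size_t num)
{
	size_t i, stale = 0;

	for (i = 0; i < num; i++) {
		if (!sw[i].lft_buf)
			continue;	/* nothing pending for this switch */
		if (memcmp(sw[i].lft, sw[i].lft_buf, SKETCH_LFT_SIZE))
			stale++;
	}
	return stale;
}
```

If such a counter stays at zero while opensm.fdbs still disagrees with the lst file, the mismatch presumably comes from somewhere other than a stale lft.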
-- Yevgeny

> Sasha
>
>> Signed-off-by: Yevgeny Kliteynik
>> ---
>>  opensm/include/opensm/osm_switch.h | 6 +++++-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
>> index caa0bc5..f06931c 100644
>> --- a/opensm/include/opensm/osm_switch.h
>> +++ b/opensm/include/opensm/osm_switch.h
>> @@ -411,7 +411,11 @@ osm_switch_get_port_by_lid(IN const osm_switch_t * const p_sw,
>>  {
>>  	if (lid_ho == 0 || lid_ho > IB_LID_UCAST_END_HO)
>>  		return OSM_NO_PATH;
>> -	return p_sw->lft[lid_ho];
>> +
>> +	if (p_sw->lft_buf)
>> +		return p_sw->lft_buf[lid_ho];
>> +	else
>> +		return p_sw->lft[lid_ho];
>>  }
>>  /*
>>  * PARAMETERS
>> --
>> 1.5.1.4
>>
>>
>

From sashak at voltaire.com Sun Nov 23 04:24:07 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 14:24:07 +0200
Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft
In-Reply-To: <492946AD.5090308@dev.mellanox.co.il>
References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> <20081123110556.GH21967@sashak.voltaire.com> <492946AD.5090308@dev.mellanox.co.il>
Message-ID: <20081123122407.GK21967@sashak.voltaire.com>

On 14:03 Sun 23 Nov , Yevgeny Kliteynik wrote:
> Sasha,
>
> Sasha Khapyorsky wrote:
>> Hi Yevgeny,
>> On 14:46 Thu 20 Nov , Yevgeny Kliteynik wrote:
>>> I can do something like the following patch, but I have
>>> some strange feeling that I'm missing something...
>> I cannot see any errors here. But probably you can use simpler approach
>> - just cleanup all switch's lft_buf separately after ucast_mgr is
>> finished (including wait_for_pending_transactions()).
>
> I've been doing some thinking...
> Basically, what you're saying is that at the end of each and
> every heavy sweep you will free ALL the lft_buf arrays, unless
> there was some error, that should trigger a new heavy sweep
> anyway. So what's the point of having lft_buf in the first place?
>
> It was relevant in the beginning of ucast cache implementation,
> but now after all the lft simplifications, I don't see how it
> is used. Am I missing something here, or should we just remove
> all these lft_buf and go back to single ucast_mgr_t.lft_buf?

As far as I remember it was your idea to use newly generated lft_buf in
cache regardless of the state of current LFTs. No?

Also we have strange bug#1406 yet...

Sasha

From sashak at voltaire.com Sun Nov 23 04:25:09 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 14:25:09 +0200
Subject: [ofa-general] Re: [PATCH v2] opensm: free lft_buf if it matches switch's lft
In-Reply-To: <20081123121738.GJ21967@sashak.voltaire.com>
References: <4909DAC8.4040602@dev.mellanox.co.il> <20081030214519.GN7502@sashak.voltaire.com> <490A2C5D.4080309@dev.mellanox.co.il> <20081031043226.GH16455@sashak.voltaire.com> <49255C13.5030503@dev.mellanox.co.il> <20081123110556.GH21967@sashak.voltaire.com> <4929455C.2080407@dev.mellanox.co.il> <20081123121738.GJ21967@sashak.voltaire.com>
Message-ID: <20081123122509.GL21967@sashak.voltaire.com>

On 14:17 Sun 23 Nov , Sasha Khapyorsky wrote:
> >> +	if (!sw->lft_buf)
> >> +		return;
> >> +
> >> +	if (memcmp(sw->lft, sw->lft_buf, IB_LID_UCAST_END_HO + 1))
> >
> > Should it turn on the p_subn->subnet_initialization_error flag?
> > Maybe, but I'm not sure - this is more for bug#1401 materials :),

bug#1406

Sasha

From sashak at voltaire.com Sun Nov 23 04:33:00 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 14:33:00 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing
In-Reply-To: <49294A95.3060100@dev.mellanox.co.il>
References: <492550E3.90805@dev.mellanox.co.il> <20081121192428.GB8310@sashak.voltaire.com> <49294A95.3060100@dev.mellanox.co.il>
Message-ID: <20081123123300.GN21967@sashak.voltaire.com>

On 14:20 Sun 23 Nov , Yevgeny Kliteynik wrote:
>>> One immediate outcome of this bug is opensm.fdbs file - when it
>>> is dumped from the switch LFT (and not from lft_buf),
>> Why this bug is triggered only now?
>
> I had sometimes errors in simulations, and after some analysis
> I decided that they are timing problems with the tests.
> Now that I did some stress testing of ucast cache, I started
> to see more of these errors.

If you are sure that this is simulator or test problems then just close
#1406 as invalid. Obviously we don't need such patch then.

>
>>> it sometimes
>>> doesn't match the lst file.
>> What this "sometimes" mean? I think the case should be investigated
>> deeper. By such patch we are just trying to hide a possible issue.
>> As far as I understand opensm.fdbs (and other routing dump) are
>> generated only after all LinFwdTbl responses are arrived, when some of
>> them failed 'subnet_initialization_error' flag is up and OpenSM will
>> resweep. If so why is 'opensm.fdbs' broken? It is not immediately
>> clear for me.
>
> I didn't see 'subnet_initialization_error' in such cases.
> Anyway, here's what I can do: at the end of each ucast_mgr_process
> I'll compare lft and lft_buf (something that the other patch is
> doing, the one that frees lft_buf), and if there is a difference,
> then we have a problem. If not - then I'll look for the cause
> elsewhere.

Yes, seems deeper investigation is needed here. Thanks.
Sasha

From kliteyn at dev.mellanox.co.il Sun Nov 23 05:24:37 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 23 Nov 2008 15:24:37 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing
In-Reply-To: <20081123123300.GN21967@sashak.voltaire.com>
References: <492550E3.90805@dev.mellanox.co.il> <20081121192428.GB8310@sashak.voltaire.com> <49294A95.3060100@dev.mellanox.co.il> <20081123123300.GN21967@sashak.voltaire.com>
Message-ID: <49295995.7080304@dev.mellanox.co.il>

Sasha,

Sasha Khapyorsky wrote:
> On 14:20 Sun 23 Nov , Yevgeny Kliteynik wrote:
>>>> One immediate outcome of this bug is opensm.fdbs file - when it
>>>> is dumped from the switch LFT (and not from lft_buf),
>>> Why this bug is triggered only now?
>> I had sometimes errors in simulations, and after some analysis
>> I decided that they are timing problems with the tests.
>> Now that I did some stress testing of ucast cache, I started
>> to see more of these errors.
>
> If you are sure that this is simulator or test problems then just close
> #1406 as invalid. Obviously we don't need such patch then.

No, I'm not sure. My original patch has eliminated this problem.
In any case, deeper investigation is needed.

-- Yevgeny

>>>> it sometimes
>>>> doesn't match the lst file.
>>> What this "sometimes" mean? I think the case should be investigated
>>> deeper. By such patch we are just trying to hide a possible issue.
>>> As far as I understand opensm.fdbs (and other routing dump) are
>>> generated only after all LinFwdTbl responses are arrived, when some of
>>> them failed 'subnet_initialization_error' flag is up and OpenSM will
>>> resweep. If so why is 'opensm.fdbs' broken? It is not immediately
>>> clear for me.
>> I didn't see 'subnet_initialization_error' in such cases.
>> Anyway, here's what I can do: at the end of each ucast_mgr_process
>> I'll compare lft and lft_buf (something that the other patch is
>> doing, the one that frees lft_buf), and if there is a difference,
>> then we have a problem. If not - then I'll look for the cause
>> elsewhere.
>
> Yes, seems deeper investigation is needed here. Thanks.
>
> Sasha
>

From sashak at voltaire.com Sun Nov 23 06:16:54 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 16:16:54 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing
In-Reply-To: <49295995.7080304@dev.mellanox.co.il>
References: <492550E3.90805@dev.mellanox.co.il> <20081121192428.GB8310@sashak.voltaire.com> <49294A95.3060100@dev.mellanox.co.il> <20081123123300.GN21967@sashak.voltaire.com> <49295995.7080304@dev.mellanox.co.il>
Message-ID: <20081123141654.GP21967@sashak.voltaire.com>

On 15:24 Sun 23 Nov , Yevgeny Kliteynik wrote:
>> If you are sure that this is simulator or test problems then just close
>> #1406 as invalid. Obviously we don't need such patch then.
>
> No, I'm not sure. My original patch has eliminated this problem.

So? Should we work around simulator/test bugs in OpenSM code? I think we
shouldn't.

> In any case, deeper investigation is needed.

Ok.

Sasha

From sashak at voltaire.com Sun Nov 23 10:27:41 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 20:27:41 +0200
Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc"
In-Reply-To: <20081120163809.26a3c499.weiny2@llnl.gov>
References: <20081120163809.26a3c499.weiny2@llnl.gov>
Message-ID: <20081123182741.GS21967@sashak.voltaire.com>

Hi Ira,

On 16:38 Thu 20 Nov , Ira Weiny wrote:
> The following 3 patches implement "libibnetdisc" which provides the
> functionality of ibnetdiscover in a C library.
>
> I mentioned this to Sasha at the last Sonoma conference and posted the bulk of
> this code to the list a few months ago.
This library is still providing the 85%
> performance speed up of iblinkinfo.pl on our clusters.

This is great!

Don't you think this library should be part of infiniband-diags, rather
than a separate package/management sub-project? Personally I would prefer
to have this as part of infiniband-diags.

Sasha

From sashak at voltaire.com Sun Nov 23 10:35:17 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 23 Nov 2008 20:35:17 +0200
Subject: [ofa-general] Re: [PATCH 3/3] Convert ibnetdiscover to use new ibnetdisc library.
In-Reply-To: <20081120163815.5cd110fb.weiny2@llnl.gov>
References: <20081120163815.5cd110fb.weiny2@llnl.gov>
Message-ID: <20081123183517.GT21967@sashak.voltaire.com>

Hi Ira,

On 16:38 Thu 20 Nov , Ira Weiny wrote:
> From e2b8bac5d651c2278719d511dee2ab2e8ad05706 Mon Sep 17 00:00:00 2001
> From: Ira Weiny
> Date: Thu, 20 Nov 2008 09:29:57 -0800
> Subject: [PATCH] Convert ibnetdiscover to use new ibnetdisc library.
>
> Removed -e and -v since they were somewhat redundant with the -d option.

I think it would be better to preserve these options for backward
compatibility. At least '-v' is used in dump_ftts.sh. It can be used in
other scripts...
Sasha > > All other functionality is preserved > > Signed-off-by: Ira Weiny > --- > infiniband-diags/Makefile.am | 4 +- > infiniband-diags/man/ibnetdiscover.8 | 10 +- > infiniband-diags/src/ibnetdiscover.c | 910 ++++++++++------------------------ > 3 files changed, 254 insertions(+), 670 deletions(-) > > diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am > index 8f26749..420c69e 100644 > --- a/infiniband-diags/Makefile.am > +++ b/infiniband-diags/Makefile.am > @@ -35,9 +35,9 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \ > src_ibaddr_SOURCES = src/ibaddr.c src/ibdiag_common.c > src_ibaddr_CFLAGS = -Wall $(DBGFLAGS) > > -src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/grouping.c src/ibdiag_common.c > +src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/ibdiag_common.c > src_ibnetdiscover_CFLAGS = -Wall $(DBGFLAGS) > -src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) > +src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -libnetdisc > > src_iblinkinfo_pl_SOURCES = src/iblinkinfo.c > src_iblinkinfo_pl_CFLAGS = -Wall $(DBGFLAGS) > diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8 > index 958efa9..768d392 100644 > --- a/infiniband-diags/man/ibnetdiscover.8 > +++ b/infiniband-diags/man/ibnetdiscover.8 > @@ -5,7 +5,7 @@ ibnetdiscover \- discover InfiniBand topology > > .SH SYNOPSIS > .B ibnetdiscover > -[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] [\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] > +[\-d(ebug)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] [\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] > > .SH DESCRIPTION > .PP > @@ -37,7 +37,7 @@ List of connected switches > List of connected routers > .TP > \fB\-s\fR, 
\fB\-\-show\fR > -Show more information > +Show progress information during discovery. > .TP > \fB\-\-node\-name\-map\fR > Specify a node name map. The node name map file maps GUIDs to more user friendly > @@ -57,15 +57,9 @@ using the util_name -h syntax. > # Debugging flags > .PP > \-d raise the IB debugging level. > - May be used several times (-ddd or -d -d -d). > -.PP > -\-e show send and receive errors (timeouts and others) > .PP > \-h show the usage message > .PP > -\-v increase the application verbosity level. > - May be used several times (-vv or -v -v -v) > -.PP > \-V show the version info. > > # Other common flags: > diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c > index 2cfaa8a..d8ead48 100644 > --- a/infiniband-diags/src/ibnetdiscover.c > +++ b/infiniband-diags/src/ibnetdiscover.c > @@ -1,6 +1,7 @@ > /* > * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. > * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. 
You may choose to be licensed under the terms of the GNU > @@ -47,483 +48,108 @@ > #include > #include > > -#include > -#include > -#include > #include > +#include > +#include > > -#include "ibnetdiscover.h" > -#include "grouping.h" > #include "ibdiag_common.h" > > -static char *node_type_str[] = { > - "???", > - "ca", > - "switch", > - "router", > - "iwarp rnic" > -}; > - > -static char *linkwidth_str[] = { > - "??", > - "1x", > - "4x", > - "??", > - "8x", > - "??", > - "??", > - "??", > - "12x" > -}; > - > -static char *linkspeed_str[] = { > - "???", > - "SDR", > - "DDR", > - "???", > - "QDR" > -}; > - > -static int timeout = 2000; /* ms */ > -static int dumplevel = 0; > +static int debug; > static int verbose; > -static FILE *f; > +#define LIST_CA_NODE (1 << IBND_CA_NODE) > +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) > +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) > > char *argv0 = "ibnetdiscover"; > +static FILE *f; > > static char *node_name_map_file = NULL; > static nn_map_t *node_name_map = NULL; > > -Node *nodesdist[MAXHOPS+1]; /* last is Ca list */ > -Node *mynode; > -int maxhops_discovered = 0; > - > -struct ChassisList *chassis = NULL; > - > -static char * > -get_linkwidth_str(int linkwidth) > -{ > - if (linkwidth > 8) > - return linkwidth_str[0]; > - else > - return linkwidth_str[linkwidth]; > -} > - > -static char * > -get_linkspeed_str(int linkspeed) > -{ > - if (linkspeed > 4) > - return linkspeed_str[0]; > - else > - return linkspeed_str[linkspeed]; > -} > - > -static inline const char* > -node_type_str2(Node *node) > -{ > - switch(node->type) { > - case SWITCH_NODE: return "SW"; > - case CA_NODE: return "CA"; > - case ROUTER_NODE: return "RT"; > - } > - return "??"; > -} > - > -void > -decode_port_info(void *pi, Port *port) > -{ > - mad_decode_field(pi, IB_PORT_LID_F, &port->lid); > - mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc); > - mad_decode_field(pi, IB_PORT_STATE_F, &port->state); > - mad_decode_field(pi, IB_PORT_PHYS_STATE_F, 
&port->physstate); > - mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth); > - mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed); > -} > - > - > -int > -get_port(Port *port, int portnum, ib_portid_t *portid) > -{ > - char portinfo[64]; > - void *pi = portinfo; > - > - port->portnum = portnum; > - > - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout)) > - return -1; > - decode_port_info(pi, port); > - > - DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s", > - portid2str(portid), portnum, port->lid, port->state, port->physstate, get_linkwidth_str(port->linkwidth), get_linkspeed_str(port->linkspeed)); > - return 1; > -} > -/* > - * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. > - */ > -int > -get_node(Node *node, Port *port, ib_portid_t *portid) > -{ > - char portinfo[64]; > - char switchinfo[64]; > - void *pi = portinfo, *ni = node->nodeinfo, *nd = node->nodedesc; > - void *si = switchinfo; > - > - if (!smp_query(ni, portid, IB_ATTR_NODE_INFO, 0, timeout)) > - return -1; > - > - mad_decode_field(ni, IB_NODE_GUID_F, &node->nodeguid); > - mad_decode_field(ni, IB_NODE_TYPE_F, &node->type); > - mad_decode_field(ni, IB_NODE_NPORTS_F, &node->numports); > - mad_decode_field(ni, IB_NODE_DEVID_F, &node->devid); > - mad_decode_field(ni, IB_NODE_VENDORID_F, &node->vendid); > - mad_decode_field(ni, IB_NODE_SYSTEM_GUID_F, &node->sysimgguid); > - mad_decode_field(ni, IB_NODE_PORT_GUID_F, &node->portguid); > - mad_decode_field(ni, IB_NODE_LOCAL_PORT_F, &node->localport); > - port->portnum = node->localport; > - port->portguid = node->portguid; > - > - if (!smp_query(nd, portid, IB_ATTR_NODE_DESC, 0, timeout)) > - return -1; > - > - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, 0, timeout)) > - return -1; > - decode_port_info(pi, port); > - > - if (node->type != SWITCH_NODE) > - return 0; > - > - node->smalid = port->lid; > - node->smalmc = port->lmc; > - > - /* after we have the sma 
information find out the real PortInfo for this port */ > - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, node->localport, timeout)) > - return -1; > - decode_port_info(pi, port); > - > - if (!smp_query(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout)) > - node->smaenhsp0 = 0; /* assume base SP0 */ > - else > - mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->smaenhsp0); > - > - DEBUG("portid %s: got switch node %" PRIx64 " '%s'", > - portid2str(portid), node->nodeguid, node->nodedesc); > - return 1; > -} > - > -static int > -extend_dpath(ib_dr_path_t *path, int nextport) > -{ > - if (path->cnt+2 >= sizeof(path->p)) > - return -1; > - ++path->cnt; > - if (path->cnt > maxhops_discovered) > - maxhops_discovered = path->cnt; > - path->p[path->cnt] = nextport; > - return path->cnt; > -} > - > -static void > -dump_endnode(ib_portid_t *path, char *prompt, Node *node, Port *port) > -{ > - if (!dumplevel) > - return; > - > - fprintf(f, "%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", > - portid2str(path), prompt, > - (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), > - node->nodeguid, node->type == SWITCH_NODE ? 
0 : port->portnum, > - port->lid, port->lid + (1 << port->lmc) - 1, > - clean_nodedesc(node->nodedesc)); > -} > - > -#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) > -#define HTSZ 137 > - > -static Node *nodestbl[HTSZ]; > - > -static Node * > -find_node(Node *new) > -{ > - int hash = HASHGUID(new->nodeguid) % HTSZ; > - Node *node; > - > - for (node = nodestbl[hash]; node; node = node->htnext) > - if (node->nodeguid == new->nodeguid) > - return node; > - > - return NULL; > -} > - > -static Node * > -create_node(Node *temp, ib_portid_t *path, int dist) > -{ > - Node *node; > - int hash = HASHGUID(temp->nodeguid) % HTSZ; > - > - node = malloc(sizeof(*node)); > - if (!node) > - return NULL; > - > - memcpy(node, temp, sizeof(*node)); > - node->dist = dist; > - node->path = *path; > - > - node->htnext = nodestbl[hash]; > - nodestbl[hash] = node; > - > - if (node->type != SWITCH_NODE) > - dist = MAXHOPS; /* special Ca list */ > - > - node->dnext = nodesdist[dist]; > - nodesdist[dist] = node; > - > - return node; > -} > - > -static Port * > -find_port(Node *node, Port *port) > -{ > - Port *old; > - > - for (old = node->ports; old; old = old->next) > - if (old->portnum == port->portnum) > - return old; > - > - return NULL; > -} > - > -static Port * > -create_port(Node *node, Port *temp) > -{ > - Port *port; > - > - port = malloc(sizeof(*port)); > - if (!port) > - return NULL; > - > - memcpy(port, temp, sizeof(*port)); > - port->node = node; > - port->next = node->ports; > - node->ports = port; > - > - return port; > -} > - > -static void > -link_ports(Node *node, Port *port, Node *remotenode, Port *remoteport) > -{ > - DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u", > - node->nodeguid, node, port, port->portnum, > - remotenode->nodeguid, remotenode, remoteport, remoteport->portnum); > - if (port->remoteport) > - port->remoteport->remoteport = NULL; > - if (remoteport->remoteport) > - 
remoteport->remoteport->remoteport = NULL; > - port->remoteport = remoteport; > - remoteport->remoteport = port; > -} > - > -static int > -handle_port(Node *node, Port *port, ib_portid_t *path, int portnum, int dist) > -{ > - Node node_buf; > - Port port_buf; > - Node *remotenode, *oldnode; > - Port *remoteport, *oldport; > - > - memset(&node_buf, 0, sizeof(node_buf)); > - memset(&port_buf, 0, sizeof(port_buf)); > - > - DEBUG("handle node %p port %p:%d dist %d", node, port, portnum, dist); > - if (port->physstate != 5) /* LinkUp */ > - return -1; > - > - if (extend_dpath(&path->drpath, portnum) < 0) > - return -1; > - > - if (get_node(&node_buf, &port_buf, path) < 0) { > - IBWARN("NodeInfo on %s failed, skipping port", > - portid2str(path)); > - path->drpath.cnt--; /* restore path */ > - return -1; > - } > - > - oldnode = find_node(&node_buf); > - if (oldnode) > - remotenode = oldnode; > - else if (!(remotenode = create_node(&node_buf, path, dist + 1))) > - IBERROR("no memory"); > - > - oldport = find_port(remotenode, &port_buf); > - if (oldport) { > - remoteport = oldport; > - if (node != remotenode || port != remoteport) > - IBWARN("port moving..."); > - } else if (!(remoteport = create_port(remotenode, &port_buf))) > - IBERROR("no memory"); > - > - dump_endnode(path, oldnode ? "known remote" : "new remote", > - remotenode, remoteport); > - > - link_ports(node, port, remotenode, remoteport); > - > - path->drpath.cnt--; /* restore path */ > - return 0; > -} > - > -/* > - * Return 1 if found, 0 if not, -1 on errors. 
> - */ > -static int > -discover(ib_portid_t *from) > -{ > - Node node_buf; > - Port port_buf; > - Node *node; > - Port *port; > - int i; > - int dist = 0; > - ib_portid_t *path; > - > - DEBUG("from %s", portid2str(from)); > - > - memset(&node_buf, 0, sizeof(node_buf)); > - memset(&port_buf, 0, sizeof(port_buf)); > - > - if (get_node(&node_buf, &port_buf, from) < 0) { > - IBWARN("can't reach node %s", portid2str(from)); > - return -1; > - } > - > - node = create_node(&node_buf, from, 0); > - if (!node) > - IBERROR("out of memory"); > - > - mynode = node; > - > - port = create_port(node, &port_buf); > - if (!port) > - IBERROR("out of memory"); > - > - if (node->type != SWITCH_NODE && > - handle_port(node, port, from, node->localport, 0) < 0) > - return 0; > - > - for (dist = 0; dist < MAXHOPS; dist++) { > - > - for (node = nodesdist[dist]; node; node = node->dnext) { > - > - path = &node->path; > - > - DEBUG("dist %d node %p", dist, node); > - dump_endnode(path, "processing", node, port); > - > - for (i = 1; i <= node->numports; i++) { > - if (i == node->localport) > - continue; > - > - if (get_port(&port_buf, i, path) < 0) { > - IBWARN("can't reach node %s port %d", portid2str(path), i); > - continue; > - } > - > - port = find_port(node, &port_buf); > - if (port) > - continue; > - > - port = create_port(node, &port_buf); > - if (!port) > - IBERROR("out of memory"); > - > - /* If switch, set port GUID to node GUID */ > - if (node->type == SWITCH_NODE) > - port->portguid = node->portguid; > - > - handle_port(node, port, path, i, dist); > - } > - } > - } > +static int timeout_ms = 2000; > +static int dumplevel = 0; > > - return 0; > -} > > char * > -node_name(Node *node) > +node_name(ibnd_node_t *node) > { > static char buf[256]; > > - switch(node->type) { > - case SWITCH_NODE: > - sprintf(buf, "\"%s", "S"); > - break; > - case CA_NODE: > + switch(node->info.type) { > + case IBND_CA_NODE: > sprintf(buf, "\"%s", "H"); > break; > - case ROUTER_NODE: > + case 
IBND_SWITCH_NODE: > + sprintf(buf, "\"%s", "S"); > + break; > + case IBND_ROUTER_NODE: > sprintf(buf, "\"%s", "R"); > break; > default: > sprintf(buf, "\"%s", "?"); > break; > } > - sprintf(buf+2, "-%016" PRIx64 "\"", node->nodeguid); > + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); > > return buf; > } > > void > -list_node(Node *node) > +list_node(ibnd_node_t *node, void *user_data) > { > - char *node_type; > - char *nodename = remap_node_name(node_name_map, node->nodeguid, > + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, > node->nodedesc); > > - switch(node->type) { > - case SWITCH_NODE: > - node_type = "Switch"; > - break; > - case CA_NODE: > - node_type = "Ca"; > - break; > - case ROUTER_NODE: > - node_type = "Router"; > - break; > - default: > - node_type = "???"; > - break; > - } > fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", > - node_type, > - node->nodeguid, node->numports, node->devid, node->vendid, > + ibnd_node_type_str(node), > + node->info.nodeguid, node->info.numports, node->info.devid, > + node->info.vendid, > nodename); > > free(nodename); > } > > void > -out_ids(Node *node, int group, char *chname) > +list_nodes(ibnd_fabric_t *fabric, int list) > +{ > + if (list & LIST_CA_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); > + } > + if (list & LIST_SWITCH_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); > + } > + if (list & LIST_ROUTER_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); > + } > +} > + > +void > +out_ids(ibnd_node_t *node, int group, char *chname) > { > - fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->vendid, node->devid); > - if (node->sysimgguid) > - fprintf(f, "sysimgguid=0x%" PRIx64, node->sysimgguid); > + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); > + if (node->info.sysimgguid) > + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); > if 
(group > && node->chrecord && node->chrecord->chassisnum) { > fprintf(f, "\t\t# Chassis %d", node->chrecord->chassisnum); > if (chname) > - fprintf(f, " (%s)", chname); > - if (is_xsigo_tca(node->nodeguid) && node->ports->remoteport) > - fprintf(f, " slot %d", node->ports->remoteport->portnum); > + fprintf(f, " (%s)", clean_nodedesc(chname)); > + if (ibnd_is_xsigo_tca(node->info.nodeguid) > + && node->ports[1] > + && node->ports[1]->remoteport) > + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); > } > fprintf(f, "\n"); > } > > + > uint64_t > -out_chassis(int chassisnum) > +out_chassis(ibnd_fabric_t *fabric, int chassisnum) > { > uint64_t guid; > > fprintf(f, "\nChassis %d", chassisnum); > - guid = get_chassis_guid(chassisnum); > + guid = ibnd_get_chassis_guid(fabric, chassisnum); > if (guid) > fprintf(f, " (guid 0x%" PRIx64 ")", guid); > fprintf(f, "\n"); > @@ -531,54 +157,49 @@ out_chassis(int chassisnum) > } > > void > -out_switch(Node *node, int group, char *chname) > +out_switch(ibnd_node_t *node, int group, char *chname) > { > char *str; > + char str2[256]; > char *nodename = NULL; > > out_ids(node, group, chname); > - fprintf(f, "switchguid=0x%" PRIx64, node->nodeguid); > - fprintf(f, "(%" PRIx64 ")", node->portguid); > - /* Currently, only if Voltaire chassis */ > - if (group > - && node->chrecord && node->chrecord->chassisnum > - && node->vendid == VTR_VENDOR_ID) { > - str = get_chassis_type(node->chrecord->chassistype); > + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); > + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); > + if (group) { > + str = ibnd_get_chassis_type(node); > if (str) > fprintf(f, "%s ", str); > - str = get_chassis_slot(node->chrecord->chassisslot); > + str = ibnd_get_chassis_slot_str(node, str2, 256); > if (str) > - fprintf(f, "%s ", str); > - fprintf(f, "%d Chip %d", node->chrecord->slotnum, node->chrecord->anafanum); > + fprintf(f, "%s", str); > } > > - nodename = remap_node_name(node_name_map, 
node->nodeguid, > + nodename = remap_node_name(node_name_map, node->info.nodeguid, > node->nodedesc); > > fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", > - node->numports, node_name(node), > + node->info.numports, node_name(node), > nodename, > - node->smaenhsp0 ? "enhanced" : "base", > + node->sw_info.smaenhsp0 ? "enhanced" : "base", > node->smalid, node->smalmc); > > free(nodename); > } > > void > -out_ca(Node *node, int group, char *chname) > +out_ca(ibnd_node_t *node, int group, char *chname) > { > char *node_type; > char *node_type2; > - char *nodename = remap_node_name(node_name_map, node->nodeguid, > - node->nodedesc); > > out_ids(node, group, chname); > - switch(node->type) { > - case CA_NODE: > + switch(node->info.type) { > + case IBND_CA_NODE: > node_type = "ca"; > node_type2 = "Ca"; > break; > - case ROUTER_NODE: > + case IBND_ROUTER_NODE: > node_type = "rt"; > node_type2 = "Rt"; > break; > @@ -588,37 +209,37 @@ out_ca(Node *node, int group, char *chname) > break; > } > > - fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->nodeguid); > + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); > fprintf(f, "%s\t%d %s\t\t# \"%s\"", > - node_type2, node->numports, node_name(node), > - nodename); > - if (group && is_xsigo_hca(node->nodeguid)) > + node_type2, node->info.numports, node_name(node), > + clean_nodedesc(node->nodedesc)); > + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) > fprintf(f, " (scp)"); > fprintf(f, "\n"); > - > - free(nodename); > } > > +#define OUT_BUFFER_SIZE 16 > static char * > -out_ext_port(Port *port, int group) > +out_ext_port(ibnd_port_t *port, int group) > { > - char *str = NULL; > + static char mapping[OUT_BUFFER_SIZE]; > > - /* Currently, only if Voltaire chassis */ > - if (group > - && port->node->chrecord && port->node->vendid == VTR_VENDOR_ID) > - str = portmapstring(port); > + if (group && port->ext_portnum != 0) { > + snprintf(mapping, OUT_BUFFER_SIZE, > + "[ext %d]", 
port->ext_portnum); > + return (mapping); > + } > > - return (str); > + return (NULL); > } > > void > -out_switch_port(Port *port, int group) > +out_switch_port(ibnd_port_t *port, int group) > { > char *ext_port_str = NULL; > char *rem_nodename = NULL; > > - DEBUG("port %p:%d remoteport %p", port, port->portnum, port->remoteport); > + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); > fprintf(f, "[%d]", port->portnum); > > ext_port_str = out_ext_port(port, group); > @@ -626,7 +247,7 @@ out_switch_port(Port *port, int group) > fprintf(f, "%s", ext_port_str); > > rem_nodename = remap_node_name(node_name_map, > - port->remoteport->node->nodeguid, > + port->remoteport->node->info.nodeguid, > port->remoteport->node->nodedesc); > > ext_port_str = out_ext_port(port->remoteport, group); > @@ -634,17 +255,17 @@ out_switch_port(Port *port, int group) > node_name(port->remoteport->node), > port->remoteport->portnum, > ext_port_str ? ext_port_str : ""); > - if (port->remoteport->node->type != SWITCH_NODE) > - fprintf(f, "(%" PRIx64 ") ", port->remoteport->portguid); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); > fprintf(f, "\t\t# \"%s\" lid %d %s%s", > rem_nodename, > - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, > - get_linkwidth_str(port->linkwidth), > - get_linkspeed_str(port->linkspeed)); > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active)); > > - if (is_xsigo_tca(port->remoteport->portguid)) > + if (ibnd_is_xsigo_tca(port->remoteport->guid)) > fprintf(f, " slot %d", port->portnum); > - else if (is_xsigo_hca(port->remoteport->portguid)) > + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) > fprintf(f, " (scp)"); > fprintf(f, "\n"); > > @@ -652,68 +273,80 @@ out_switch_port(Port *port, int group) > } > > void > -out_ca_port(Port *port, int group) > +out_ca_port(ibnd_port_t *port, int group) > { > char *str = NULL; > char *rem_nodename = NULL; > > fprintf(f, "[%d]", port->portnum); > - if (port->node->type != SWITCH_NODE) > - fprintf(f, "(%" PRIx64 ") ", port->portguid); > + if (port->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->guid); > fprintf(f, "\t%s[%d]", > node_name(port->remoteport->node), > port->remoteport->portnum); > str = out_ext_port(port->remoteport, group); > if (str) > fprintf(f, "%s", str); > - if (port->remoteport->node->type != SWITCH_NODE) > - fprintf(f, " (%" PRIx64 ") ", port->remoteport->portguid); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); > > rem_nodename = remap_node_name(node_name_map, > - port->remoteport->node->nodeguid, > + port->remoteport->node->info.nodeguid, > port->remoteport->node->nodedesc); > > fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", > - port->lid, port->lmc, rem_nodename, > - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, > - get_linkwidth_str(port->linkwidth), > - get_linkspeed_str(port->linkspeed)); > + port->info.lid, port->info.lmc, rem_nodename, > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active)); > > free(rem_nodename); > } > > int > -dump_topology(int listtype, int group) > +dump_topology(int group, ibnd_fabric_t *fabric) > { > - Node *node; > - Port *port; > - int i = 0, dist = 0; > + ibnd_node_t *node; > + ibnd_port_t *port; > + int i = 0, dist = 0, p = 0; > time_t t = time(0); > uint64_t chguid; > char *chname = NULL; > > - if (!listtype) { > - fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); > - fprintf(f, "# Max of %d hops discovered\n", maxhops_discovered); > - fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", mynode->nodeguid, mynode->portguid); > - } > + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); > + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); > + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", > + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); > > /* Make pass on switches */ > - if (group && !listtype) { > - ChassisList *ch = NULL; > + if (group) { > + ibnd_chassis_list_t *ch = NULL; > > /* Chassis based switches first */ > - for (ch = chassis; ch; ch = ch->next) { > + for (ch = fabric->chassis; ch; ch = ch->next) { > int n = 0; > > if (!ch->chassisnum) > continue; > - chguid = out_chassis(ch->chassisnum); > - if (chname) > - free(chname); > + chguid = out_chassis(fabric, ch->chassisnum); > + > chname = NULL; > - if (is_xsigo_guid(chguid)) { > - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { > +/** > + * Hal will this work for Xsigo? > + */ > + if (ibnd_is_xsigo_guid(chguid)) { > + for (node = ch->nodes; node; node = node->chassis_next) { > + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > + } > + } > + > +#if 0 > +/** > + * vs. this? 
> + */ > + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { > if (!node->chrecord || > !node->chrecord->chassisnum) > continue; > @@ -721,209 +354,171 @@ dump_topology(int listtype, int group) > if (node->chrecord->chassisnum != ch->chassisnum) > continue; > > - if (is_xsigo_hca(node->nodeguid)) { > - chname = remap_node_name(node_name_map, > - node->nodeguid, > - node->nodedesc); > - fprintf(f, "Hostname: %s\n", chname); > + if (ibnd_is_xsigo_hca(node->nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > } > } > +#endif > } > > fprintf(f, "\n# Spine Nodes"); > - for (n = 1; n <= (SPINES_MAX_NUM+1); n++) { > + for (n = 1; n <= SPINES_MAX_NUM; n++) { > if (ch->spinenode[n]) { > out_switch(ch->spinenode[n], group, chname); > - for (port = ch->spinenode[n]->ports; port; port = port->next, i++) > - if (port->remoteport) > + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { > + port = ch->spinenode[n]->ports[p]; > + if (port && port->remoteport) > out_switch_port(port, group); > + } > } > } > fprintf(f, "\n# Line Nodes"); > - for (n = 1; n <= (LINES_MAX_NUM+1); n++) { > + for (n = 1; n <= LINES_MAX_NUM; n++) { > if (ch->linenode[n]) { > out_switch(ch->linenode[n], group, chname); > - for (port = ch->linenode[n]->ports; port; port = port->next, i++) > - if (port->remoteport) > + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { > + port = ch->linenode[n]->ports[p]; > + if (port && port->remoteport) > out_switch_port(port, group); > + } > } > } > > fprintf(f, "\n# Chassis Switches"); > - for (dist = 0; dist <= maxhops_discovered; dist++) { > - > - for (node = nodesdist[dist]; node; node = node->dnext) { > - > - /* Non Voltaire chassis */ > - if (node->vendid == VTR_VENDOR_ID) > - continue; > - if (!node->chrecord || > - !node->chrecord->chassisnum) > - continue; > - > - if (node->chrecord->chassisnum != ch->chassisnum) > - continue; > - > + for (node = ch->nodes; node; node = 
node->chassis_next) { > + if (node->info.type == IBND_SWITCH_NODE) { > out_switch(node, group, chname); > - for (port = node->ports; port; port = port->next, i++) > - if (port->remoteport) > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > out_switch_port(port, group); > - > + } > } > - > } > > fprintf(f, "\n# Chassis CAs"); > - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { > - if (!node->chrecord || > - !node->chrecord->chassisnum) > - continue; > - > - if (node->chrecord->chassisnum != ch->chassisnum) > - continue; > - > - out_ca(node, group, chname); > - for (port = node->ports; port; port = port->next, i++) > - if (port->remoteport) > - out_ca_port(port, group); > - > + for (node = ch->nodes; node; node = node->chassis_next) { > + if (node->info.type == IBND_CA_NODE) { > + out_ca(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, group); > + } > + } > } > > } > > - } else { > - for (dist = 0; dist <= maxhops_discovered; dist++) { > - > - for (node = nodesdist[dist]; node; node = node->dnext) { > - > - DEBUG("SWITCH: dist %d node %p", dist, node); > - if (!listtype) > - out_switch(node, group, chname); > - else { > - if (listtype & LIST_SWITCH_NODE) > - list_node(node); > - continue; > - } > - > - for (port = node->ports; port; port = port->next, i++) > - if (port->remoteport) > + } else { /* !group */ > + for (node = fabric->switches; node; node = node->type_next) { > + DEBUG("SWITCH: dist %d node %p\n", dist, node); > + out_switch(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > out_switch_port(port, group); > - } > + } > } > } > > - if (chname) > - free(chname); > chname = NULL; > - if (group && !listtype) { > - > + if (group) { > fprintf(f, "\nNon-Chassis Nodes\n"); > - > - for (dist = 0; dist <= 
maxhops_discovered; dist++) { > - > - for (node = nodesdist[dist]; node; node = node->dnext) { > - > - DEBUG("SWITCH: dist %d node %p", dist, node); > + for (node = fabric->switches; node; node = node->type_next) { > + DEBUG("SWITCH: dist %d node %p\n", dist, node); > /* Now, skip chassis based switches */ > if (node->chrecord && > node->chrecord->chassisnum) > continue; > out_switch(node, group, chname); > > - for (port = node->ports; port; port = port->next, i++) > - if (port->remoteport) > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > out_switch_port(port, group); > - } > - > + } > } > > } > > /* Make pass on CAs */ > - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { > - > - DEBUG("CA: dist %d node %p", dist, node); > - if (!listtype) { > - /* Now, skip chassis based CAs */ > - if (group && node->chrecord && > - node->chrecord->chassisnum) > - continue; > - out_ca(node, group, chname); > - } else { > - if (((listtype & LIST_CA_NODE) && (node->type == CA_NODE)) || > - ((listtype & LIST_ROUTER_NODE) && (node->type == ROUTER_NODE))) > - list_node(node); > + for (node = fabric->ch_adapters; node; node = node->type_next) { > + DEBUG("CA: dist %d node %p\n", dist, node); > + /* Now, skip chassis based CAs */ > + if (group && node->chrecord && > + node->chrecord->chassisnum) > continue; > - } > + out_ca(node, group, chname); > > - for (port = node->ports; port; port = port->next, i++) > - if (port->remoteport) > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > out_ca_port(port, group); > + } > } > > - if (chname) > - free(chname); > + /* make pass on routers */ > + for (node = fabric->routers; node; node = node->type_next) { > + DEBUG("RT: dist %d node %p\n", dist, node); > + /* Now, skip chassis based CAs */ > + if (group && node->chrecord && > + node->chrecord->chassisnum) > + continue; > + out_ca(node, group, chname); > + for (p = 
1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, group); > + } > + } > > return i; > } > > -void dump_ports_report () > + > +void dump_ports_report (ibnd_node_t *node, void *user_data) > { > - int b, n = 0, p; > - Node *node; > - Port *port; > - > - // If switch and LID == 0, search of other switch ports with > - // valid LID and assign it to all ports of that switch > - for (b = 0; b <= MAXHOPS; b++) > - for (node = nodesdist[b]; node; node = node->dnext) > - if (node->type == SWITCH_NODE) { > - int swlid = 0; > - for (p = 0, port = node->ports; > - p < node->numports && port && !swlid; > - port = port->next) > - if (port->lid != 0) > - swlid = port->lid; > - for (p = 0, port = node->ports; > - p < node->numports && port; > - port = port->next) > - port->lid = swlid; > - } > + int p = 0; > + ibnd_port_t *port = NULL; > + > + /* for each port */ > + for (p = node->info.numports, port = node->ports[p]; > + p > 0; > + port = node->ports[--p]) { > + if (port == NULL) > + continue; > > - for (b = 0; b <= MAXHOPS; b++) > - for (node = nodesdist[b]; node; node = node->dnext) { > - for (p = 0, port = node->ports; > - p < node->numports && port; > - p++, port = port->next) { > - fprintf(stdout, > - "%2s %5d %2d 0x%016" PRIx64 " %s %s", > - node_type_str2(port->node), port->lid, > - port->portnum, > - port->portguid, > - get_linkwidth_str(port->linkwidth), > - get_linkspeed_str(port->linkspeed)); > - if (port->remoteport) > - fprintf(stdout, > - " - %2s %5d %2d 0x%016" PRIx64 > - " ( '%s' - '%s' )\n", > - node_type_str2(port->remoteport->node), > - port->remoteport->lid, > - port->remoteport->portnum, > - port->remoteport->portguid, > - port->node->nodedesc, > - port->remoteport->node->nodedesc); > - else > - fprintf(stdout, "%36s'%s'\n", "", > - port->node->nodedesc); > - } > - n++; > - } > + fprintf(stdout, > + "%2s %5d %2d 0x%016" PRIx64 " %s %s", > + ibnd_node_type_str_short(node), > + node->info.type 
== IBND_SWITCH_NODE ? node->smalid : port->info.lid, > + port->portnum, > + port->guid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active)); > + if (port->remoteport) > + fprintf(stdout, > + " - %2s %5d %2d 0x%016" PRIx64 > + " ( '%s' - '%s' )\n", > + ibnd_node_type_str_short(port->remoteport->node), > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? > + port->remoteport->node->smalid : port->remoteport->info.lid, > + port->remoteport->portnum, > + port->remoteport->guid, > + port->node->nodedesc, > + port->remoteport->node->nodedesc); > + else > + fprintf(stdout, "%36s'%s'\n", "", > + port->node->nodedesc); > + } > } > > void > usage(void) > { > - fprintf(stderr, "Usage: %s [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " > + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " > "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", > argv0); > fprintf(stderr, " --node-name-map specify a node name map file\n"); > @@ -933,20 +528,18 @@ usage(void) > int > main(int argc, char **argv) > { > - int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; > - ib_portid_t my_portid = {0}; > - int udebug = 0, list = 0; > + int list = 0; > char *ca = 0; > int ca_port = 0; > int group = 0; > int ports_report = 0; > + ibnd_fabric_t *fabric = NULL; > > static char const str_opts[] = "C:P:t:devslgHSRpVhu"; > static const struct option long_opts[] = { > { "C", 1, 0, 'C'}, > { "P", 1, 0, 'P'}, > { "debug", 0, 0, 'd'}, > - { "err_show", 0, 0, 'e'}, > { "verbose", 0, 0, 'v'}, > { "show", 0, 0, 's'}, > { "list", 0, 0, 'l'}, > @@ -982,23 +575,17 @@ main(int argc, char **argv) > ca_port = strtoul(optarg, 0, 0); > break; > case 'd': > - ibdebug++; > - madrpc_show_errors(1); > - umad_debug(udebug); > - udebug++; > + debug = 1; > + 
ibnd_debug(1); > break; > case 't': > - timeout = strtoul(optarg, 0, 0); > + timeout_ms = strtoul(optarg, 0, 0); > break; > case 'v': > verbose++; > - dumplevel++; > break; > case 's': > - dumplevel = 1; > - break; > - case 'e': > - madrpc_show_errors(1); > + ibnd_show_progress(1); > break; > case 'l': > list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; > @@ -1007,13 +594,13 @@ main(int argc, char **argv) > group = 1; > break; > case 'S': > - list = LIST_SWITCH_NODE; > + list |= LIST_SWITCH_NODE; > break; > case 'H': > - list = LIST_CA_NODE; > + list |= LIST_CA_NODE; > break; > case 'R': > - list = LIST_ROUTER_NODE; > + list |= LIST_ROUTER_NODE; > break; > case 'V': > fprintf(stderr, "%s %s\n", argv0, get_build_version() ); > @@ -1030,22 +617,25 @@ main(int argc, char **argv) > argv += optind; > > if (argc && !(f = fopen(argv[0], "w"))) > - IBERROR("can't open file %s for writing", argv[0]); > + fprintf(stderr, "can't open file %s for writing", argv[0]); > > - madrpc_init(ca, ca_port, mgmt_classes, 2); > node_name_map = open_node_name_map(node_name_map_file); > > - if (discover(&my_portid) < 0) > - IBERROR("discover"); > - > - if (group) > - chassis = group_nodes(); > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > > if (ports_report) > - dump_ports_report(); > + ibnd_iter_nodes(fabric, > + dump_ports_report, > + NULL); > + else if (list) > + list_nodes(fabric, list); > else > - dump_topology(list, group); > + dump_topology(group, fabric); > > + ibnd_destroy_fabric(fabric); > close_node_name_map(node_name_map); > exit(0); > } > -- > 1.5.4.5 > From sashak at voltaire.com Sun Nov 23 10:58:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 23 Nov 2008 20:58:36 +0200 Subject: [ofa-general] Re: [PATCH] Opensm: main exit codes In-Reply-To: <4923678D.3080701@llnl.gov> References: <4923678D.3080701@llnl.gov> Message-ID: 
<20081123185836.GU21967@sashak.voltaire.com> Hi Tim, On 17:10 Tue 18 Nov , Timothy A. Meier wrote: > > I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c > showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every > possible exit scenario, but it's a start that can be built upon. Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)', but maybe it is just me :). Maybe error codes could be formalized, but I'm not sure that it would be beneficial without any practical uses (and a clear understanding of the requirements). In the end we could find ourselves in the middle of a total mess, similar to how OSM_LOG_* is used today. Sasha From jeff at splitrockpr.com Sun Nov 23 16:56:08 2008 From: jeff at splitrockpr.com (Jeffrey Scott) Date: Sun, 23 Nov 2008 16:56:08 -0800 Subject: [ofa-general] We want your input for Sonoma 2009 Message-ID: <8F3AA2A8A5174958B80AF670B24BC534@Gaucho> OFA Members- We're putting together the agenda for the 2009 Sonoma Workshop. We'd like your input. Please let us know what topics or content would be of most interest to you. Would you like to hear about vendor support for the OFA software stack? Real-world implementations by end users? Are there specific technologies that you'd like to see covered on the agenda? Would you like to spend more time discussing the future of OFED and WinOF, including possible new features? Would you like to hear presentations from the major OS vendors? Do you want to discuss InfiniBand/Ethernet issues? Other topics that are of particular interest to you? Also, are there things you'd like to see changed from last year's event? Now is your chance to weigh in. Please help the MWG make the Sonoma Workshop as compelling and valuable as possible. One final request: please let us know if you think it would be worthwhile to host a hands-on training event at Sonoma to familiarize end users with the OFA software stack.
Thanks in advance for your input. -MWG ----------------------------------- Jeffrey Scott Split Rock Communications 408-884-4017 408-348-3651 Mobile 408-884-3900 Fax www.SplitRockPR.com From sashak at voltaire.com Sun Nov 23 22:17:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 08:17:25 +0200 Subject: [ofa-general] Re: your mail In-Reply-To: <003701c94420$67840f80$368c2e80$@com> References: <003701c94420$67840f80$368c2e80$@com> Message-ID: <20081124061725.GV21967@sashak.voltaire.com> Hi Bob, On 11:10 Tue 11 Nov , Robert Pearson wrote: > > Here is the sixth patch implementing the mesh analysis algorithm. Could you provide a description for this [PATCH 6]? Also note that your mailer breaks long lines and corrupts patches (not in the case of this patch, but with others where long lines are used). Sasha > > This patch implements > - a table of polynomials for all 2D and 3D regular Cartesian meshes > - a routine to classify each switch based on the table > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index 9254de3..30d09c2 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -48,6 +48,76 @@ > #include > #include > > +#define MAX_DIMENSION (4) > +#define MAX_DEGREE (10) > + > +/* > + * characteristic polynomials for 2d and 3d regular tori > + * since 4 == 2x2 we choose to take 2x2 > + */ > +struct _mesh_info { > + int dimension; /* dimension of the torus */ > + int size[MAX_DIMENSION]; /* size of the torus */ > + int degree; /* degree of polynomial */ > + int poly[MAX_DEGREE+1]; /* polynomial */ > +} mesh_info[] = { > + {0, {0}, 0, {0}, }, > + > + {2, {2, 2}, 2, {-4, 0, 1}, }, > + {2, {3, 2}, 3, {8, 9, 0, -1}, }, > + //{2, {4, 2}, 3, {16, 12, 0, -1}, }, > + {2, {5, 2}, 3, {24, 17, 0, -1}, }, > + {2, {6, 2}, 3, {32, 24, 0, -1}, }, > + {2, {3, 3}, 4, 
{-15, -32, -18, 0, 1}, }, > + //{2, {4, 3}, 4, {-28, -48, -21, 0, 1}, }, > + {2, {5, 3}, 4, {-39, -64, -26, 0, 1}, }, > + {2, {6, 3}, 4, {-48, -80, -33, 0, 1}, }, > + //{2, {4, 4}, 4, {-48, -64, -24, 0, 1}, }, > + //{2, {5, 4}, 4, {-60, -80, -29, 0, 1}, }, > + //{2, {6, 4}, 4, {-64, -96, -36, 0, 1}, }, > + {2, {5, 5}, 4, {-63, -96, -34, 0, 1}, }, > + {2, {6, 5}, 4, {-48, -112, -41, 0, 1}, }, > + {2, {6, 6}, 4, {0, -128, -48, 0, 1}, }, > + > + {3, {2, 2, 2}, 3, {16, 12, 0, -1}, }, > + {3, {3, 2, 2}, 4, {-28, -48, -21, 0, 1}, }, > + {3, {4, 2, 2}, 4, {-48, -64, -24, 0, 1}, }, > + {3, {5, 2, 2}, 4, {-60, -80, -29, 0, 1}, }, > + {3, {6, 2, 2}, 4, {-64, -96, -36, 0, 1}, }, > + {3, {3, 3, 2}, 5, {48, 127, 112, 34, 0, -1}, }, > + {3, {4, 3, 2}, 5, {80, 180, 136, 37, 0, -1}, }, > + {3, {5, 3, 2}, 5, {96, 215, 160, 42, 0, -1}, }, > + {3, {6, 3, 2}, 5, {96, 232, 184, 49, 0, -1}, }, > + {3, {4, 4, 2}, 5, {128, 240, 160, 40, 0, -1}, }, > + {3, {5, 4, 2}, 5, {144, 276, 184, 45, 0, -1}, }, > + {3, {6, 4, 2}, 5, {128, 288, 208, 52, 0, -1}, }, > + {3, {5, 5, 2}, 5, {144, 303, 208, 50, 0, -1}, }, > + {3, {6, 5, 2}, 5, {96, 296, 232, 57, 0, -1}, }, > + {3, {6, 6, 2}, 5, {0, 256, 256, 64, 0, -1}, }, > + {3, {3, 3, 3}, 6, {-81, -288, -381, -224, -51, 0, 1}, }, > + {3, {4, 3, 3}, 6, {-132, -416, -487, -256, -54, 0, 1}, }, > + {3, {5, 3, 3}, 6, {-153, -480, -557, -288, -59, 0, 1}, }, > + {3, {6, 3, 3}, 6, {-144, -480, -591, -320, -66, 0, 1}, }, > + {3, {4, 4, 3}, 6, {-208, -576, -600, -288, -57, 0, 1}, }, > + {3, {5, 4, 3}, 6, {-228, -640, -671, -320, -62, 0, 1}, }, > + {3, {6, 4, 3}, 6, {-192, -608, -700, -352, -69, 0, 1}, }, > + {3, {5, 5, 3}, 6, {-225, -672, -733, -352, -67, 0, 1}, }, > + {3, {6, 5, 3}, 6, {-144, -576, -743, -384, -74, 0, 1}, }, > + {3, {6, 6, 3}, 6, {0, -384, -720, -416, -81, 0, 1}, }, > + {3, {4, 4, 4}, 6, {-320, -768, -720, -320, -60, 0, 1}, }, > + {3, {5, 4, 4}, 6, {-336, -832, -792, -352, -65, 0, 1}, }, > + {3, {6, 4, 4}, 6, {-256, -768, -816, -384, -72, 0, 1}, 
}, > + {3, {5, 5, 4}, 6, {-324, -864, -855, -384, -70, 0, 1}, }, > + {3, {6, 5, 4}, 6, {-192, -736, -860, -416, -77, 0, 1}, }, > + {3, {6, 6, 4}, 6, {0, -512, -832, -448, -84, 0, 1}, }, > + {3, {5, 5, 5}, 6, {-297, -864, -909, -416, -75, 0, 1}, }, > + {3, {6, 5, 5}, 6, {-144, -672, -895, -448, -82, 0, 1}, }, > + {3, {6, 6, 5}, 6, {0, -384, -848, -480, -89, 0, 1}, }, > + {3, {6, 6, 6}, 6, {0, 0, -768, -512, -96, 0, 1}, }, > + > + {-1, {0,}, 0, {0, }, }, > +}; > + > /* > * poly_alloc > * > @@ -489,6 +559,30 @@ static void classify_switch(lash_t *p_lash, int sw) > } > > /* > + * classify_mesh_type > + * > + * try to look up node polynomial in table > + */ > +static void classify_mesh_type(lash_t *p_lash, int sw) > +{ > + int i; > + switch_t *s = p_lash->switches[sw]; > + struct _mesh_info *t; > + > + for (i = 1; (t = &mesh_info[i])->dimension != -1; i++) { > + if (poly_diff(t->degree, t->poly, s)) > + continue; > + > + s->node->type = i; > + s->node->dimension = t->dimension; > + return; > + } > + > + s->node->type = 0; > + return; > +} > + > +/* > * get_local_geometry > * > * analyze the local geometry around each switch > @@ -500,6 +594,7 @@ static void get_local_geometry(lash_t *p_lash) > for (sw = 0; sw < p_lash->num_switches; sw++) { > get_switch_metric(p_lash, sw); > classify_switch(p_lash, sw); > + classify_mesh_type(p_lash, sw); > } > } > > > From sashak at voltaire.com Sun Nov 23 22:25:42 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 08:25:42 +0200 Subject: [ofa-general] Re: [PATCH][8] opensm: measure size and reorder links In-Reply-To: <004501c94424$23551620$69ff4260$@com> References: <004501c94424$23551620$69ff4260$@com> Message-ID: <20081124062542.GW21967@sashak.voltaire.com> Hi Bob, On 11:37 Tue 11 Nov , Robert Pearson wrote: > > Here is the eighth patch implementing the mesh analysis algorithm. All white spaces are mangled in this patch, I cannot apply it. Could you resend in plain format? Thanks. 
Sasha > > > > This patch implements > > - routine to reorder links and measure the size of the mesh > > > > Regards, > > > > Bob Pearson > > > > Signed-off-by: Bob Pearson > > ---- > > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > > index 65afae6..a248522 100644 > > --- a/opensm/opensm/osm_mesh.c > > +++ b/opensm/opensm/osm_mesh.c > > @@ -832,6 +832,183 @@ next_j: > > } > > > > /* > > + * return |a| < |b| > > + */ > > +static inline int ltmag(int a, int b) > > +{ > > + int a1 = (a >= 0)? a : -a; > > + int b1 = (b >= 0)? b : -b; > > + > > + return (a1 < b1) || (a1 == b1 && a > b); > > +} > > + > > +/* > > + * reorder_links > > + * > > + * reorder the links out of a switch in sign/dimension order > > + */ > > +static int reorder_links(lash_t *p_lash, int sw) > > +{ > > + osm_log_t *p_log = &p_lash->p_osm->log; > > + switch_t *s = p_lash->switches[sw]; > > + mesh_node_t *node = s->node; > > + int n = node->num_links; > > + link_t **links; > > + int *axes; > > + int i, j; > > + int c; > > + int next = 0; > > + > > + if (!(links = calloc(n, sizeof(link_t *)))) { > > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - > out of memory\n"); > > + return -1; > > + } > > + > > + if (!(axes = calloc(n, sizeof(int)))) { > > + free(links); > > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - > out of memory\n"); > > + return -1; > > + } > > + > > + /* > > + * find the links with axes > > + */ > > + for (j = 1; j <= 2*node->dimension; j++) { > > + c = j; > > + if (node->coord[(c-1)/2] > 0) > > + c = opposite(s, c); > > + > > + for (i = 0; i < n; i++) { > > + if (!node->links[i]) > > + continue; > > + if (node->axes[i] == c) { > > + links[next] = node->links[i]; > > + axes[next] = node->axes[i]; > > + node->links[i] = NULL; > > + next++; > > + } > > + } > > + } > > + > > + /* > > + * get the rest > > + */ > > + for (i = 0; i < n; i++) { > > + if (!node->links[i]) > > + continue; > > + > > + links[next] = node->links[i]; > > + 
axes[next] = node->axes[i]; > > + node->links[i] = NULL; > > + next++; > > + } > > + > > + for (i = 0; i < n; i++) { > > + node->links[i] = links[i]; > > + node->axes[i] = axes[i]; > > + } > > + > > + free(links); > > + free(axes); > > + > > + return 0; > > +} > > + > > +/* > > + * measure geometry > > + */ > > +static int measure_geometry(lash_t *p_lash, int seed) > > +{ > > + int i, j, k; > > + int sw; > > + switch_t *s, *s1; > > + int change; > > + int dimension = p_lash->mesh->dimension; > > + int num_switches = p_lash->num_switches; > > + int assigned_axes = 0, unassigned_axes = 0; > > + int *max, *min; > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + s->node->coord = calloc(dimension, sizeof(int)); > > + for (i = 0; i < dimension; i++) > > + s->node->coord[i] = (sw == seed)? 0 : 0x7fffffff; > > + > > + for (i = 0; i < s->node->num_links; i++) > > + if (s->node->axes[i] == 0) > > + unassigned_axes++; > > + else > > + assigned_axes++; > > + } > > + > > + printf("lash: %d/%d unassigned/assigned axes\n", unassigned_axes, > assigned_axes); > > + > > + do { > > + change = 0; > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + if (s->node->coord[0] == 0x7fffffff) > > + continue; > > + > > + for (j = 0; j < s->node->num_links; j++) { > > + if (!s->node->axes[j]) > > + continue; > > + > > + s1 = p_lash->switches[s->node->links[j]->switch_id]; > > + > > + for (k = 0; k < dimension; k++) { > > + int coord = s->node->coord[k]; > > + int axis = s->node->axes[j] - 1; > > + > > + if (k == axis/2) > > + coord += (axis & 1)? 
-1 : +1; > > + > > + if (ltmag(coord, s1->node->coord[k])) { > > + s1->node->coord[k] = coord; > > + change++; > > + } > > + } > > + } > > + } > > + } while (change); > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + if (reorder_links(p_lash, sw)) > > + return -1; > > + } > > + > > + max = calloc(dimension, sizeof(int)); > > + min = calloc(dimension, sizeof(int)); > > + p_lash->mesh->size = calloc(dimension, sizeof(int)); > > + > > + for (i = 0; i < dimension; i++) { > > + max[i] = -0x7fffffff; > > + min[i] = 0x7fffffff; > > + } > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + for (i = 0; i < dimension; i++) { > > + if (s->node->coord[i] == 0x7fffffff) > > + continue; > > + if (s->node->coord[i] > max[i]) > > + max[i] = s->node->coord[i]; > > + if (s->node->coord[i] < min[i]) > > + min[i] = s->node->coord[i]; > > + } > > + } > > + > > + for (i = 0; i < dimension; i++) > > + p_lash->mesh->size[i] = max[i] - min[i] + 1; > > + > > + return 0; > > +} > > + > > +/* > > * osm_mesh_cleanup - free per mesh resources > > */ > > void osm_mesh_cleanup(lash_t *p_lash) > > @@ -941,6 +1118,14 @@ int osm_do_mesh_analysis(lash_t *p_lash) > > > > if (s->node->type) { > > make_geometry(p_lash, max_class_type); > > + > > + if (measure_geometry(p_lash, max_class_type)) > > + return -1; > > + > > + printf("lash: found "); > > + for (i = 0; i < mesh->dimension; i++) > > + printf("%s%d", i? "X" : "", mesh->size[i]); > > + printf(" mesh\n"); > > } > > > > OSM_LOG_EXIT(p_log); > > > > > From sashak at voltaire.com Sun Nov 23 23:01:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 09:01:00 +0200 Subject: [ofa-general] [PATCH] opensm/ftree: save lft_buf memory allocations Message-ID: <20081124070100.GZ21967@sashak.voltaire.com> Use OpenSM switch lft_buf directly and save memory (48k per switch) in local structures. 
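For readers wondering where the 48k figure comes from, here is a minimal standalone sketch. It is not OpenSM code: the `demo_*` types and helper names are hypothetical, `LID_UCAST_END_HO` is assumed to match `IB_LID_UCAST_END_HO` (0xBFFF), and `NO_PATH` is an assumed stand-in for `OSM_NO_PATH`. It shows the size of one full forwarding table and a route written straight into the table owned by the switch object, rather than into a private copy that is later copied over.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to mirror IB_LID_UCAST_END_HO (highest unicast LID, host order). */
#define LID_UCAST_END_HO 0xBFFF
#define LFT_LEN (LID_UCAST_END_HO + 1)
/* Assumed stand-in for OSM_NO_PATH. */
#define NO_PATH 0xFF

/* Hypothetical stand-in for the OpenSM switch object that owns new_lft. */
struct demo_osm_sw {
	uint8_t *new_lft;
};

/* Hypothetical stand-in for ftree_sw_t once the private lft_buf is gone. */
struct demo_ftree_sw {
	struct demo_osm_sw *p_osm_sw;
};

/* Bytes a private per-switch copy of the table would cost: one per LID. */
static size_t lft_copy_bytes(void)
{
	return (size_t)LFT_LEN * sizeof(uint8_t);
}

/* Allocate the shared table and mark every LID as unrouted. */
static int demo_sw_init(struct demo_osm_sw *osm_sw)
{
	osm_sw->new_lft = malloc(LFT_LEN);
	if (!osm_sw->new_lft)
		return -1;
	memset(osm_sw->new_lft, NO_PATH, LFT_LEN);
	return 0;
}

/* Record a route by writing directly into the switch-owned table. */
static void set_route(struct demo_ftree_sw *sw, uint16_t lid_ho, uint8_t port)
{
	sw->p_osm_sw->new_lft[lid_ho] = port;
}
```

With these assumptions `lft_copy_bytes()` evaluates to 49152 bytes, i.e. 48 KiB per switch, which is the amount a second private table would have cost, and the final copy step into the switch table is no longer needed.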
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_ucast_ftree.c | 53 +++++++------------------------------- 1 files changed, 10 insertions(+), 43 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index fb26247..875954b 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -48,7 +48,6 @@ #include #include #include -#include #include #include #include @@ -119,15 +118,6 @@ typedef struct { /*************************************************** ** - ** ftree_fwd_tbl_t definition - ** - ***************************************************/ - -typedef uint8_t *ftree_fwd_tbl_t; -#define FTREE_FWD_TBL_LEN (IB_LID_UCAST_END_HO + 1) - -/*************************************************** - ** ** ftree_port_t definition ** ***************************************************/ @@ -184,7 +174,6 @@ typedef struct ftree_sw_t_ { uint8_t down_port_groups_num; ftree_port_group_t **up_port_groups; uint8_t up_port_groups_num; - ftree_fwd_tbl_t lft_buf; boolean_t is_leaf; int down_port_groups_idx; } ftree_sw_t; @@ -222,7 +211,6 @@ typedef struct ftree_fabric_t_ { ftree_sw_t **leaf_switches; uint32_t leaf_switches_num; uint16_t max_cn_per_leaf; - cl_pool_t sw_fwd_tbl_pool; uint16_t lft_max_lid_ho; boolean_t fabric_built; } ftree_fabric_t; @@ -579,9 +567,7 @@ static ftree_sw_t *__osm_ftree_sw_create(IN ftree_fabric_t * p_ftree, p_sw->up_port_groups_num = 0; /* initialize lft buffer */ - p_sw->lft_buf = - (ftree_fwd_tbl_t) cl_pool_get(&p_ftree->sw_fwd_tbl_pool); - memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN); + memset(p_osm_sw->new_lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); p_sw->down_port_groups_idx = -1; @@ -607,10 +593,6 @@ static void __osm_ftree_sw_destroy(IN ftree_fabric_t * p_ftree, if (p_sw->up_port_groups) free(p_sw->up_port_groups); - /* return switch fwd_tbl to pool */ - if (p_sw->lft_buf) - cl_pool_put(&p_ftree->sw_fwd_tbl_pool, (void *)p_sw->lft_buf); - free(p_sw); } /* 
__osm_ftree_sw_destroy() */ @@ -892,7 +874,6 @@ __osm_ftree_hca_add_port(IN ftree_hca_t * p_hca, static ftree_fabric_t *__osm_ftree_fabric_create() { - cl_status_t status; ftree_fabric_t *p_ftree = (ftree_fabric_t *) malloc(sizeof(ftree_fabric_t)); if (p_ftree == NULL) @@ -907,16 +888,6 @@ static ftree_fabric_t *__osm_ftree_fabric_create() cl_qlist_init(&p_ftree->root_guid_list); - status = cl_pool_init(&p_ftree->sw_fwd_tbl_pool, 8, /* min pool size */ - 0, /* max pool size - unlimited */ - 8, /* grow size */ - FTREE_FWD_TBL_LEN, /* object_size */ - NULL, /* object initializer */ - NULL, /* object destructor */ - NULL); /* context */ - if (status != CL_SUCCESS) - return NULL; - return p_ftree; } @@ -1008,7 +979,6 @@ static void __osm_ftree_fabric_destroy(ftree_fabric_t * p_ftree) if (!p_ftree) return; __osm_ftree_fabric_clear(p_ftree); - cl_pool_destroy(&p_ftree->sw_fwd_tbl_pool); free(p_ftree); } @@ -1924,9 +1894,6 @@ static void __osm_ftree_set_sw_fwd_table(IN cl_map_item_t * const p_map_item, ftree_fabric_t *p_ftree = (ftree_fabric_t *) context; p_sw->p_osm_sw->max_lid_ho = p_ftree->lft_max_lid_ho; - - memcpy(p_sw->p_osm_sw->new_lft, p_sw->lft_buf, - IB_LID_UCAST_END_HO + 1); osm_ucast_mgr_set_fwd_table(&p_ftree->p_osm->sm.ucast_mgr, p_sw->p_osm_sw); } @@ -2065,13 +2032,13 @@ __osm_ftree_fabric_route_upgoing_by_going_down(IN ftree_fabric_t * p_ftree, /* second case: skip the port group if the remote (lower) switch has been already configured for this target LID */ if (is_real_lid && !is_main_path && - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] != OSM_NO_PATH) + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] != OSM_NO_PATH) continue; /* setting fwd tbl port only if this is real LID */ if (is_real_lid) { - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = - p_min_port->remote_port_num; + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = + p_min_port->remote_port_num; OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "Switch %s: set path to CA LID %u through 
port %u\n", __osm_ftree_tuple_to_str(p_remote_sw->tuple), @@ -2249,7 +2216,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, p_min_group->counter_down++; p_min_port->counter_down++; if (is_real_lid) { - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = p_min_port->remote_port_num; OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "Switch %s: set path to CA LID %u through port %u\n", @@ -2325,7 +2292,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, p_remote_sw = p_group->remote_hca_or_sw.p_sw; /* skip if target lid has been already set on remote switch fwd tbl */ - if (p_remote_sw->lft_buf[cl_ntoh16(target_lid)] != OSM_NO_PATH) + if (p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] != OSM_NO_PATH) continue; if (p_sw->is_leaf) { @@ -2343,7 +2310,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, trying to balance these routes - always pick port 0. 
*/ cl_ptr_vector_at(&p_group->ports, 0, (void *)&p_port); - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = p_port->remote_port_num; /* On the remote switch that is pointed by the p_group, @@ -2435,7 +2402,7 @@ static void __osm_ftree_fabric_route_to_cns(IN ftree_fabric_t * p_ftree) /* set local LFT(LID) to the port that is connected to HCA */ cl_ptr_vector_at(&p_leaf_port_group->ports, 0, (void *)&p_port); - p_sw->lft_buf[cl_ntoh16(hca_lid)] = p_port->port_num; + p_sw->p_osm_sw->new_lft[cl_ntoh16(hca_lid)] = p_port->port_num; OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "Switch %s: set path to CN LID %u through port %u\n", @@ -2544,7 +2511,7 @@ static void __osm_ftree_fabric_route_to_non_cns(IN ftree_fabric_t * p_ftree) cl_ptr_vector_at(&p_hca_port_group->ports, 0, (void *)&p_hca_port); port_num_on_switch = p_hca_port->remote_port_num; - p_sw->lft_buf[cl_ntoh16(hca_lid)] = port_num_on_switch; + p_sw->p_osm_sw->new_lft[cl_ntoh16(hca_lid)] = port_num_on_switch; OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "Switch %s: set path to non-CN HCA LID %u through port %u\n", @@ -2600,7 +2567,7 @@ static void __osm_ftree_fabric_route_to_switches(IN ftree_fabric_t * p_ftree) p_next_sw = (ftree_sw_t *) cl_qmap_next(&p_sw->map_item); /* set local LFT(LID) to 0 (route to itself) */ - p_sw->lft_buf[cl_ntoh16(p_sw->base_lid)] = 0; + p_sw->p_osm_sw->new_lft[cl_ntoh16(p_sw->base_lid)] = 0; OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "Switch %s (LID %u): routing switch-to-switch paths\n", -- 1.6.0.4.766.g6fc4a From dorfman.eli at gmail.com Mon Nov 24 00:01:27 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 24 Nov 2008 10:01:27 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH] opensm/osm_trap_rcv.c disable the port with the least hop count In-Reply-To: <20081121094514.GC6965@sashak.voltaire.com> References: <49251926.9090509@gmail.com> <20081121094514.GC6965@sashak.voltaire.com> Message-ID: 
<694d48600811240001g1673d3aeo26ff7bc3bce0a6e8@mail.gmail.com> On Fri, Nov 21, 2008 at 11:45 AM, Sasha Khapyorsky wrote: > Hi Eli, > > On 10:00 Thu 20 Nov , Eli Dorfman wrote: >> disable the port with the least hop count. >> this will address the case of inter switch link where the >> most remote port (from opensm) is sending traps. >> in that case we would like to disable the nearest switch port (from opensm). >> >> Signed-off-by: Eli Dorfman > > I applied the patch. However have some question. > >> --- >> opensm/opensm/osm_trap_rcv.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c >> index 07c5183..d1dfbd4 100644 >> --- a/opensm/opensm/osm_trap_rcv.c >> +++ b/opensm/opensm/osm_trap_rcv.c >> @@ -239,8 +239,8 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p) >> ib_port_info_t *pi = (ib_port_info_t *)payload; >> int ret; >> >> - /* in case of endport - disable switch's peer port */ >> - if (osm_node_get_type(p->p_node) != IB_NODE_TYPE_SWITCH) >> + /* select the nearest port to master opensm */ >> + if (p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count) >> p = p->p_remote_physp; > > Is it possible that this noisy port is switch external port, "the > nearest" to OpenSM node and doesn't have remote port (due to unstable > link)? We saw such cases in practice and it is handled by OpenSM in a > light sweep (see __osm_state_mgr_get_remote_port_info() calls in > __osm_state_mgr_light_sweep_start() function). > > With endports check only is is impossible IMO, but with I don't see that > it cannot happen with switch ports. Right? > > If so then maybe the code should look like: > > if (p->p_remote_physp && > p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count) > p = p->p_remote_physp; > you are absolutely right. please add the above fix. 
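
For readers following the thread, the guarded check agreed on above can be tried outside OpenSM with a small user-space sketch. The names below (struct physp, struct dr_path, select_nearest_port) are hypothetical stand-ins and not the real osm_physp_t API; only the hop_count comparison and the NULL guard on p_remote_physp are taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the OpenSM types; only the
 * fields mentioned in the thread are modelled. */
struct dr_path {
	unsigned hop_count;		/* hops from the SM to this port */
};

struct physp {
	struct dr_path dr_path;
	struct physp *p_remote_physp;	/* NULL when the link has no peer */
};

/* Return the end of the link that is closer to the SM.
 * The remote port is chosen only if it exists and is strictly nearer. */
static struct physp *select_nearest_port(struct physp *p)
{
	assert(p != NULL);
	if (p->p_remote_physp &&
	    p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count)
		return p->p_remote_physp;
	return p;
}
```

Without the first condition, a port whose peer has disappeared (the unstable-link case Sasha describes) would be dereferenced through a NULL pointer; with it, the port itself is kept as the one to disable.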
Thanks, Eli > > Sasha > >> >> /* If trap 131, might want to disable peer port if available */ >> -- >> 1.5.5 >> > From sashak at voltaire.com Mon Nov 24 00:20:55 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 10:20:55 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_trap_rcv.c disable the port with the least hop count In-Reply-To: <694d48600811240001g1673d3aeo26ff7bc3bce0a6e8@mail.gmail.com> References: <49251926.9090509@gmail.com> <20081121094514.GC6965@sashak.voltaire.com> <694d48600811240001g1673d3aeo26ff7bc3bce0a6e8@mail.gmail.com> Message-ID: <20081124082055.GE21967@sashak.voltaire.com> On 10:01 Mon 24 Nov , Eli Dorfman wrote: > > > > If so then maybe the code should look like: > > > > if (p->p_remote_physp && > > p->dr_path.hop_count > p->p_remote_physp->dr_path.hop_count) > > p = p->p_remote_physp; > > > > > you are absolutely right. please add the above fix. Applied. Sasha From kliteyn at dev.mellanox.co.il Mon Nov 24 01:06:54 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 24 Nov 2008 11:06:54 +0200 Subject: [ofa-general] Re: [PATCH] opensm/ftree: save lft_buf memory allocations In-Reply-To: <20081124070100.GZ21967@sashak.voltaire.com> References: <20081124070100.GZ21967@sashak.voltaire.com> Message-ID: <492A6EAE.7020909@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > Use OpenSM switch lft_buf directly and save memory (48k per switch) in > local structures. Looks good, thanks. 
-- Yevgeny > Signed-off-by: Sasha Khapyorsky > --- > opensm/opensm/osm_ucast_ftree.c | 53 +++++++------------------------------- > 1 files changed, 10 insertions(+), 43 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c > index fb26247..875954b 100644 > --- a/opensm/opensm/osm_ucast_ftree.c > +++ b/opensm/opensm/osm_ucast_ftree.c > @@ -48,7 +48,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -119,15 +118,6 @@ typedef struct { > > /*************************************************** > ** > - ** ftree_fwd_tbl_t definition > - ** > - ***************************************************/ > - > -typedef uint8_t *ftree_fwd_tbl_t; > -#define FTREE_FWD_TBL_LEN (IB_LID_UCAST_END_HO + 1) > - > -/*************************************************** > - ** > ** ftree_port_t definition > ** > ***************************************************/ > @@ -184,7 +174,6 @@ typedef struct ftree_sw_t_ { > uint8_t down_port_groups_num; > ftree_port_group_t **up_port_groups; > uint8_t up_port_groups_num; > - ftree_fwd_tbl_t lft_buf; > boolean_t is_leaf; > int down_port_groups_idx; > } ftree_sw_t; > @@ -222,7 +211,6 @@ typedef struct ftree_fabric_t_ { > ftree_sw_t **leaf_switches; > uint32_t leaf_switches_num; > uint16_t max_cn_per_leaf; > - cl_pool_t sw_fwd_tbl_pool; > uint16_t lft_max_lid_ho; > boolean_t fabric_built; > } ftree_fabric_t; > @@ -579,9 +567,7 @@ static ftree_sw_t *__osm_ftree_sw_create(IN ftree_fabric_t * p_ftree, > p_sw->up_port_groups_num = 0; > > /* initialize lft buffer */ > - p_sw->lft_buf = > - (ftree_fwd_tbl_t) cl_pool_get(&p_ftree->sw_fwd_tbl_pool); > - memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN); > + memset(p_osm_sw->new_lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1); > > p_sw->down_port_groups_idx = -1; > > @@ -607,10 +593,6 @@ static void __osm_ftree_sw_destroy(IN ftree_fabric_t * p_ftree, > if (p_sw->up_port_groups) > free(p_sw->up_port_groups); > > - /* return switch 
fwd_tbl to pool */ > - if (p_sw->lft_buf) > - cl_pool_put(&p_ftree->sw_fwd_tbl_pool, (void *)p_sw->lft_buf); > - > free(p_sw); > } /* __osm_ftree_sw_destroy() */ > > @@ -892,7 +874,6 @@ __osm_ftree_hca_add_port(IN ftree_hca_t * p_hca, > > static ftree_fabric_t *__osm_ftree_fabric_create() > { > - cl_status_t status; > ftree_fabric_t *p_ftree = > (ftree_fabric_t *) malloc(sizeof(ftree_fabric_t)); > if (p_ftree == NULL) > @@ -907,16 +888,6 @@ static ftree_fabric_t *__osm_ftree_fabric_create() > > cl_qlist_init(&p_ftree->root_guid_list); > > - status = cl_pool_init(&p_ftree->sw_fwd_tbl_pool, 8, /* min pool size */ > - 0, /* max pool size - unlimited */ > - 8, /* grow size */ > - FTREE_FWD_TBL_LEN, /* object_size */ > - NULL, /* object initializer */ > - NULL, /* object destructor */ > - NULL); /* context */ > - if (status != CL_SUCCESS) > - return NULL; > - > return p_ftree; > } > > @@ -1008,7 +979,6 @@ static void __osm_ftree_fabric_destroy(ftree_fabric_t * p_ftree) > if (!p_ftree) > return; > __osm_ftree_fabric_clear(p_ftree); > - cl_pool_destroy(&p_ftree->sw_fwd_tbl_pool); > free(p_ftree); > } > > @@ -1924,9 +1894,6 @@ static void __osm_ftree_set_sw_fwd_table(IN cl_map_item_t * const p_map_item, > ftree_fabric_t *p_ftree = (ftree_fabric_t *) context; > > p_sw->p_osm_sw->max_lid_ho = p_ftree->lft_max_lid_ho; > - > - memcpy(p_sw->p_osm_sw->new_lft, p_sw->lft_buf, > - IB_LID_UCAST_END_HO + 1); > osm_ucast_mgr_set_fwd_table(&p_ftree->p_osm->sm.ucast_mgr, > p_sw->p_osm_sw); > } > @@ -2065,13 +2032,13 @@ __osm_ftree_fabric_route_upgoing_by_going_down(IN ftree_fabric_t * p_ftree, > /* second case: skip the port group if the remote (lower) > switch has been already configured for this target LID */ > if (is_real_lid && !is_main_path && > - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] != OSM_NO_PATH) > + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] != OSM_NO_PATH) > continue; > > /* setting fwd tbl port only if this is real LID */ > if (is_real_lid) { > - 
p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = > - p_min_port->remote_port_num; > + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = > + p_min_port->remote_port_num; > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "Switch %s: set path to CA LID %u through port %u\n", > __osm_ftree_tuple_to_str(p_remote_sw->tuple), > @@ -2249,7 +2216,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, > p_min_group->counter_down++; > p_min_port->counter_down++; > if (is_real_lid) { > - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = > + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = > p_min_port->remote_port_num; > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "Switch %s: set path to CA LID %u through port %u\n", > @@ -2325,7 +2292,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, > p_remote_sw = p_group->remote_hca_or_sw.p_sw; > > /* skip if target lid has been already set on remote switch fwd tbl */ > - if (p_remote_sw->lft_buf[cl_ntoh16(target_lid)] != OSM_NO_PATH) > + if (p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] != OSM_NO_PATH) > continue; > > if (p_sw->is_leaf) { > @@ -2343,7 +2310,7 @@ __osm_ftree_fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree, > trying to balance these routes - always pick port 0. 
*/ > > cl_ptr_vector_at(&p_group->ports, 0, (void *)&p_port); > - p_remote_sw->lft_buf[cl_ntoh16(target_lid)] = > + p_remote_sw->p_osm_sw->new_lft[cl_ntoh16(target_lid)] = > p_port->remote_port_num; > > /* On the remote switch that is pointed by the p_group, > @@ -2435,7 +2402,7 @@ static void __osm_ftree_fabric_route_to_cns(IN ftree_fabric_t * p_ftree) > /* set local LFT(LID) to the port that is connected to HCA */ > cl_ptr_vector_at(&p_leaf_port_group->ports, 0, > (void *)&p_port); > - p_sw->lft_buf[cl_ntoh16(hca_lid)] = p_port->port_num; > + p_sw->p_osm_sw->new_lft[cl_ntoh16(hca_lid)] = p_port->port_num; > > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "Switch %s: set path to CN LID %u through port %u\n", > @@ -2544,7 +2511,7 @@ static void __osm_ftree_fabric_route_to_non_cns(IN ftree_fabric_t * p_ftree) > cl_ptr_vector_at(&p_hca_port_group->ports, 0, > (void *)&p_hca_port); > port_num_on_switch = p_hca_port->remote_port_num; > - p_sw->lft_buf[cl_ntoh16(hca_lid)] = port_num_on_switch; > + p_sw->p_osm_sw->new_lft[cl_ntoh16(hca_lid)] = port_num_on_switch; > > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "Switch %s: set path to non-CN HCA LID %u through port %u\n", > @@ -2600,7 +2567,7 @@ static void __osm_ftree_fabric_route_to_switches(IN ftree_fabric_t * p_ftree) > p_next_sw = (ftree_sw_t *) cl_qmap_next(&p_sw->map_item); > > /* set local LFT(LID) to 0 (route to itself) */ > - p_sw->lft_buf[cl_ntoh16(p_sw->base_lid)] = 0; > + p_sw->p_osm_sw->new_lft[cl_ntoh16(p_sw->base_lid)] = 0; > > OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "Switch %s (LID %u): routing switch-to-switch paths\n", From sashak at voltaire.com Mon Nov 24 01:27:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 11:27:40 +0200 Subject: [ofa-general] Re: [PATCH] opensm/ftree: save lft_buf memory allocations In-Reply-To: <492A6EAE.7020909@dev.mellanox.co.il> References: <20081124070100.GZ21967@sashak.voltaire.com> <492A6EAE.7020909@dev.mellanox.co.il> Message-ID: 
<20081124092740.GG21967@sashak.voltaire.com> Hi Yevgeny, On 11:06 Mon 24 Nov , Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: >> Use OpenSM switch lft_buf directly and save memory (48k per switch) in >> local structures. > > Looks good, thanks. The only potential downside I could see here - is that this will require some handling if we will remove new_lft field (after #1406 and other debugging). Sasha From kliteyn at dev.mellanox.co.il Mon Nov 24 01:57:34 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 24 Nov 2008 11:57:34 +0200 Subject: [ofa-general] Re: [PATCH] opensm/ftree: save lft_buf memory allocations In-Reply-To: <20081124092740.GG21967@sashak.voltaire.com> References: <20081124070100.GZ21967@sashak.voltaire.com> <492A6EAE.7020909@dev.mellanox.co.il> <20081124092740.GG21967@sashak.voltaire.com> Message-ID: <492A7A8E.1050109@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 11:06 Mon 24 Nov , Yevgeny Kliteynik wrote: >> Sasha Khapyorsky wrote: >>> Use OpenSM switch lft_buf directly and save memory (48k per switch) in >>> local structures. >> Looks good, thanks. > > The only potential downside I could see here - is that this will require > some handling if we will remove new_lft field (after #1406 and other > debugging). Right, I thought about it too, but I decided that removing new_buf might be not so good. Right now we have two types of routing engines: engines that are basing their decisions on the min_hop tables, and engines that make their own decisions and creating min_hop tables as a by-product, just for multicast routing. The example of latter at this point is only fat-tree routing, but I'm sure that more will follow. New routing for 3D mesh/torus comes to mind (not necessarily the one that was already posted to the list). 
For this type of routing you will need new_buf anyway, so instead of having it inside of every routing (as it was with fat-tree before the unicast cache and lft simplification), we'd better have one in osm_switch_t. -- Yevgeny > Sasha > From vlad at lists.openfabrics.org Mon Nov 24 03:23:04 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 24 Nov 2008 03:23:04 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081124-0200 daily build status Message-ID: <20081124112304.A9909E608DC@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed 
on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From amirv at mellanox.co.il Mon Nov 24 03:53:36 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Mon, 24 Nov 2008 13:53:36 +0200 Subject: [ofa-general] Re: [ewg] OFED 1.4 - delay the GA to Dec 4 In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD01006654@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD01006654@mtlexch01.mtl.com> Message-ID: <492A95C0.9050500@mellanox.co.il> Both bugs on SDP are fixed (BUG1348, BUG1349) - Currently there are no major bugs relevant to this release. - Amir Tziporet Koren wrote: > Hi All, > > I have Just reviewed bugs status with Vlad. > > We have 11 major and critical bugs, and we will not be able to fix all > of them in one week > > Thus - I delay the GA release to Dec 4 (since we have thanks-giving > holiday next week) > > I also suggest we will create RC6 by end of next week - since most of > the bugs are assigned to people in Israel and we do not have vacation > next week > > We will review the release status at the EWG meeting next week. > > Bug owners - please reply with status update and also update bug report > > Bugs list: > > 1370 blo vlad at mellanox.co.il Ping over IPoIB I/F > fails after ifconfig down and up > > 1242 cri yannick.cote at qlogic.com kernel panic while running > mpi2007 against ofed1.4 -- ib_... 
>
> 1198 cri yosefe at voltaire.com hang during ipoib create_child/ifdown
>
> 1348 maj amirv at mellanox.co.il Sdp sockets doesnt closed after programs end
>
> 1349 maj amirv at mellanox.co.il Kernel panic on sdp
>
> 1289 maj jackm at mellanox.co.il Ib and ipoib doesnt respond while running multiple tests ...
>
> 1389 maj jackm at mellanox.co.il poll_cq sometimes fail in a multithreaded test
>
> 1401 maj sashak at voltaire.com segmentation fault when running opensm -Q
>
> 1377 maj vu at mellanox.com Deadlock occured during HA test
>
> 1380 maj vu at mellanox.com Cannot unload ib_srpt module on SRP target
>
> 1395 maj vu at mellanox.com kernel panic during SRP HA test
>
>
> Tziporet & Vlad

From wangchen at cn.fujitsu.com Mon Nov 24 01:31:59 2008
From: wangchen at cn.fujitsu.com (Wang Chen)
Date: Mon, 24 Nov 2008 17:31:59 +0800
Subject: [ofa-general] [PATCH next]infiniband: Kill directly reference of netdev->priv
Message-ID: <492A748F.9040308@cn.fujitsu.com>

This use of netdev->priv is wrong.
The right way is:
- call alloc_netdev() with no memory for private data;
- make netdev->ml_priv point to c2_dev.

I am doing this kind of work for the net-next tree, so I am sending this patch
to Dave, although he is not the InfiniBand maintainer.
Signed-off-by: Wang Chen --- drivers/infiniband/hw/amso1100/c2_provider.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c index 69580e2..5119d65 100644 --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -653,7 +653,7 @@ static int c2_service_destroy(struct iw_cm_id *cm_id) static int c2_pseudo_up(struct net_device *netdev) { struct in_device *ind; - struct c2_dev *c2dev = netdev->priv; + struct c2_dev *c2dev = netdev->ml_priv; ind = in_dev_get(netdev); if (!ind) @@ -678,7 +678,7 @@ static int c2_pseudo_up(struct net_device *netdev) static int c2_pseudo_down(struct net_device *netdev) { struct in_device *ind; - struct c2_dev *c2dev = netdev->priv; + struct c2_dev *c2dev = netdev->ml_priv; ind = in_dev_get(netdev); if (!ind) @@ -746,14 +746,14 @@ static struct net_device *c2_pseudo_netdev_init(struct c2_dev *c2dev) /* change ethxxx to iwxxx */ strcpy(name, "iw"); strcat(name, &c2dev->netdev->name[3]); - netdev = alloc_netdev(sizeof(*netdev), name, setup); + netdev = alloc_netdev(0, name, setup); if (!netdev) { printk(KERN_ERR PFX "%s - etherdev alloc failed", __func__); return NULL; } - netdev->priv = c2dev; + netdev->ml_priv = c2dev; SET_NETDEV_DEV(netdev, &c2dev->pcidev->dev); -- 1.5.3.4 From vlad at mellanox.co.il Mon Nov 24 07:37:00 2008 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 24 Nov 2008 17:37:00 +0200 Subject: [ofa-general] [PATCH] IPoIB: Prevent address handles leak. Message-ID: <20081124153700.GA27848@mellanox.co.il> In case of removing ib_ipoib module ipoib_ib_dev_stop() function will be called and all address handles (ah) in dead_ahs list will be reaped. But some ah will be added to the dead list after ipoib_ib_dev_stop done by ipoib_mcast_free. These ahs will not be freed. The solution here is to wait till multicast_list will be empty. 
So, all ahs will be added to dead_ahs list. Signed-off-by: Vladimir Sokolovsky --- Roland, There may be some extremely slight window for leaking address handles still, since the multicast list is emptied in ipoib_mcast_dev_flush() before it calls ipoib_mcast_free() (which adds address handles to the dead list). However, this seems to be the best compromise that I can see without a lot of nasty (and possibly buggy) changes. drivers/infiniband/ulp/ipoib/ipoib_ib.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 66cafa2..6cc0c59 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -863,7 +863,7 @@ timeout: begin = jiffies; - while (!list_empty(&priv->dead_ahs)) { + while (!list_empty(&priv->dead_ahs) || !list_empty(&priv->multicast_list)) { __ipoib_reap_ah(dev); if (time_after(jiffies, begin + HZ)) { -- 1.5.6.3 From tziporet at mellanox.co.il Mon Nov 24 07:59:02 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 24 Nov 2008 17:59:02 +0200 Subject: [ofa-general] OFED 1.4 meeting agenda for today - Nov 24 Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD010AC6CC@mtlexch01.mtl.com> This is the agenda for the OFED meeting today 1. Bugs status review: 1370 blo vlad at mellanox.co.il Ping over IPoIB I/F fails after ifconfig down and up - there is a fix but its not integrated 1242 cri yannick.cote at qlogic.com kernel panic while running mpi2007 against ofed1.4 -- ib_... 1410 cri vlad at mellanox.co.il Memory leak (address handler not reped) in IPoIB 1289 maj jackm at mellanox.co.il Ib and ipoib doesn't respond while running multiple tests ... 1407 maj monis at voltaire.com Active-Backup failure when disabling an active slave inte... 
1377 maj vu at mellanox.com Deadlock occurred during HA test
1380 maj vu at mellanox.com Cannot unload ib_srpt module on SRP target
1395 maj vu at mellanox.com kernel panic during SRP HA test
1384 maj eli at mellanox.co.il netperf latency small messages increase 5%
1385 maj eli at mellanox.co.il ofed 1.4 - netperf udp BW small messages decrease ~8%
1386 maj eli at mellanox.co.il ofed 1.4 - iperf tcp connected mode BW large messages dec...

2. Decide on release date:
Not clear if we can make it for Dec 4, since we have many open bugs

3. Decide on next meetings dates:
I suggest Dec 1, Dec 8 and Dec 15 (only if the release is not done)

4. Open discussion

Tziporet

From rdreier at cisco.com Mon Nov 24 09:00:59 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 24 Nov 2008 09:00:59 -0800
Subject: [ofa-general] Re: RDMA CM and IPv6 support
In-Reply-To: <1227431794.4180.7.camel@alst60.voltaire.com> (Aleksey Senin's message of "Sun, 23 Nov 2008 09:16:34 +0000")
References: <1227431794.4180.7.camel@alst60.voltaire.com>
Message-ID:

> There was a set of kernel patches written by me and approved by Sean for
> RDMA CM to support IPv6 protocol. Is there any reason why it not
> applied? I'll be glad fix them.

I didn't see any comment on them last time as I recall. I would prefer
to get an ack from Sean before applying them. In any case I've lost the
original mails from my mailbox. I think it would be a good idea for you
to repost the patches against the latest kernel to move things forward.

- R.

From rdreier at cisco.com Mon Nov 24 09:01:28 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 24 Nov 2008 09:01:28 -0800
Subject: [ofa-general] Re: [PATCH next]infiniband: Kill directly reference of netdev->priv
In-Reply-To: <492A748F.9040308@cn.fujitsu.com> (Wang Chen's message of "Mon, 24 Nov 2008 17:31:59 +0800")
References: <492A748F.9040308@cn.fujitsu.com>
Message-ID:

Looks fine to me.
Acked-by: Roland Dreier From rdreier at cisco.com Mon Nov 24 09:02:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Nov 2008 09:02:33 -0800 Subject: [ofa-general] Re: [PATCH] IPoIB: Prevent address handles leak. In-Reply-To: <20081124153700.GA27848@mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 24 Nov 2008 17:37:00 +0200") References: <20081124153700.GA27848@mellanox.co.il> Message-ID: > There may be some extremely slight window for leaking address handles > still, since the multicast list is emptied in ipoib_mcast_dev_flush() before > it calls ipoib_mcast_free() (which adds address handles to the dead list). > > However, this seems to be the best compromise that I can see without > a lot of nasty (and possibly buggy) changes. The impact of this bug seems very low to me, so this is 2.6.29 material anyway. I would really rather fix this bug for real rather than just reducing the window and leaving the bug to cause problems in the future, so could you try and think of a solution that doesn't leave a window at all? From yossi.openib at gmail.com Mon Nov 24 09:13:59 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 24 Nov 2008 19:13:59 +0200 Subject: ***SPAM*** Re: [ofa-general] [PATCH] IPoIB: Prevent address handles leak. In-Reply-To: <20081124153700.GA27848@mellanox.co.il> References: <20081124153700.GA27848@mellanox.co.il> Message-ID: <492AE0D7.5010508@gmail.com> I think the problem is that multicast is not really flushed when the interface is downed. Therefore, a join can start after the device was brought down, and the fix below will not reap the ah's. How about reaping all remaining ah's after multicast device is really flushed, that is in ipoib_ib_dev_cleanup(), which is called when ib_ipoib is unloaded? This way you can have only a limited amount of non-reaped dead ah when interface is down (until the reap task is back on), and you can be certain that all of them will be reaped when module is unloaded. 
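
The ordering argument above can be illustrated with a toy user-space model. This is only a sketch under loud assumptions: plain counters stand in for the kernel lists, and every name (struct ipoib_model, model_mcast_free, model_reap_ah, model_dev_stop, model_dev_cleanup) is hypothetical. It is not the IPoIB driver code; it only shows why a final reap after the multicast flush leaves nothing behind.

```c
#include <assert.h>

/* Hypothetical model: counters stand in for the kernel lists. */
struct ipoib_model {
	int mcast_groups;	/* entries still on the multicast list */
	int dead_ahs;		/* address handles waiting to be reaped */
	int freed_ahs;		/* address handles actually destroyed */
};

/* Models freeing one multicast group: its address handle is retired
 * to the dead list (as described for ipoib_mcast_free in the thread). */
static void model_mcast_free(struct ipoib_model *m)
{
	if (m->mcast_groups > 0) {
		m->mcast_groups--;
		m->dead_ahs++;
	}
}

/* Models the reap step: destroy everything currently on the dead list. */
static void model_reap_ah(struct ipoib_model *m)
{
	m->freed_ahs += m->dead_ahs;
	m->dead_ahs = 0;
}

/* Interface down: reap what is dead now; later frees may add more. */
static void model_dev_stop(struct ipoib_model *m)
{
	model_reap_ah(m);
}

/* Module unload: flush multicast first, then reap one last time. */
static void model_dev_cleanup(struct ipoib_model *m)
{
	while (m->mcast_groups > 0)
		model_mcast_free(m);
	model_reap_ah(m);
}
```

In this model a group freed after model_dev_stop() leaves one handle on the dead list, which corresponds to the leak being discussed; model_dev_cleanup() collects it because the reap runs only after the multicast list has been emptied.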
--Yossi Vladimir Sokolovsky wrote: > In case of removing ib_ipoib module ipoib_ib_dev_stop() function will be > called and all address handles (ah) in dead_ahs list will be reaped. > But some ah will be added to the dead list after ipoib_ib_dev_stop done > by ipoib_mcast_free. These ahs will not be freed. > > The solution here is to wait till multicast_list will be empty. So, all > ahs will be added to dead_ahs list. > > Signed-off-by: Vladimir Sokolovsky > --- > Roland, > There may be some extremely slight window for leaking address handles > still, since the multicast list is emptied in ipoib_mcast_dev_flush() before > it calls ipoib_mcast_free() (which adds address handles to the dead list). > > However, this seems to be the best compromise that I can see without > a lot of nasty (and possibly buggy) changes. > > drivers/infiniband/ulp/ipoib/ipoib_ib.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 66cafa2..6cc0c59 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -863,7 +863,7 @@ timeout: > > begin = jiffies; > > - while (!list_empty(&priv->dead_ahs)) { > + while (!list_empty(&priv->dead_ahs) || !list_empty(&priv->multicast_list)) { > __ipoib_reap_ah(dev); > > if (time_after(jiffies, begin + HZ)) { From weiny2 at llnl.gov Mon Nov 24 09:16:05 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 24 Nov 2008 09:16:05 -0800 Subject: [ofa-general] [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: References: <20081120163809.26a3c499.weiny2@llnl.gov> Message-ID: <20081124091605.298547e9.weiny2@llnl.gov> On Fri, 21 Nov 2008 07:25:23 -0500 "Hal Rosenstock" wrote: > Hi Ira, > > On Thu, Nov 20, 2008 at 7:38 PM, Ira Weiny wrote: > > The following 3 patches implement "libibnetdisc" which provides the > > functionality of ibnetdiscover in a C library. 
> >
> > I mentioned this to Sasha at the last Sonoma conference and posted the bulk of
> > this code to the list a few months ago. This libary is still providing the 85%
> > performance speed up of iblinkinfo.pl on our clusters.
> >
> > This new series is heavily tested and, for our hardware, preserves the
> > functionality of ibnetdiscover. Since I don't have a Xsigo box to test on I
> > can only verify that it compiles correctly.
>
> Have you also verified this QLogic/Silverstorm and Cisco chassis
> switches ? They were supported too.

I did not see the code for their support. I probably missed something. We have
some QLogic switches on Hyperion now so I will test that.

Thanks for the catch,
Ira

>
> -- Hal
>
> > Ira

From celine.bourde at ext.bull.net Mon Nov 24 09:30:02 2008
From: celine.bourde at ext.bull.net (Celine Bourde)
Date: Mon, 24 Nov 2008 18:30:02 +0100
Subject: [ofa-general] QoS implementation
Message-ID: <492AE49A.7090607@ext.bull.net>

Hi,

I'm testing QoS on opensm.
I work with OFED-1.4-20081123-0600.tgz and opensm-3.2.4_20081122_c732c34.

I've set up the qos-policy file, SL2VL and VLArbitration Table (all in attachment).
Results still have wrong values.
I've launched opensm -Q /etc/ofa/opensm.conf and used the qperf tool to test
the QoS implementation:

bandwidth test :
---------------
SL1 should have 33% of bandwidth (1:64), SL2 should have 66% of bandwidth (2:128)

cmd server :
qperf -lp 19766 & qperf -lp 19764
cmd client :
qperf 192.168.0.3 -lp 19766 -sl 1 rc_rdma_write_bw > sl1.txt & qperf 192.168.0.3 -lp 19764 -sl 2 rc_rdma_write_bw > sl2.txt

results :
sl1.txt : rc_rdma_write_bw: bw = 1.7 GB/sec
sl2.txt : rc_rdma_write_bw: bw = 1.7 GB/sec

latency test :
--------------
cmd server :
qperf -lp 19766 & qperf -lp 19764
cmd client :
qperf 192.168.0.3 -lp 19766 -sl 1 rc_rdma_write_lat > sl1.txt & qperf 192.168.0.3 -lp 19764 -sl 2 rc_rdma_write_lat > sl2.txt

results :
sl1.txt : rc_rdma_write_lat: latency = 12.9 us
sl2.txt : rc_rdma_write_lat: latency = 12.9 us

I've tested PC to PC, without a switch between them.
I have a Mellanox ConnectX card in each PC with the following features :
ConnectX® IB QDR, ConnectX IB HCA/TCA IC, dual-port, QDR, PCIe 2.0 PCIe 2.0 5.0GT/s
Firmware has been updated with latest version 2.5.9

[]# ibstat
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Firmware version: 2.5.900
    Hardware version: a0
    Node GUID: 0x0002c903000290aa
    System image GUID: 0x0002c903000290ad
    Capability mask: 0x0251086a

When I use smpquery, the results are the following :

VLCap:...........................VL0-7
VLHighLimit:.....................0
VLArbHighCap:....................8
VLArbLowCap:.....................8
VLStallCount:....................0

So, my configuration (cf. opensm.conf in attachment) doesn't correspond to the
smpquery results.
I've tried restarting openibd on both PCs. I've restarted opensm (stop and
start) and I've tested "opensm -Q conf_file", but my configuration is still not set.

Did I miss something or is it a bug ?

Thanks for your help.

Céline Bourde.
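
As a cross-check of the 33% / 66% expectation quoted above, the shares can be derived from the low-priority arbitration weights alone. The sketch below is only arithmetic under stated assumptions: plain weighted round-robin, SL1 and SL2 mapped to VL1 and VL2, and only those two VLs carrying traffic. The helper name vl_share_percent and struct vlarb_entry are made up for illustration and are not part of OpenSM or qperf.

```c
#include <assert.h>
#include <stddef.h>

/* One "vl:weight" pair as written in the VL arbitration options. */
struct vlarb_entry {
	unsigned vl;
	unsigned weight;
};

/* Percentage share (rounded down) of 'vl' among the VLs listed in
 * 'active', assuming plain weighted round-robin and that only those
 * VLs have traffic queued. Returns 0 if no active VL has a weight. */
static unsigned vl_share_percent(const struct vlarb_entry *tbl, size_t n,
				 const unsigned *active, size_t n_active,
				 unsigned vl)
{
	unsigned total = 0, mine = 0;
	size_t i, j;

	assert(tbl != NULL && active != NULL);
	for (i = 0; i < n; i++)
		for (j = 0; j < n_active; j++)
			if (tbl[i].vl == active[j]) {
				total += tbl[i].weight;
				if (tbl[i].vl == vl)
					mine += tbl[i].weight;
			}
	return total ? (100 * mine) / total : 0;
}
```

With the weights 0:1, 1:64, 2:128 and VL1 and VL2 both busy, this gives 64/192 and 128/192, i.e. roughly one third and two thirds. The weights can only shape the split when both flows actually compete for the same link on different VLs, so identical results for both service levels may be worth checking against that assumption first.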
MY CONFIGURATION IS THE FOLLOWING : I've added "options mlx4_core enable_qos=1" in modprobe.conf to set on QoS I've configured qos-policy file with following rules : ----------------------- qos-levels qos-level name: DEFAULT sl: 0 end-qos-level qos-level name: MPI sl: 1 end-qos-level qos-level name: Lustre sl: 2 end-qos-level end-qos-levels ---------------------- my qos settings in /etc/ofa/opensm.conf ----------------------- # # QoS OPTIONS # # Enable QoS setup qos TRUE # QoS policy file to be used qos_policy_file /etc/opensm/qos-policy.conf # QoS default options qos_max_vls 15 qos_high_limit 0 qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 # QoS CA options qos_ca_max_vls 15 qos_ca_high_limit 0 qos_ca_vlarb_high 0:0,1:0,2:0 qos_ca_vlarb_low 0:1,1:64,2:128 qos_ca_sl2vl 0,1,2,3,4,6,7,8,9,10,11,12,13,14,7,5 # QoS Switch external ports options qos_swe_max_vls 15 qos_swe_high_limit 0 qos_swe_vlarb_high 0:0,1:0,2:0 qos_swe_vlarb_low 0:1,1:64,2:128 qos_swe_sl2vl 0,1,2,3,4,6,7,8,9,10,11,12,13,14,7,5 From weiny2 at llnl.gov Mon Nov 24 09:42:43 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 24 Nov 2008 09:42:43 -0800 Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081123182741.GS21967@sashak.voltaire.com> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> Message-ID: <20081124094243.4dbcff51.weiny2@llnl.gov> On Sun, 23 Nov 2008 20:27:41 +0200 Sasha Khapyorsky wrote: > Hi Ira, > > On 16:38 Thu 20 Nov , Ira Weiny wrote: > > The following 3 patches implement "libibnetdisc" which provides the > > functionality of ibnetdiscover in a C library. > > > > I mentioned this to Sasha at the last Sonoma conference and posted the bulk of > > this code to the list a few months ago. 
This libary is still providing the 85% > > performance speed up of iblinkinfo.pl on our clusters. > > This is great! > > Do not you think this library should be rather part of infiniband-diags, > rather than separate package/management sub-project? Personally I would > prefer to have this as part of infiniband-diags. No, I would like to see it be a stand alone library. Currently infiniband-diags does not provide any library functionality and simply depends on the libraries provided by the rest of the management tree. Don't you think this is a good model to follow? Ira From weiny2 at llnl.gov Mon Nov 24 09:55:07 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 24 Nov 2008 09:55:07 -0800 Subject: [ofa-general] Re: [PATCH 3/3] Convert ibnetdiscover to use new ibnetdisc library. In-Reply-To: <20081123183517.GT21967@sashak.voltaire.com> References: <20081120163815.5cd110fb.weiny2@llnl.gov> <20081123183517.GT21967@sashak.voltaire.com> Message-ID: <20081124095507.785be95a.weiny2@llnl.gov> On Sun, 23 Nov 2008 20:35:17 +0200 Sasha Khapyorsky wrote: > Hi Ira, > > On 16:38 Thu 20 Nov , Ira Weiny wrote: > > From e2b8bac5d651c2278719d511dee2ab2e8ad05706 Mon Sep 17 00:00:00 2001 > > From: Ira Weiny > > Date: Thu, 20 Nov 2008 09:29:57 -0800 > > Subject: [PATCH] Convert ibnetdiscover to use new ibnetdisc library. > > > > Removed -e and -v since they were somewhat redundant with the -d option. > > I think it would be better to preserve an options for backward > compatibility. At least '-v' is used in dump_ftts.sh. It can be used in > other scripts... > Ah, ok... Actually dump_[lm]fts.sh use output which is provided by the "-s" option. I did not think any of the scripts would use "debugging" output for their processing... More testing is obviously needed. 
Thanks, Ira From yosefe at Voltaire.COM Mon Nov 24 09:58:22 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Mon, 24 Nov 2008 19:58:22 +0200 Subject: [ofa-general] [PATCH] ipoib: do not join broadcast group if interface is brought down In-Reply-To: <49246EB7.3070607@Voltaire.COM> References: <49246EB7.3070607@Voltaire.COM> Message-ID: <492AEB3E.4030202@Voltaire.COM> Roland, Can you please comment on this? Yossi Etigin wrote: > Because ipoib_workqueue is not flushed when ipoib interface is brought > down, > ipoib_mcast_join() may trigger a join to the broadcast group after > priv->broadcast > was set to NULL (during cleanup). This will cause ipoib to be joined to the > broadcast group when interface is down. > As a side effect, this breaks the optimization of setting qkey only when > joining > the broadcast group. > > Signed-off-by: Yossi Etigin > > -- > > Fix bugzilla 1370. > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 > 21:33:54.000000000 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 > 21:40:12.000000000 +0200 > @@ -565,7 +565,8 @@ void ipoib_mcast_join_task(struct work_s > ipoib_warn(priv, "ib_query_port failed\n"); > } > > - if (!priv->broadcast) { > + rtnl_lock(); > + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && !priv->broadcast) { > struct ipoib_mcast *broadcast; > > broadcast = ipoib_mcast_alloc(dev, 1); > @@ -576,6 +577,7 @@ void ipoib_mcast_join_task(struct work_s > queue_delayed_work(ipoib_workqueue, > &priv->mcast_join_task, HZ); > mutex_unlock(&mcast_mutex); > + rtnl_unlock(); > return; > } > > @@ -587,6 +589,7 @@ void ipoib_mcast_join_task(struct work_s > __ipoib_mcast_add(dev, priv->broadcast); > spin_unlock_irq(&priv->lock); > } > + rtnl_unlock(); > > if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > if (!test_bit(IPOIB_MCAST_FLAG_BUSY, 
&priv->broadcast->flags)) -- --Yossi From sashak at voltaire.com Mon Nov 24 10:02:18 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 20:02:18 +0200 Subject: [ofa-general] QoS implementation In-Reply-To: <492AE49A.7090607@ext.bull.net> References: <492AE49A.7090607@ext.bull.net> Message-ID: <20081124180218.GR6183@sashak.voltaire.com> Hi, On 18:30 Mon 24 Nov , Celine Bourde wrote: > > I'm testing QoS on opensm. > I work with OFED-1.4-20081123-0600.tgz and opensm-3.2.4_20081122_c732c34. > I've set up qos-policy file, SL2VL and VLArbitration Table (all in > attachement). > Results still have wrong values. > > I've launch opensm -Q /etc/ofa/opensm.conf Maybe you need: opensm -Q -F /etc/ofa/opensm.conf ? Sasha From cap at nsc.liu.se Mon Nov 24 10:16:55 2008 From: cap at nsc.liu.se (Peter Kjellstrom) Date: Mon, 24 Nov 2008 19:16:55 +0100 Subject: [ofa-general] infiniband problem, no NICs In-Reply-To: <492904C8.7000402@voltaire.com> References: <4925BD78.4030003@tu-berlin.de> <492904C8.7000402@voltaire.com> Message-ID: <200811241917.00503.cap@nsc.liu.se> On Sunday 23 November 2008, Or Gerlitz wrote: > Michael Oevermann wrote: > > However, when I directly start a mpi job (without using a scheduler) via: > > /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile > > /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tes > >ts/IMB-2.3/IMB-MPI1 > > > > > > I get the error message: > > > > 0,1,0]: uDAPL on host n01 was unable to find any NICs... ... > The BTL you are working with uses a library named udapl and this library > relies on the IPoIB (IP over Infiniband) NICs (e.g ib0, ib1) existence. > Assuming these nics are not configured on your system, you can either > configure them (modprobe ib_ipoib / ifconfig ib0 x.y.z.w) or use a verb > (native IB access layer) BTL which does not reply on operative ipoib. Using verbs is the way to go. OpenMPI, afaik, does not recommend the udapl btl. 
I would recommend checking for the btl "openib" which is the verbs btl. If it does not exist rebuild OpenMPI (you will need libibverbs-devel). /Peter > Or. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From sashak at voltaire.com Mon Nov 24 10:28:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 20:28:06 +0200 Subject: [ofa-general] [PATCH] opensm/man/opensm.8: add some missing stuff Message-ID: <20081124182806.GS6183@sashak.voltaire.com> Add some missing options, add arguments where needed. Signed-off-by: Sasha Khapyorsky --- opensm/man/opensm.8.in | 90 ++++++++++++++++++++++++++++-------------------- 1 files changed, 53 insertions(+), 37 deletions(-) diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index b64daba..5c08de9 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -6,28 +6,44 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) .SH SYNOPSIS .B opensm [\-\-version]] -[\-F | \-\-config ] [\-c(reate-config) ] -[\-g(uid) ] [\-l(mc) ] -[\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] +[\-F | \-\-config ] +[\-c(reate-config) ] +[\-g(uid) ] +[\-l(mc) ] +[\-p(riority) ] +[\-smkey ] +[\-r(eassign_lids)] [\-R | \-\-routing_engine ] -[\-A | \-\-ucast_cache] [\-z | \-\-connect_roots] +[\-A | \-\-ucast_cache] +[\-z | \-\-connect_roots] [\-M | \-\-lid_matrix_file ] [\-U | \-\-lfts_file ] -[\-S | \-\-sadb_file ] [\-a | \-\-root_guid_file ] +[\-S | \-\-sadb_file ] +[\-a | \-\-root_guid_file ] [\-u | \-\-cn_guid_file ] [\-X | \-\-guid_routing_order_file ] [\-m | \-\-ids_guid_file ] -[\-o(nce)] [\-s(weep) ] -[\-t(imeout) ] [\-maxsmps ] -[\-console [off | local | socket | loopback]] [\-console-port ] -[\-i(gnore-guids) ] [\-f | \-\-log_file] -[\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] +[\-o(nce)] +[\-s(weep) ] +[\-t(imeout) ] +[\-maxsmps ] 
+[\-console [off | local | socket | loopback]] +[\-console-port ] +[\-i(gnore-guids) ] +[\-f | \-\-log_file ] +[\-L | \-\-log_limit ] [\-e(rase_log_file)] +[\-P(config) ] +[\-N | \-\-no_part_enforce] [\-Q | \-\-qos [\-Y | \-\-qos_policy_file ]] -[\-N | \-\-no_part_enforce] [\-y | \-\-stay_on_fatal] -[\-B | \-\-daemon] [\-I | \-\-inactive] -[\-\-perfmgr] [\-\-perfmgr_sweep_time_s ] +[\-y | \-\-stay_on_fatal] +[\-B | \-\-daemon] +[\-I | \-\-inactive] +[\-\-perfmgr] +[\-\-perfmgr_sweep_time_s ] [\-\-prefix_routes_file ] -[\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] +[\-\-consolidate_ipv6_snm_req] +[\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] +[\-h(elp)] [\-?] .SH DESCRIPTION .PP @@ -68,15 +84,15 @@ setup the subnet correctly. \fB\-\-version\fR Prints OpenSM version and exits. .TP -\fB\-F\fR, \fB\-\-config\fR +\fB\-F\fR, \fB\-\-config\fR The name of the OpenSM config file. When not specified \fB\% @OPENSM_CONFIG_DIR@/@OPENSM_CONFIG_FILE@\fP will be used (if exists). .TP -\fB\-c\fR, \fB\-\-create-config\fR +\fB\-c\fR, \fB\-\-create-config\fR OpenSM will dump its configuration to the specified file and exit. This is a way to generate OpenSM configuration file template. .TP -\fB\-g\fR, \fB\-\-guid\fR +\fB\-g\fR, \fB\-\-guid\fR This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to 1 port at a time. @@ -84,7 +100,7 @@ If GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port. .TP -\fB\-l\fR, \fB\-\-lmc\fR +\fB\-l\fR, \fB\-\-lmc\fR This option specifies the subnet's LMC value. The number of LIDs assigned to each port is 2^LMC. The LMC value must be in the range 0-7. @@ -95,13 +111,13 @@ ports, i.e. multiple interconnects between switches. Without -l, OpenSM defaults to LMC = 0, which allows one path between any two ports. .TP -\fB\-p\fR, \fB\-\-priority\fR +\fB\-p\fR, \fB\-\-priority\fR This option specifies the SM\'s PRIORITY. 
This will effect the handover cases, where master is chosen by priority and GUID. Range goes from 0 (default and lowest priority) to 15 (highest). .TP -\fB\-smkey\fR +\fB\-smkey\fR This option specifies the SM\'s SM_Key (64 bits). This will effect SM authentication. Note that OpenSM version 3.2.1 and below used the default value '1' @@ -115,7 +131,7 @@ may disrupt subnet traffic. Without -r, OpenSM attempts to preserve existing LID assignments resolving multiple use of same LID. .TP -\fB\-R\fR, \fB\-\-routing_engine\fR +\fB\-R\fR, \fB\-\-routing_engine\fR This option chooses routing engine(s) to use instead of Min Hop algorithm (default). Multiple routing engines can be specified separated by commas so that specific ordering of routing algorithms @@ -140,33 +156,33 @@ only) to make connectivity between root switches and in this way to be fully IBA complaint. In many cases this can violate "pure" deadlock free algorithm, so use it carefully. .TP -\fB\-M\fR, \fB\-\-lid_matrix_file\fR +\fB\-M\fR, \fB\-\-lid_matrix_file\fR This option specifies the name of the lid matrix dump file from where switch lid matrices (min hops tables will be loaded. .TP -\fB\-U\fR, \fB\-\-lfts_file\fR +\fB\-U\fR, \fB\-\-lfts_file\fR This option specifies the name of the LFTs file from where switch forwarding tables will be loaded. .TP -\fB\-S\fR, \fB\-\-sadb_file\fR +\fB\-S\fR, \fB\-\-sadb_file\fR This option specifies the name of the SA DB dump file from where SA database will be loaded. .TP -\fB\-a\fR, \fB\-\-root_guid_file\fR +\fB\-a\fR, \fB\-\-root_guid_file\fR Set the root nodes for the Up/Down or Fat-Tree routing algorithm to the guids provided in the given file (one to a line). .TP -\fB\-u\fR, \fB\-\-cn_guid_file\fR +\fB\-u\fR, \fB\-\-cn_guid_file\fR Set the compute nodes for the Fat-Tree routing algorithm to the guids provided in the given file (one to a line). 
.TP -\fB\-m\fR, \fB\-\-ids_guid_file\fR +\fB\-m\fR, \fB\-\-ids_guid_file\fR Name of the map file with set of the IDs which will be used by Up/Down routing algorithm instead of node GUIDs (format: per line). .TP -\fB\-X\fR, \fB\-\-guid_routing_order_file\fR +\fB\-X\fR, \fB\-\-guid_routing_order_file\fR Set the order port guids will be routed for the MinHop and Up/Down routing algorithms to the guids provided in the given file (one to a line). @@ -175,20 +191,20 @@ given file (one to a line). This option causes OpenSM to configure the subnet once, then exit. Ports remain in the ACTIVE state. .TP -\fB\-s\fR, \fB\-\-sweep\fR +\fB\-s\fR, \fB\-\-sweep\fR This option specifies the number of seconds between subnet sweeps. Specifying -s 0 disables sweeping. Without -s, OpenSM defaults to a sweep interval of 10 seconds. .TP -\fB\-t\fR, \fB\-\-timeout\fR +\fB\-t\fR, \fB\-\-timeout\fR This option specifies the time in milliseconds used for transaction timeouts. Specifying -t 0 disables timeouts. Without -t, OpenSM defaults to a timeout value of 200 milliseconds. .TP -\fB\-maxsmps\fR +\fB\-maxsmps\fR This option specifies the number of VL15 SMP MADs allowed on the wire at any one time. Specifying -maxsmps 0 allows unlimited outstanding @@ -217,7 +233,7 @@ when it comes out of Standby state, if such file exists under OSM_CACHE_DIR, and is valid. By default, this is FALSE. .TP -\fB\-f\fR, \fB\-\-log_file\fR +\fB\-f\fR, \fB\-\-log_file\fR This option defines the log to be the given file. By default, the log goes to /var/log/opensm.log. For the log to go to standard output use -f stdout. @@ -232,11 +248,11 @@ This option will cause deletion of the log file (if it previously exists). By default, the log file is accumulative. .TP -\fB\-P\fR, \fB\-\-Pconfig\fR +\fB\-P\fR, \fB\-\-Pconfig\fR This option defines the optional partition configuration file. The default name is \fB\%@OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@\fP. 
.TP -.BI --prefix_routes_file= path +\fB\-\-prefix_routes_file\fR Prefix routes control how the SA responds to path record queries for off-subnet DGIDs. By default, the SA fails such queries. The .B PREFIX ROUTES @@ -246,7 +262,7 @@ The default path is \fB\%@OPENSM_CONFIG_DIR@/prefix\-routes.conf\fP. \fB\-Q\fR, \fB\-\-qos\fR This option enables QoS setup. It is disabled by default. .TP -\fB\-Y\fR, \fB\-\-qos_policy_file\fR +\fB\-Y\fR, \fB\-\-qos_policy_file\fR This option defines the optional QoS policy file. The default name is \fB\%@OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@\fP. .TP @@ -295,7 +311,7 @@ The -V option is equivalent to \'-D 0xFF -d 2\'. See the -D option for more information about log verbosity. .TP -\fB\-D\fR +\fB\-D\fR This option sets the log verbosity level. A flags field must follow the -D option. A bit set/clear in the flags enables/disables a @@ -318,7 +334,7 @@ Specifying -D 0xFF enables all messages (see -V). High verbosity levels may require increasing the transaction timeout with the -t option. .TP -\fB\-d\fR, \fB\-\-debug\fR +\fB\-d\fR, \fB\-\-debug\fR This option specifies a debug option. These options are not normally needed. The number following -d selects the debug -- 1.6.0.3.517.g759a From meier3 at llnl.gov Mon Nov 24 10:34:22 2008 From: meier3 at llnl.gov (Timothy A. Meier) Date: Mon, 24 Nov 2008 10:34:22 -0800 Subject: [ofa-general] Re: [PATCH] Opensm: main exit codes In-Reply-To: <20081123185836.GU21967@sashak.voltaire.com> References: <4923678D.3080701@llnl.gov> <20081123185836.GU21967@sashak.voltaire.com> Message-ID: <492AF3AE.3060605@llnl.gov> Hi Sasha, Sasha Khapyorsky wrote: > Hi Tim, > > On 17:10 Tue 18 Nov , Timothy A. Meier wrote: >> I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c >> showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every >> possible exit scenario, but its a start that can be built upon. 
> > Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)', > but maybe it is just me :). Me too :^) Not much confusion over a return code of 0. The audience for this change wouldn't be the people writing the software, but admins, scripts, and tools that start/stop/monitor opensm. At least that is our use case. > > Maybe error codes could be formalized, but I'm not sure that it would be > beneficial without any practical uses (and clear requirements > understanding). Finally we can found us in a middle of the total mess > similar to how OSM_LOG_* is used today. > > Sasha > So the uses/requirements would be to formalize how opensm handles the non-ideal termination condition, for the purpose of providing quick, convenient, and consistent information for other system level tools that are responsible for starting/stopping/monitoring/reporting opensm. I can't think of any other reasons or needs. -- Timothy A. Meier Computer Scientist ICCD/High Performance Computing meier3 at llnl.gov From sashak at voltaire.com Mon Nov 24 11:02:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 21:02:51 +0200 Subject: [ofa-general] Re: [PATCH] Opensm: main exit codes In-Reply-To: <492AF3AE.3060605@llnl.gov> References: <4923678D.3080701@llnl.gov> <20081123185836.GU21967@sashak.voltaire.com> <492AF3AE.3060605@llnl.gov> Message-ID: <20081124190251.GT6183@sashak.voltaire.com> On 10:34 Mon 24 Nov , Timothy A. Meier wrote: > Hi Sasha, > > Sasha Khapyorsky wrote: > > Hi Tim, > > > > On 17:10 Tue 18 Nov , Timothy A. Meier wrote: > >> I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c > >> showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every > >> possible exit scenario, but its a start that can be built upon. > > > > Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)', > > but maybe it is just me :). 
> > Me too :^) Not much confusion over a return code of 0. > > The audience for this change wouldn't be the people writing the software, Somehow we need to care about yourselves too :) > but admins, scripts, and tools that > start/stop/monitor opensm. At least that is our use case. > > > > > Maybe error codes could be formalized, but I'm not sure that it would be > > beneficial without any practical uses (and clear requirements > > understanding). Finally we can found us in a middle of the total mess > > similar to how OSM_LOG_* is used today. > > > > Sasha > > > > So the uses/requirements would be to formalize how opensm handles the non-ideal termination condition, > for the purpose of providing quick, convenient, and consistent information for other system level tools > that are responsible for starting/stopping/monitoring/reporting opensm. And are there any of such tools? Or any *real* use? Sasha From halr at obsidianresearch.com Mon Nov 24 11:06:59 2008 From: halr at obsidianresearch.com (Hal Rosenstock) Date: Mon, 24 Nov 2008 12:06:59 -0700 Subject: [ofa-general] [PATCH][TRIVIAL] opensm.8.in: Update email address Message-ID: <492AFB53.7010007@obsidianresearch.com> Sasha, Attached patch is a trivial update of email address in opensm man page. -- Hal -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: patch-osm-man2 URL: From hal.rosenstock at gmail.com Mon Nov 24 11:08:18 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 24 Nov 2008 14:08:18 -0500 Subject: [ofa-general] [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081124091605.298547e9.weiny2@llnl.gov> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081124091605.298547e9.weiny2@llnl.gov> Message-ID: On Mon, Nov 24, 2008 at 12:16 PM, Ira Weiny wrote: > On Fri, 21 Nov 2008 07:25:23 -0500 > "Hal Rosenstock" wrote: > >> Hi Ira, >> >> On Thu, Nov 20, 2008 at 7:38 PM, Ira Weiny wrote: >> > The following 3 patches implement "libibnetdisc" which provides the >> > functionality of ibnetdiscover in a C library. >> > >> > I mentioned this to Sasha at the last Sonoma conference and posted the bulk of >> > this code to the list a few months ago. This libary is still providing the 85% >> > performance speed up of iblinkinfo.pl on our clusters. >> > >> > This new series is heavily tested and, for our hardware, preserves the >> > functionality of ibnetdiscover. Since I don't have a Xsigo box to test on I >> > can only verify that it compiles correctly. >> >> Have you also verified this QLogic/Silverstorm and Cisco chassis >> switches ? They were supported too. > > I did not see the code for their support. I probably missed something. We > have some QLogic switches on Hyperion now so I will test that. Just to be sure: it's the grouping option which should be tested. 
-- Hal > Thanks for the catch, > Ira > >> >> -- Hal >> >> > Ira >> > >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general >> > >> > From sashak at voltaire.com Mon Nov 24 11:10:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 21:10:50 +0200 Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081124094243.4dbcff51.weiny2@llnl.gov> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> <20081124094243.4dbcff51.weiny2@llnl.gov> Message-ID: <20081124191050.GU6183@sashak.voltaire.com> On 09:42 Mon 24 Nov , Ira Weiny wrote: > > > > Do not you think this library should be rather part of infiniband-diags, > > rather than separate package/management sub-project? Personally I would > > prefer to have this as part of infiniband-diags. > > No, I would like to see it be a stand alone library. Currently > infiniband-diags does not provide any library functionality and simply depends > on the libraries provided by the rest of the management tree. Don't you think > this is a good model to follow? Why it must be so - infiniband-diags will be useless without this library. And I would really hate to handle one more package (actually not just one - libibnetdisc, libibnetdisc-devel, libibnetdisc-static, etc.). I wanted to remove libibcommon... 
Sasha From sashak at voltaire.com Mon Nov 24 11:13:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 21:13:04 +0200 Subject: [ofa-general] Re: [PATCH][TRIVIAL] opensm.8.in: Update email address In-Reply-To: <492AFB53.7010007@obsidianresearch.com> References: <492AFB53.7010007@obsidianresearch.com> Message-ID: <20081124191304.GV6183@sashak.voltaire.com> On 12:06 Mon 24 Nov , Hal Rosenstock wrote: > Sasha, > > Attached patch is a trivial update of email address in opensm man page. > > -- Hal > opensm.8.in: Update email address > > Signed-off-by: Hal Rosenstock Applied. Thanks. Sasha From weiny2 at llnl.gov Mon Nov 24 11:30:05 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 24 Nov 2008 11:30:05 -0800 Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081124191050.GU6183@sashak.voltaire.com> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> <20081124094243.4dbcff51.weiny2@llnl.gov> <20081124191050.GU6183@sashak.voltaire.com> Message-ID: <20081124113005.4261cfd1.weiny2@llnl.gov> On Mon, 24 Nov 2008 21:10:50 +0200 Sasha Khapyorsky wrote: > On 09:42 Mon 24 Nov , Ira Weiny wrote: > > > > > > Do not you think this library should be rather part of infiniband-diags, > > > rather than separate package/management sub-project? Personally I would > > > prefer to have this as part of infiniband-diags. > > > > No, I would like to see it be a stand alone library. Currently > > infiniband-diags does not provide any library functionality and simply depends > > on the libraries provided by the rest of the management tree. Don't you think > > this is a good model to follow? > > Why it must be so - infiniband-diags will be useless without this library. > > And I would really hate to handle one more package (actually not just one > - libibnetdisc, libibnetdisc-devel, libibnetdisc-static, etc.). I wanted > to remove libibcommon... 
> I think the argument against ibcommon is that it does not provide enough additional functionality to warrant an entire new library. On the other hand infiniband-diags depends on many libraries: AC_CHECK_LIB(ibcommon, ... <== delete this... And you still have the following... AC_CHECK_LIB(ibumad, ... AC_CHECK_LIB(ibmad, ... AC_CHECK_LIB(osmcomp, ... AC_CHECK_LIB(osmvendor, ... AC_CHECK_LIB(opensm, ... I don't think it is in appropriate to have utilities which are dependent on libraries, it is done all the time. Ira From sashak at voltaire.com Mon Nov 24 12:01:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 24 Nov 2008 22:01:51 +0200 Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081124113005.4261cfd1.weiny2@llnl.gov> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> <20081124094243.4dbcff51.weiny2@llnl.gov> <20081124191050.GU6183@sashak.voltaire.com> <20081124113005.4261cfd1.weiny2@llnl.gov> Message-ID: <20081124200151.GX6183@sashak.voltaire.com> On 11:30 Mon 24 Nov , Ira Weiny wrote: > On Mon, 24 Nov 2008 21:10:50 +0200 > Sasha Khapyorsky wrote: > > > On 09:42 Mon 24 Nov , Ira Weiny wrote: > > > > > > > > Do not you think this library should be rather part of infiniband-diags, > > > > rather than separate package/management sub-project? Personally I would > > > > prefer to have this as part of infiniband-diags. > > > > > > No, I would like to see it be a stand alone library. Currently > > > infiniband-diags does not provide any library functionality and simply depends > > > on the libraries provided by the rest of the management tree. Don't you think > > > this is a good model to follow? > > > > Why it must be so - infiniband-diags will be useless without this library. > > > > And I would really hate to handle one more package (actually not just one > > - libibnetdisc, libibnetdisc-devel, libibnetdisc-static, etc.). 
I wanted > > to remove libibcommon... > > > > I think the argument against ibcommon is that it does not provide enough > additional functionality to warrant an entire new library. It is probably the same case with libibnetdisc (at least now). > On the other hand > infiniband-diags depends on many libraries: > > AC_CHECK_LIB(ibcommon, ... <== delete this... > > And you still have the following... > > AC_CHECK_LIB(ibumad, ... > AC_CHECK_LIB(ibmad, ... > AC_CHECK_LIB(osmcomp, ... > AC_CHECK_LIB(osmvendor, ... > AC_CHECK_LIB(opensm, ... > > I don't think it is in appropriate to have utilities which are dependent on > libraries, it is done all the time. OTOH it doesn't mean that any new shared code must be done as separate subproject. The stuff is new. I think it is better to integrate it in smaller iterations, to start with the code and functionality and to not bother with packaging, dependencies, etc.. If there will be a reason to make separate library we can do it, but then we will have a stable code already. Sasha From tziporet at mellanox.co.il Mon Nov 24 13:15:32 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 24 Nov 2008 23:15:32 +0200 Subject: [ofa-general] OFED Nov 24, 2008 meeting minutes In-Reply-To: <458BC6B0F287034F92FE78908BD01CE84EF35EF0@mtlexch01.mtl.com> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0FE7A8@mtlexch01.mtl.com> OFED Nov 24, 2008 meeting minutes =========================== Meeting Summary: ============== * OFED 1.4 release: RC6 on Nov 28, GA on Dec 8 * UNH will test RC6 as part of Logo program (will start with RC5 this week) * OFED documentation and training - Jim Ryan will raise in next XWG meeting Details: ======= > 1. Bugs status review: > 1370 blo vlad at mellanox.co.il Ping over IPoIB > I/F fails after ifconfig down and up - there is a fix but its not > integrated - Vlad to take it > 1242 cri yannick.cote at qlogic.com kernel panic while > running mpi2007 against ofed1.4 -- ib_... 
- should be delayed for > 1.4.1 - move to normal > 1410 cri vlad at mellanox.co.il Memory leak > (address handler not reped) in IPoIB - we have a fix, need to decide > 1289 maj jackm at mellanox.co.il Ib and ipoib doesn't > respond while running multiple tests ... - should be fixed - ask > Mellanox QA to check > 1407 maj monis at voltaire.com Active-Backup failure > when disabling an active slave inte... - fixed with new bonding > package > 1377 maj vu at mellanox.com Deadlock occurred during > HA test - on work > 1380 maj vu at mellanox.com Cannot unload ib_srpt > module on SRP target - moved to normal (involves scst's mid-layer > module which we don't have *a lot of* control) > 1395 maj vu at mellanox.com kernel panic during SRP > HA test - on work > 1384 maj eli at mellanox.co.il netperf latency > small messages increase 5% - not reproduced on SLES10 SP2 with FW > 2.5.0 > 1385 maj eli at mellanox.co.il ofed 1.4 - netperf udp > BW small messages decrease ~8% > 1386 maj eli at mellanox.co.il ofed 1.4 - iperf tcp > connected mode BW large messages dec... > mvapich2 - Going to update package + RN > 2. Decided on release date: * RC6 - 28 for Nov * GA - Dec 8 after UNH testing for the logo program > 3. Decided on next meetings dates: > Dec 1, Dec 8 and Dec 15 (only if the release is not done) > 4. Logo program and OFED release: * We wish that UNH will test it on each RC, as each vendor does, so we will not find surprises in the last RC. * Rupert will see if they can start with RC5 this week. * UNH will test RC6 next week * Need to add Windows - Linux interop to the Logo program - Rupert will lead the change to the test plan. Should be done in next interop event. 5. Training and documentation: * In OFA BOF there was a request for documentation and training for using OFED. 
* Jim Ryan will raise the subject of training/documents in the XWG meeting, and see if OFA can finance it
* Options - some company in the Bay Area (I did not capture the name) and UNH on the East Coast
* Olga to check with Voltaire if they have something to contribute
* Tziporet will check with Mellanox to see if we can contribute our verbs user manual

Tziporet

From weiny2 at llnl.gov Mon Nov 24 13:49:38 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Mon, 24 Nov 2008 13:49:38 -0800
Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc"
In-Reply-To: <20081124200151.GX6183@sashak.voltaire.com>
References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> <20081124094243.4dbcff51.weiny2@llnl.gov> <20081124191050.GU6183@sashak.voltaire.com> <20081124113005.4261cfd1.weiny2@llnl.gov> <20081124200151.GX6183@sashak.voltaire.com>
Message-ID: <20081124134938.61c345e0.weiny2@llnl.gov>

On Mon, 24 Nov 2008 22:01:51 +0200
Sasha Khapyorsky wrote:

> On 11:30 Mon 24 Nov , Ira Weiny wrote:
> > On Mon, 24 Nov 2008 21:10:50 +0200
> > Sasha Khapyorsky wrote:
> >
> > > On 09:42 Mon 24 Nov , Ira Weiny wrote:
> > > > >
> > > > > Do not you think this library should be rather part of infiniband-diags,
> > > > > rather than separate package/management sub-project? Personally I would
> > > > > prefer to have this as part of infiniband-diags.
> > > >
> > > > No, I would like to see it be a stand alone library. Currently
> > > > infiniband-diags does not provide any library functionality and simply depends
> > > > on the libraries provided by the rest of the management tree. Don't you think
> > > > this is a good model to follow?
> > >
> > > Why it must be so - infiniband-diags will be useless without this library.
> > > > > > And I would really hate to handle one more package (actually not just one > > > - libibnetdisc, libibnetdisc-devel, libibnetdisc-static, etc.). I wanted > > > to remove libibcommon... > > > > > > > I think the argument against ibcommon is that it does not provide enough > > additional functionality to warrant an entire new library. > > It is probably the same case with libibnetdisc (at least now). > > > On the other hand > > infiniband-diags depends on many libraries: > > > > AC_CHECK_LIB(ibcommon, ... <== delete this... > > > > And you still have the following... > > > > AC_CHECK_LIB(ibumad, ... > > AC_CHECK_LIB(ibmad, ... > > AC_CHECK_LIB(osmcomp, ... > > AC_CHECK_LIB(osmvendor, ... > > AC_CHECK_LIB(opensm, ... > > > > I don't think it is in appropriate to have utilities which are dependent on > > libraries, it is done all the time. > > OTOH it doesn't mean that any new shared code must be done as separate > subproject. > > The stuff is new. I think it is better to integrate it in smaller > iterations, to start with the code and functionality and to not bother > with packaging, dependencies, etc.. If there will be a reason to make > separate library we can do it, but then we will have a stable code > already. As long as the library exists any dependant package can of course use the library from whatever package we chose (libibnetdisc or infiniband-diags). We have some code which is prototyped against ibnetdiscover but we plan on using this library instead. This would be separate from infiniband-diags. But we can just as easily put a dependancy on infiniband-diags as on libibnetdisc. The fact is that it was actually easier to put this in a new package rather than try and integrate with infiniband-diags. I thought it made for a very clean conversion by putting the library in as a new patch and then we could convert the diags as appropriate. Anyway, I will integrate it as you say and resubmit the patch. 
Ira From rdreier at cisco.com Mon Nov 24 13:52:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Nov 2008 13:52:55 -0800 Subject: [ofa-general] Re: [PATCH 1 of 2] libmlx4: Fix race condition in create/destroy QP In-Reply-To: <200811221153.49156.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sat, 22 Nov 2008 11:53:48 +0200") References: <200811221153.49156.jackm@dev.mellanox.co.il> Message-ID: I think I see one bug at least: > @@ -580,9 +584,12 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp) > struct mlx4_qp *qp = to_mqp(ibqp); > int ret; > > + pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex); > ret = ibv_cmd_destroy_qp(ibqp); > - if (ret) > + if (ret) { > + pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex); The second one should be unlock. I'm too tired to check everything carefully enough to be sure it's right though. Can you double-check your lock balancing and error paths, and resend fixed patches? Thanks, Roland From rdreier at cisco.com Mon Nov 24 13:56:50 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 24 Nov 2008 13:56:50 -0800 Subject: [ofa-general] Re: [PATCH 03/10] RDMA/nes: Remove tx_free_list In-Reply-To: <20081121205044.GA7424@ctung-MOBL> (Chien Tung's message of "Fri, 21 Nov 2008 14:50:44 -0600") References: <20081121205044.GA7424@ctung-MOBL> Message-ID: > +static struct sk_buff *get_free_pkt(u32 pktsize) > { > - u32 hashkey = 0; > - > - hashkey = loc_addr + rem_addr + loc_port + rem_port; > - hashkey = (hashkey % NES_CM_HASHTABLE_SIZE); > - > - return hashkey; > + return dev_alloc_skb(pktsize); > } Given this, is there any reason to have get_free_pkt() at all? Why not just use dev_alloc_skb() directly? - R. 
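A minimal sketch of the lock-balancing point raised in the libmlx4 review earlier in this thread: a mutex taken before a command must be released on the early error return as well as on the success path. The names here (`demo_ctx`, `demo_cmd_destroy`, `demo_destroy`, `demo_lock_is_free`) are hypothetical stand-ins for illustration only, not the libmlx4 types or the code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a context holding a lock-protected table. */
struct demo_ctx {
	pthread_mutex_t table_mutex;
	void *table[16];
};

/* Hypothetical command that can fail; stands in for a verbs call. */
static int demo_cmd_destroy(int should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * Take the table lock before issuing the command and release it on
 * every exit, including the early error return.
 */
static int demo_destroy(struct demo_ctx *ctx, unsigned int slot, int should_fail)
{
	int ret;

	pthread_mutex_lock(&ctx->table_mutex);
	ret = demo_cmd_destroy(should_fail);
	if (ret) {
		/* Error path: unlock here; locking again would be the bug. */
		pthread_mutex_unlock(&ctx->table_mutex);
		return ret;
	}

	ctx->table[slot % 16] = NULL;
	pthread_mutex_unlock(&ctx->table_mutex);
	return 0;
}

/* Returns 1 if the mutex can be taken right now, 0 otherwise. */
static int demo_lock_is_free(struct demo_ctx *ctx)
{
	if (pthread_mutex_trylock(&ctx->table_mutex) != 0)
		return 0;
	pthread_mutex_unlock(&ctx->table_mutex);
	return 1;
}
```

Calling `demo_destroy()` with `should_fail` set and then checking `demo_lock_is_free()` is one way to confirm that the error path leaves the mutex released; with a second lock call in place of the unlock, that check would report the mutex as still held.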
From rdreier at cisco.com Mon Nov 24 13:59:43 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 24 Nov 2008 13:59:43 -0800
Subject: [ofa-general] Re: [PATCH 08/10] RDMA/nes: Change accept_pend_cnt to atomic
In-Reply-To: <20081121205058.GA8184@ctung-MOBL> (Chien Tung's message of "Fri, 21 Nov 2008 14:50:58 -0600")
References: <20081121205058.GA8184@ctung-MOBL>
Message-ID:

> There is a race condition on accept_pend_cnt. Change it to atomic.

This is much too terse, so I don't know what the race is or how the patch fixes it. But...

> + if (atomic_dec_and_test(&cm_node->accept_pend)) {

you do atomic_dec_and_test() but then the only other manipulations of accept_pend that I see are:

> + atomic_set(&cm_node->accept_pend, 0);
> + atomic_set(&cm_node->accept_pend, 1);

and there's no particular ordering between atomic_set() and atomic_dec_and_test() that I know of to protect against races.

So at least a better description of the patch, please.

- R.

From meier3 at llnl.gov Mon Nov 24 14:03:38 2008
From: meier3 at llnl.gov (Timothy A. Meier)
Date: Mon, 24 Nov 2008 14:03:38 -0800
Subject: [ofa-general] Re: [PATCH] Opensm: main exit codes
In-Reply-To: <20081124190251.GT6183@sashak.voltaire.com>
References: <4923678D.3080701@llnl.gov> <20081123185836.GU21967@sashak.voltaire.com> <492AF3AE.3060605@llnl.gov> <20081124190251.GT6183@sashak.voltaire.com>
Message-ID: <492B24BA.40303@llnl.gov>

Hi Sasha,

I guess I viewed this patch as just cleaning up the interface between the program and the system.

Sasha Khapyorsky wrote:
> On 10:34 Mon 24 Nov , Timothy A. Meier wrote:
>> Hi Sasha,
>>
>> Sasha Khapyorsky wrote:
>>> Hi Tim,
>>>
>>> On 17:10 Tue 18 Nov , Timothy A. Meier wrote:
>>>> I thought it would be useful to define a set of exit codes for opensm. A quick examination of main.c
>>>> showed a few different ways to terminate. How about this patch? Obviously this doesn't catch every
>>>> possible exit scenario, but it's a start that can be built upon.
>>> Personally I read 'exit(0)' faster than 'exit(OSM_EXIT_TYPE_NORMAL)', >>> but maybe it is just me :). >> Me too :^) Not much confusion over a return code of 0. >> >> The audience for this change wouldn't be the people writing the software, > > Somehow we need to care about yourselves too :) > >> but admins, scripts, and tools that >> start/stop/monitor opensm. At least that is our use case. >> >>> Maybe error codes could be formalized, but I'm not sure that it would be >>> beneficial without any practical uses (and clear requirements >>> understanding). Finally we can found us in a middle of the total mess >>> similar to how OSM_LOG_* is used today. >>> >>> Sasha >>> >> So the uses/requirements would be to formalize how opensm handles the non-ideal termination condition, >> for the purpose of providing quick, convenient, and consistent information for other system level tools >> that are responsible for starting/stopping/monitoring/reporting opensm. > > And are there any of such tools? Or any *real* use? > Chicken/Egg? Currently, we depend on only ZERO or non-zero. Although OpenSM returns "other" values on exit, they aren't really formalized or documented. Hence the patch. ;^) Personally, I have (and create) several different versions of opensm with small customizations, and test them on our cluster testbeds. I often will start/stop them in a variety of configurations (with and without plugins, more than one sm on a node, etc.) and if and when opensm doesn't startup normally, it would be nice to have a meaningful exit code. Perhaps others might find it useful as well, or for some future use. But again, I originally considered this more as code cleanup. Converting the exits, returns, and aborts to provide a more consistent interface to the system. -- Timothy A. 
Meier
Computer Scientist
ICCD/High Performance Computing
meier3 at llnl.gov

From chien.tin.tung at intel.com Mon Nov 24 14:14:30 2008
From: chien.tin.tung at intel.com (Tung, Chien Tin)
Date: Mon, 24 Nov 2008 15:14:30 -0700
Subject: [ofa-general] RE: [PATCH 03/10] RDMA/nes: Remove tx_free_list
In-Reply-To:
References: <20081121205044.GA7424@ctung-MOBL>
Message-ID: <60BEFF3FBD4C6047B0F13F205CAFA3830310DC721E@azsmsx501.amr.corp.intel.com>

>Given this, is there any reason to have get_free_pkt() at all? Why not
>just use dev_alloc_skb() directly?

We were trying to make a minimal change to the code. There is no reason left for get_free_pkt(). I can rework the patch to remove it.

Chien

From davem at davemloft.net Mon Nov 24 15:34:11 2008
From: davem at davemloft.net (David Miller)
Date: Mon, 24 Nov 2008 15:34:11 -0800 (PST)
Subject: [ofa-general] Re: [PATCH next]infiniband: Kill directly reference of netdev->priv
In-Reply-To:
References: <492A748F.9040308@cn.fujitsu.com>
Message-ID: <20081124.153411.148586270.davem@davemloft.net>

From: Roland Dreier
Date: Mon, 24 Nov 2008 09:01:28 -0800

> Looks fine to me.
>
> Acked-by: Roland Dreier

Applied, thanks everyone.

From jackm at dev.mellanox.co.il Mon Nov 24 22:36:45 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 25 Nov 2008 08:36:45 +0200
Subject: [ofa-general] Re: [PATCH 1 of 2] libmlx4: Fix race condition in create/destroy QP
In-Reply-To:
References: <200811221153.49156.jackm@dev.mellanox.co.il>
Message-ID: <200811250836.45468.jackm@dev.mellanox.co.il>

On Monday 24 November 2008 23:52, Roland Dreier wrote:
> I think I see one bug at least:
>
> > @@ -580,9 +584,12 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
> > struct mlx4_qp *qp = to_mqp(ibqp);
> > int ret;
> >
> > + pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex);
> > ret = ibv_cmd_destroy_qp(ibqp);
> > - if (ret)
> > + if (ret) {
> > + pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex);
>
> The second one should be unlock.
>
> I'm too tired to check everything carefully enough to be sure it's right
> though. Can you double-check your lock balancing and error paths, and
> resend fixed patches?
>
> Thanks,
> Roland

Ouch! That is the only bug (after my careful review). I guess I was tired when I sent them.
I'm resending this patch (fixed) only -- the libmthca patch is OK.

- Jack

From jackm at dev.mellanox.co.il Mon Nov 24 22:40:07 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 25 Nov 2008 08:40:07 +0200
Subject: [ofa-general] [PATCH 1 of 2 V2] libmlx4: Fix race condition in create/destroy QP
Message-ID: <200811250840.07944.jackm@dev.mellanox.co.il>

Index: libmlx4/src/qp.c
===================================================================
--- libmlx4.orig/src/qp.c	2008-11-20 11:46:58.000000000 +0200
+++ libmlx4/src/qp.c	2008-11-22 09:44:13.000000000 +0200
@@ -667,37 +667,25 @@ struct mlx4_qp *mlx4_find_qp(struct mlx4
 int mlx4_store_qp(struct mlx4_context *ctx, uint32_t qpn, struct mlx4_qp *qp)
 {
 	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
-	int ret = 0;
-
-	pthread_mutex_lock(&ctx->qp_table_mutex);

 	if (!ctx->qp_table[tind].refcnt) {
 		ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1,
 						   sizeof (struct mlx4_qp *));
-		if (!ctx->qp_table[tind].table) {
-			ret = -1;
-			goto out;
-		}
+		if (!ctx->qp_table[tind].table)
+			return -1;
 	}

 	++ctx->qp_table[tind].refcnt;
 	ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp;
-
-out:
-	pthread_mutex_unlock(&ctx->qp_table_mutex);
-	return ret;
+	return 0;
 }

 void mlx4_clear_qp(struct mlx4_context *ctx, uint32_t qpn)
 {
 	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;

-	pthread_mutex_lock(&ctx->qp_table_mutex);
-
 	if (!--ctx->qp_table[tind].refcnt)
 		free(ctx->qp_table[tind].table);
 	else
 		ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL;
-
-	pthread_mutex_unlock(&ctx->qp_table_mutex);
 }
Index: libmlx4/src/verbs.c
===================================================================
--- libmlx4.orig/src/verbs.c	2008-11-20 11:46:58.000000000 +0200
+++ libmlx4/src/verbs.c	2008-11-25 08:31:26.000000000 +0200
@@ -452,6 +452,8 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	cmd.sq_no_prefetch = 0;	/* OK for ABI 2: just a reserved field */
 	memset(cmd.reserved, 0, sizeof cmd.reserved);

+	pthread_mutex_lock(&to_mctx(pd->context)->qp_table_mutex);
+
 	ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
 				&resp, sizeof resp);
 	if (ret)
@@ -460,6 +462,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp);
 	if (ret)
 		goto err_destroy;
+	pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex);

 	qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr;
 	qp->rq.max_gs = attr->cap.max_recv_sge;
@@ -477,6 +480,7 @@ err_destroy:
 	ibv_cmd_destroy_qp(&qp->ibv_qp);

 err_rq_db:
+	pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex);
 	if (!attr->srq)
 		mlx4_free_db(to_mctx(pd->context), MLX4_DB_TYPE_RQ, qp->db);

@@ -580,9 +584,12 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
 	struct mlx4_qp *qp = to_mqp(ibqp);
 	int ret;

+	pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex);
 	ret = ibv_cmd_destroy_qp(ibqp);
-	if (ret)
+	if (ret) {
+		pthread_mutex_unlock(&to_mctx(ibqp->context)->qp_table_mutex);
 		return ret;
+	}

 	mlx4_lock_cqs(ibqp);

@@ -594,6 +601,7 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
 	mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num);

 	mlx4_unlock_cqs(ibqp);
+	pthread_mutex_unlock(&to_mctx(ibqp->context)->qp_table_mutex);

 	if (!ibqp->srq)
 		mlx4_free_db(to_mctx(ibqp->context), MLX4_DB_TYPE_RQ, qp->db);

From vlad at lists.openfabrics.org Tue Nov 25 03:42:57 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Tue, 25 Nov 2008 03:42:57 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081125-0200 daily build status
Message-ID: <20081125114257.AFA86E60939@openfabrics.org>

This email was generated automatically, please do not reply

git_url:
git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From fenkes at de.ibm.com Tue Nov 25 04:58:06 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Tue, 25 Nov 
2008 13:58:06 +0100 Subject: [ofa-general] [PATCH] IB/ehca: Change misleading error message In-Reply-To: <48499C11.7030504@gmail.com> References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> Message-ID: <200811251358.06729.fenkes@de.ibm.com> The error message printed when the eHCA driver prevents memory hotplug is misleading -- the user might think that hot-removing the lhca, hotplugging memory, then hot-adding the lhca again will work, but it doesn't. Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index bb02a86..bec7e02 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -994,8 +994,7 @@ static int ehca_mem_notifier(struct notifier_block *nb, if (printk_timed_ratelimit(&ehca_dmem_warn_time, 30 * 1000)) ehca_gen_err("DMEM operations are not allowed" - "as long as an ehca adapter is" - "attached to the LPAR"); + "in conjunction with eHCA"); return NOTIFY_BAD; } } -- 1.5.5 From Robert at saq.co.uk Tue Nov 25 06:20:43 2008 From: Robert at saq.co.uk (Robert Dunkley) Date: Tue, 25 Nov 2008 14:20:43 -0000 Subject: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" Message-ID: Hi everyone, I'm using a setup of two machines (Lets call them A and B) directly connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED 1.3 installed, Machine B runs OpenSM. All was working fine. I shutdown Machine A did some maintenance and then powered it on again, everything is OK again. I then shutdown Machine B (The one running OpenSM), this seemed to really upset Machine A. After booting Machine B again, Machine B looks OK with the port down and in polling state. 
Machine A however gives the following error if I run ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: (Resource temporarily unavailable) I don't want to reboot Machine A as it must synch data with Machine B over the Infiniband link first. Does anyone have any idea how to fix machine A? Thanks, Rob The SAQ Group Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ SEMTEC Limited Trading as SAQ is Registered in England & Wales Company Number: 06481952 http://www.saqnet.co.uk AS29219 SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business. DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support. Find us in http://www.thebestof.co.uk/petersfield From Robert at saq.co.uk Tue Nov 25 06:39:21 2008 From: Robert at saq.co.uk (Robert Dunkley) Date: Tue, 25 Nov 2008 14:39:21 -0000 Subject: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com> Message-ID: Hi Eric, Thanks for the response. OpenSM is running and set to start on bootup on MachineB: ps aux | grep open root 5616 0.0 0.1 142004 1396 ? Sl 13:39 0:00 /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0 The log on Machine B just logs this every 10 seconds: Nov 25 14:34:21 148541 [477A7940] 0x01 -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down Ibstat confirms port is in polling state on MachineB. MachineA however is in a bad state, I tried the openibd restart command, it accepted the command but after 5 minutes shows no progress of doing anything and is just at the cursor. Is some sort of forced restart of openibd possible? 
 Thanks, Rob -----Original Message----- From: Baur, Eric [mailto:Eric.Baur at gs.com] Sent: 25 November 2008 14:31 To: Robert Dunkley Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource Temporarily unavailable" Robert- Is OpenSM set to start on boot? chkconfig --list | grep opensmd If not: chkconfig opensmd on and: /etc/init.d/opensmd start You can also restart openib without rebooting the machines. /etc/init.d/openibd restart -Eric
_______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hal.rosenstock at gmail.com Tue Nov 25 06:46:28 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 25 Nov 2008 09:46:28 -0500 Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" In-Reply-To: References: Message-ID: On Tue, Nov 25, 2008 at 9:20 AM, Robert Dunkley wrote: > Hi everyone, > > I'm using a setup of two machines (Lets call them A and B) directly > connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox > PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED 1.3 > installed, Machine B runs OpenSM. > > All was working fine. I shutdown Machine A did some maintenance and then > powered it on again, everything is OK again. I then shutdown Machine B > (The one running OpenSM), this seemed to really upset Machine A. After > booting Machine B again, Machine B looks OK with the port down and in > polling state. Is this with machine A powered off ? > Machine A however gives the following error if I run > ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: > (Resource temporarily unavailable) Does /sys/class/infiniband/mthca0 exist on machine A ? If so, what files are there ? -- Hal > I don't want to reboot Machine A as it must synch data with Machine B > over the Infiniband link first. Does anyone have any idea how to fix > machine A?
> > Thanks, > > Rob > > The SAQ Group > > Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ > SEMTEC Limited Trading as SAQ is Registered in England & Wales > Company Number: 06481952 > > > > http://www.saqnet.co.uk AS29219 > > SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business. > > DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support. > > Find us in http://www.thebestof.co.uk/petersfield > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Tue Nov 25 06:49:23 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 25 Nov 2008 09:49:23 -0500 Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" In-Reply-To: References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com> Message-ID: On Tue, Nov 25, 2008 at 9:39 AM, Robert Dunkley wrote: > Hi Eric, > > Thanks for the response. OpenSM is running and set to start on bootup on > MachineB: > ps aux | grep open > root 5616 0.0 0.1 142004 1396 ? Sl 13:39 0:00 > /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0 > > The log on Machine B just logs this every 10 seconds: > Nov 25 14:34:21 148541 [477A7940] 0x01 -> > __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal > OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING > Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down > > Ibstat confirms port is in polling state on MachineB. Is the port in init or down ? > MachineA however is in a bad state, Any additional details on this ? Can you kill/unload all the ib stuff and reload it ? That would be gentler than rebooting. 
-- Hal >I tried the openibd restart command, it accepted the > command but after 5 minutes shows no progress of doing anything and is > just at the cursor. Is some sort of forced restart of openibd possible? > > Thanks, > > Rob > > > -----Original Message----- > From: Baur, Eric [mailto:Eric.Baur at gs.com] > Sent: 25 November 2008 14:31 > To: Robert Dunkley > Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > Temporarily unavailable" > > Robert- > > Is OpenSM set to start on boot? > chkconfig --list | grep opensmd > > If not: chkconfig opensmd on > and: /etc/init.d/opensmd start > > You can also restart openib without rebooting the machines. > /etc/init.d/openibd restart > > -Eric > > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Robert > Dunkley > Sent: Tuesday, November 25, 2008 9:21 AM > To: general at lists.openfabrics.org > Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > Temporarily unavailable" > > Hi everyone, > > I'm using a setup of two machines (Lets call them A and B) directly > connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox > PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED 1.3 > installed, Machine B runs OpenSM. > > All was working fine. I shutdown Machine A did some maintenance and then > powered it on again, everything is OK again. I then shutdown Machine B > (The one running OpenSM), this seemed to really upset Machine A. After > booting Machine B again, Machine B looks OK with the port down and in > polling state. Machine A however gives the following error if I run > ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: > (Resource temporarily unavailable) > > I don't want to reboot Machine A as it must synch data with Machine B > over the Infiniband link first. Does anyone have any idea how to fix > machine A? 
> > Thanks, > > Rob > > The SAQ Group > > Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ > SEMTEC Limited Trading as SAQ is Registered in England & Wales > Company Number: 06481952 > > > > http://www.saqnet.co.uk AS29219 > > SAQ Group Delivers high quality, honestly priced communication and I.T. > services to UK Business. > > DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : > Backups : Managed Networks : Remote Support. > > Find us in http://www.thebestof.co.uk/petersfield > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From Robert at saq.co.uk Tue Nov 25 06:46:56 2008 From: Robert at saq.co.uk (Robert Dunkley) Date: Tue, 25 Nov 2008 14:46:56 -0000 Subject: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" References: Message-ID: Hi Hal, Machine A is powered on. It was after powering down machine B and OpenSM with it that Machine A went weird. 
/sys/class/infiniband/mthca0 exists on Machine A, contents is: board_id fw_ver hw_rev node_guid ports sys_image_guid device hca_type node_desc node_type subsystem uevent Thanks, Rob -----Original Message----- From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] Sent: 25 November 2008 14:46 To: Robert Dunkley Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" On Tue, Nov 25, 2008 at 9:20 AM, Robert Dunkley wrote: > Hi everyone, > > I'm using a setup of two machines (Lets call them A and B) directly > connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox > PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED 1.3 > installed, Machine B runs OpenSM. > > All was working fine. I shutdown Machine A did some maintenance and then > powered it on again, everything is OK again. I then shutdown Machine B > (The one running OpenSM), this seemed to really upset Machine A. After > booting Machine B again, Machine B looks OK with the port down and in > polling state. Is this with machine A powered off ? > Machine A however gives the following error if I run > ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: > (Resource temporarily unavailable) Does /sys/class/infiniband/mthca0 exist on machine A ? If so, what files are there ? -- Hal > I don't want to reboot Machine A as it must synch data with Machine B > over the Infiniband link first. Does anyone have any idea how to fix > machine A? > > Thanks, > > Rob > > The SAQ Group > > Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ > SEMTEC Limited Trading as SAQ is Registered in England & Wales > Company Number: 06481952 > > > > http://www.saqnet.co.uk AS29219 > > SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business. > > DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support. 
>
> Find us in http://www.thebestof.co.uk/petersfield
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From vlad at mellanox.co.il Tue Nov 25 06:56:04 2008
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 25 Nov 2008 16:56:04 +0200
Subject: [ofa-general] [PATCH] IPoIB: Prevent address handles leak.
Message-ID: <20081125145604.GA22726@mellanox.co.il>

When removing the ib_ipoib module, ipoib_ib_dev_stop() is called and all
address handles (ah) in the dead_ahs list are reaped. However, some ah's
may still be added to the dead list by ipoib_mcast_free() after
ipoib_ib_dev_stop() is called. These ah's will not be freed.

The solution is to reap any remaining ah's after the multicast device is
really flushed during cleanup.

Based on a recommendation by Yossi Etigin.

This fixes Bugzilla https://bugs.openfabrics.org/show_bug.cgi?id=1410

Signed-off-by: Vladimir Sokolovsky
---
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 66cafa2..2b77bbd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -640,6 +640,25 @@ void ipoib_reap_ah(struct work_struct *work)
 				   round_jiffies_relative(HZ));
 }
 
+static void ipoib_ah_dev_cleanup(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned long begin;
+
+	begin = jiffies;
+
+	while (!list_empty(&priv->dead_ahs)) {
+		__ipoib_reap_ah(dev);
+
+		if (time_after(jiffies, begin + HZ)) {
+			ipoib_warn(priv, "timing out; will leak address handles\n");
+			break;
+		}
+
+		msleep(1);
+	}
+}
+
 static void ipoib_ib_tx_timer_func(unsigned long ctx)
 {
 	drain_tx_cq((struct net_device *)ctx);
@@ -861,18 +880,7 @@ timeout:
 	if (flush)
 		flush_workqueue(ipoib_workqueue);
 
-	begin = jiffies;
-
-	while (!list_empty(&priv->dead_ahs)) {
-		__ipoib_reap_ah(dev);
-
-		if (time_after(jiffies, begin + HZ)) {
-			ipoib_warn(priv, "timing out; will leak address handles\n");
-			break;
-		}
-
-		msleep(1);
-	}
+	ipoib_ah_dev_cleanup(dev);
 
 	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
 
@@ -1005,6 +1013,7 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 
 	ipoib_mcast_stop_thread(dev, 1);
 	ipoib_mcast_dev_flush(dev);
 
+	ipoib_ah_dev_cleanup(dev);
 	ipoib_transport_dev_cleanup(dev);
 }

From hal.rosenstock at gmail.com Tue Nov 25 06:56:41 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 25 Nov 2008 09:56:41 -0500
Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"
In-Reply-To:
References:
Message-ID:

Hi Rob,

On Tue, Nov 25, 2008 at 9:46 AM, Robert Dunkley wrote:
> Hi Hal,
>
> Machine A is powered on. It was after powering down machine B and OpenSM
> with it that Machine A went weird.
> /sys/class/infiniband/mthca0 exists on Machine A, contents is:
> board_id fw_ver hw_rev node_guid ports sys_image_guid
> device hca_type node_desc node_type subsystem uevent

What about machine B ? Do these files exist ? Also what is the port
state (down or init or something else) ?

-- Hal

From Robert at saq.co.uk Tue Nov 25 06:54:07 2008
From: Robert at saq.co.uk (Robert Dunkley)
Date: Tue, 25 Nov 2008 14:54:07 -0000
Subject: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"
References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com>
Message-ID:

Hi Hal,

Thank you for your help.

Ibstat on MachineB:
CA 'mthca0'
	CA type: MT25204
	Number of ports: 1
	Firmware version: 1.2.0
	Hardware version: a0
	Node GUID: 0x0002c9020022d428
	System image GUID: 0x0002c9020022d42b
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02510a6a
		Port GUID: 0x0002c9020022d429

Machine A is operating normally with the exception of Infiniband which
broke after powering down Machine B and did not recover once Machine B
was powered on again. An extract from the log of Machine A:
Nov 25 14:30:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
Nov 25 14:30:31 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
Nov 25 14:30:41 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
Nov 25 14:30:51 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
Nov 25 14:31:01 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
Nov 25 14:31:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
Nov 25 14:31:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
Nov 25 14:32:01 mrhappy last message repeated 3 times
Nov 25 14:32:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)

Thanks again,

Rob

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
Sent: 25 November 2008 14:49
To: Robert Dunkley
Cc: Baur, Eric; general at lists.openfabrics.org
Subject: Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"

On Tue, Nov 25, 2008 at 9:39 AM, Robert Dunkley wrote:
> Hi Eric,
>
> Thanks for the response. OpenSM is running and set to start on bootup on
> MachineB:
> ps aux | grep open
> root 5616 0.0 0.1 142004 1396 ?
Sl 13:39 0:00 > /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0 > > The log on Machine B just logs this every 10 seconds: > Nov 25 14:34:21 148541 [477A7940] 0x01 -> > __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal > OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING > Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down > > Ibstat confirms port is in polling state on MachineB. Is the port in init or down ? > MachineA however is in a bad state, Any additional details on this ? Can you kill/unload all the ib stuff and reload it ? That would be gentler than rebooting. -- Hal >I tried the openibd restart command, it accepted the > command but after 5 minutes shows no progress of doing anything and is > just at the cursor. Is some sort of forced restart of openibd possible? > > Thanks, > > Rob > > > -----Original Message----- > From: Baur, Eric [mailto:Eric.Baur at gs.com] > Sent: 25 November 2008 14:31 > To: Robert Dunkley > Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > Temporarily unavailable" > > Robert- > > Is OpenSM set to start on boot? > chkconfig --list | grep opensmd > > If not: chkconfig opensmd on > and: /etc/init.d/opensmd start > > You can also restart openib without rebooting the machines. > /etc/init.d/openibd restart > > -Eric > > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Robert > Dunkley > Sent: Tuesday, November 25, 2008 9:21 AM > To: general at lists.openfabrics.org > Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > Temporarily unavailable" > > Hi everyone, > > I'm using a setup of two machines (Lets call them A and B) directly > connected by 1 cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox > PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED 1.3 > installed, Machine B runs OpenSM. > > All was working fine. 
I shutdown Machine A did some maintenance and then > powered it on again, everything is OK again. I then shutdown Machine B > (The one running OpenSM), this seemed to really upset Machine A. After > booting Machine B again, Machine B looks OK with the port down and in > polling state. Machine A however gives the following error if I run > ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: > (Resource temporarily unavailable) > > I don't want to reboot Machine A as it must synch data with Machine B > over the Infiniband link first. Does anyone have any idea how to fix > machine A? > > Thanks, > > Rob > > The SAQ Group > > Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ > SEMTEC Limited Trading as SAQ is Registered in England & Wales > Company Number: 06481952 > > > > http://www.saqnet.co.uk AS29219 > > SAQ Group Delivers high quality, honestly priced communication and I.T. > services to UK Business. > > DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : > Backups : Managed Networks : Remote Support. 
> > Find us in http://www.thebestof.co.uk/petersfield > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Tue Nov 25 07:00:22 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 25 Nov 2008 10:00:22 -0500 Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" In-Reply-To: References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com> Message-ID: Hi Rob, On Tue, Nov 25, 2008 at 9:54 AM, Robert Dunkley wrote: > Hi Hal, > > Thank you for your help. > > Ibstat on MachineB: > CA 'mthca0' > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.2.0 > Hardware version: a0 > Node GUID: 0x0002c9020022d428 > System image GUID: 0x0002c9020022d42b > Port 1: > State: Down Is machine A on ? Is mthca loaded there ? If so, this should at least be init but the driver errors below may preclude this from occurring. > Physical state: Polling > Rate: 10 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x02510a6a > Port GUID: 0x0002c9020022d429 > > Machine A is operating normally with the exception of Infiniband which > broke after powering down Machine B and did not recover once Machine B > was powered on again. 
An extract from the log of Machine A:
> Nov 25 14:30:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> Nov 25 14:30:31 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
> Nov 25 14:30:41 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> Nov 25 14:30:51 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
> Nov 25 14:31:01 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> Nov 25 14:31:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
> Nov 25 14:31:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> Nov 25 14:32:01 mrhappy last message repeated 3 times
> Nov 25 14:32:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)

-11 is EAGAIN. Not sure what this is used for in the mthca driver.

Can you unload and reload the IB stack especially mthca driver ?

-- Hal

From Robert at saq.co.uk Tue Nov 25 07:21:10 2008
From: Robert at saq.co.uk (Robert Dunkley)
Date: Tue, 25 Nov 2008 15:21:10 -0000
Subject: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"
References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com>
Message-ID:

Hi Hal,

Thanks again, I will try this in a minute.
I think I have found the moment it went bad on Machine A using Dmesg:
ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error
ib_mthca 0000:87:00.0: buf[00]: ffffffff
ib_mthca 0000:87:00.0: buf[01]: ffffffff
ib_mthca 0000:87:00.0: buf[02]: ffffffff
ib_mthca 0000:87:00.0: buf[03]: ffffffff
ib_mthca 0000:87:00.0: buf[04]: ffffffff
ib_mthca 0000:87:00.0: buf[05]: ffffffff
ib_mthca 0000:87:00.0: buf[06]: ffffffff
ib_mthca 0000:87:00.0: buf[07]: ffffffff
ib_mthca 0000:87:00.0: buf[08]: ffffffff
ib_mthca 0000:87:00.0: buf[09]: ffffffff
ib_mthca 0000:87:00.0: buf[0a]: ffffffff
ib_mthca 0000:87:00.0: buf[0b]: ffffffff
ib_mthca 0000:87:00.0: buf[0c]: ffffffff
ib_mthca 0000:87:00.0: buf[0d]: ffffffff
ib_mthca 0000:87:00.0: buf[0e]: ffffffff
ib_mthca 0000:87:00.0: buf[0f]: ffffffff
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib0: ib_query_gid() failed
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib0: ib_query_port failed
ib0: Failed to modify QP to ERROR state
ib0: timing out; 1 sends 250 receives not completed
ib0: Failed to modify QP to RESET state
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)

Does this help to pinpoint what might have caused this?
Thanks,

Rob

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
Sent: 25 November 2008 15:19
To: Robert Dunkley
Subject: Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"

Hi Rob,

On Tue, Nov 25, 2008 at 10:01 AM, Robert Dunkley wrote:
> Hi Hal,
>
> Machine A is definitely on and I have had the cable connection checked.
> I'm afraid I'm not much of a techy, how do I unload and reload the IB
> stack?

It depends on what you have running... Is it just OpenSM and IPoIB ?

Kill off opensm

Use modprobe -r to remove all the ib_ modules. You can find them via
lsmod | grep ib_. There is a dependency order.

If you can get them all unloaded, reload them in the reverse order and
hopefully things will be better...

-- Hal

From hal.rosenstock at gmail.com Tue Nov 25 07:30:39 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Tue, 25 Nov 2008 10:30:39 -0500
Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"
In-Reply-To:
References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com>
Message-ID:

Hi Rob,

On Tue, Nov 25, 2008 at 10:21 AM, Robert Dunkley wrote:
> Hi Hal,
>
> Thanks again, I will try this in a minute.
I think I have found the
> moment it went bad on Machine A using Dmesg:
> ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error

Definitely need to reset mthca after this.

> ib_mthca 0000:87:00.0: buf[00]: ffffffff
> ib_mthca 0000:87:00.0: buf[01]: ffffffff
> ib_mthca 0000:87:00.0: buf[02]: ffffffff
> ib_mthca 0000:87:00.0: buf[03]: ffffffff
> ib_mthca 0000:87:00.0: buf[04]: ffffffff
> ib_mthca 0000:87:00.0: buf[05]: ffffffff
> ib_mthca 0000:87:00.0: buf[06]: ffffffff
> ib_mthca 0000:87:00.0: buf[07]: ffffffff
> ib_mthca 0000:87:00.0: buf[08]: ffffffff
> ib_mthca 0000:87:00.0: buf[09]: ffffffff
> ib_mthca 0000:87:00.0: buf[0a]: ffffffff
> ib_mthca 0000:87:00.0: buf[0b]: ffffffff
> ib_mthca 0000:87:00.0: buf[0c]: ffffffff
> ib_mthca 0000:87:00.0: buf[0d]: ffffffff
> ib_mthca 0000:87:00.0: buf[0e]: ffffffff
> ib_mthca 0000:87:00.0: buf[0f]: ffffffff
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib0: ib_query_gid() failed
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib0: ib_query_port failed
> ib0: Failed to modify QP to ERROR state
> ib0: timing out; 1 sends 250 receives not completed
> ib0: Failed to modify QP to RESET state
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>
> Does this help to pinpoint what might have caused this?

Maybe Mellanox can comment. What firmware version are you using ?
-- Hal

From tziporet at dev.mellanox.co.il Tue Nov 25 07:55:34 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 25 Nov 2008 17:55:34 +0200
Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"
In-Reply-To:
References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com>
Message-ID: <492C1FF6.8070403@mellanox.co.il>

Hal Rosenstock wrote:
> Hi Rob,
>
> On Tue, Nov 25, 2008 at 10:21 AM, Robert Dunkley wrote:
>
>> Hi Hal,
>>
>> Thanks again, I will try this in a minute. I think I have found the
>> moment it went bad on Machine A using Dmesg:
>> ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error
>>
>
> Definitely need to reset mthca after this.
>
>
>> ib_mthca 0000:87:00.0: buf[00]: ffffffff
>> ib_mthca 0000:87:00.0: buf[01]: ffffffff
>> ib_mthca 0000:87:00.0: buf[02]: ffffffff
>> ib_mthca 0000:87:00.0: buf[03]: ffffffff
>> ib_mthca 0000:87:00.0: buf[04]: ffffffff
>> ib_mthca 0000:87:00.0: buf[05]: ffffffff
>> ib_mthca 0000:87:00.0: buf[06]: ffffffff
>> ib_mthca 0000:87:00.0: buf[07]: ffffffff
>> ib_mthca 0000:87:00.0: buf[08]: ffffffff
>> ib_mthca 0000:87:00.0: buf[09]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0a]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0b]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0c]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0d]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0e]: ffffffff
>> ib_mthca 0000:87:00.0: buf[0f]: ffffffff
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib0: ib_query_gid() failed
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib0: ib_query_port failed
>> ib0: Failed to modify QP to ERROR state
>> ib0: timing out; 1 sends 250 receives not completed
>> ib0: Failed to modify QP to RESET state
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>> ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
>>
>> Does this help to pinpoint what might have caused this?
>>
>
>
The ffffffff values in the buf dump show that you have some PCI bus error. The mthca driver then moved to error mode, and no further commands will be executed.
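As an illustration of the diagnosis above: reads from a PCI device that no longer responds commonly come back as all ones, which is why a buffer full of ffffffff points at the bus rather than at a specific firmware error code. The following is only a sketch of that check; buf_is_all_ones() is a hypothetical helper and is not taken from the mthca driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the mthca driver: returns true when
 * every 32-bit word in buf is 0xffffffff, the pattern seen in the
 * buf[00]..buf[0f] dump quoted above. An empty buffer is not treated
 * as a match.
 */
static bool buf_is_all_ones(const uint32_t *buf, size_t nwords)
{
	size_t i;

	if (nwords == 0)
		return false;

	for (i = 0; i < nwords; i++) {
		if (buf[i] != UINT32_MAX)
			return false;
	}
	return true;
}
```

A buffer that passes this check carries no usable error information, so the only conclusion to draw from it is that the device stopped answering.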
I suggest you check that the card has not moved in the system, and you had better reboot the system again.

Tziporet

From monis at Voltaire.COM Tue Nov 25 08:06:05 2008
From: monis at Voltaire.COM (Moni Shoua)
Date: Tue, 25 Nov 2008 18:06:05 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
Message-ID: <492C226D.7040009@Voltaire.COM>

When snooping a portinfo MAD, its client_reregister bit is checked. If the bit is ON then a CLIENT_REREGISTER event is dispatched, otherwise a LID_CHANGE event is dispatched. This way of deciding ignores the cases where the MAD changes the LID along with an instruction to reregister (so a necessary LID_CHANGE event won't be dispatched) or where the MAD does neither of these (and an unnecessary LID_CHANGE event will be dispatched).

This patch compares the LID in the MAD to the current LID. A LID_CHANGE event is dispatched if and only if they are not identical.

Signed-off-by: Moni Shoua
---
 drivers/infiniband/hw/mlx4/mad.c        |   21 +++++++++++++++------
 drivers/infiniband/hw/mthca/mthca_mad.c |   20 +++++++++++++++-----
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 606f1e2..ca5fa9e 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_dev *dev, u8 port_num, u16 lid, u8 sl)
  * Snoop SM MADs for port info and P_Key table sets, so we can
  * synthesize LID change and P_Key change events.
*/ -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); update_sm_ah(to_mdev(ibdev), port_num, be16_to_cpu(pinfo->sm_lid), @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if ((prev_lid != 0) && (prev_lid != lid)) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { - u16 slid; + u16 slid, prev_lid = 0; int err; + struct ib_port_attr pattr; slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); @@ -263,6 +269,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, } else return IB_MAD_RESULT_SUCCESS; + if (!ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -271,7 +280,7 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, return IB_MAD_RESULT_FAILURE; if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c index 6404495..6ac114a 100644 --- a/drivers/infiniband/hw/mthca/mthca_mad.c +++ b/drivers/infiniband/hw/mthca/mthca_mad.c @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_dev *dev, */ static void smp_snoop(struct ib_device *ibdev, u8 port_num, - struct ib_mad *mad) + struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device *ibdev, if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device *ibdev, event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if ((prev_lid != 0) && (prev_lid != lid)) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device *ibdev, int err; u8 status; u16 slid = in_wc 
? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); + u16 prev_lid = 0; + struct ib_port_attr pattr; /* Forward locally generated traps to the SM */ if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && @@ -234,6 +241,9 @@ int mthca_process_mad(struct ib_device *ibdev, } else return IB_MAD_RESULT_SUCCESS; + if (!ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mthca_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -252,7 +262,7 @@ int mthca_process_mad(struct ib_device *ibdev, } if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } From weiny2 at llnl.gov Tue Nov 25 10:59:37 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 25 Nov 2008 10:59:37 -0800 Subject: ***SPAM*** Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable" In-Reply-To: References: <4DCBAA39733E8048992FB7737126041910FFD96A@gsmbnbp23es.firmwide.corp.gs.com> Message-ID: <20081125105937.7b12508b.weiny2@llnl.gov> On Tue, 25 Nov 2008 10:00:22 -0500 "Hal Rosenstock" wrote: > Hi Rob, > > On Tue, Nov 25, 2008 at 9:54 AM, Robert Dunkley wrote: > > Hi Hal, > > > > Thank you for your help. > > > > Ibstat on MachineB: > > CA 'mthca0' > > CA type: MT25204 > > Number of ports: 1 > > Firmware version: 1.2.0 > > Hardware version: a0 > > Node GUID: 0x0002c9020022d428 > > System image GUID: 0x0002c9020022d42b > > Port 1: > > State: Down > > Is machine A on ? Is mthca loaded there ? If so, this should at least > be init but the driver errors below may preclude this from occurring. > > > Physical state: Polling > > Rate: 10 > > Base lid: 0 > > LMC: 0 > > SM lid: 0 > > Capability mask: 0x02510a6a > > Port GUID: 0x0002c9020022d429 > > > > Machine A is operating normally with the exception of Infiniband which > > broke after powering down Machine B and did not recover once Machine B > > was powered on again. 
An extract from the log of Machine A: > > Nov 25 14:30:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed > > (-11) > > Nov 25 14:30:31 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed > > (-11) > > Nov 25 14:30:41 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed > > (-11) > > Nov 25 14:30:51 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_CQ failed > > (-11) > > Nov 25 14:31:01 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed > > (-11) > > Nov 25 14:31:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_SRQ failed > > (-11) > > Nov 25 14:31:21 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed > > (-11) > > Nov 25 14:32:01 mrhappy last message repeated 3 times > > Nov 25 14:32:11 mrhappy kernel: ib_mthca 0000:87:00.0: HW2SW_MPT failed > > (-11) > > -11 is EAGAIN. Not sure what this is used for in the mthca driver. When we have seen these errors, it has meant the firmware is in a bad state and is not responsive. Unfortunately for you, in this situation we have been forced to reboot to correct the problem. (If rebooting is problematic for you perhaps Mellanox has a way around this.) For the future speak with Mellanox to ensure you have the latest firmware as that has fixed a number of items for us. Ira > > Can you unload and reload the IB stack especially mthca driver ? > > -- Hal > > > Thanks again, > > > > Rob > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] > > Sent: 25 November 2008 14:49 > > To: Robert Dunkley > > Cc: Baur, Eric; general at lists.openfabrics.org > > Subject: Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource > > Temporarily unavailable" > > > > On Tue, Nov 25, 2008 at 9:39 AM, Robert Dunkley > > wrote: > >> Hi Eric, > >> > >> Thanks for the response. OpenSM is running and set to start on bootup > > on > >> MachineB: > >> ps aux | grep open > >> root 5616 0.0 0.1 142004 1396 ? 
Sl 13:39 0:00 > >> /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0 > >> > >> The log on Machine B just logs this every 10 seconds: > >> Nov 25 14:34:21 148541 [477A7940] 0x01 -> > >> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal > >> OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING > >> Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down > >> > >> Ibstat confirms port is in polling state on MachineB. > > > > Is the port in init or down ? > > > >> MachineA however is in a bad state, > > > > Any additional details on this ? > > > > Can you kill/unload all the ib stuff and reload it ? That would be > > gentler than rebooting. > > > > -- Hal > > > >>I tried the openibd restart command, it accepted the > >> command but after 5 minutes shows no progress of doing anything and is > >> just at the cursor. Is some sort of forced restart of openibd > > possible? > >> > >> Thanks, > >> > >> Rob > >> > >> > >> -----Original Message----- > >> From: Baur, Eric [mailto:Eric.Baur at gs.com] > >> Sent: 25 November 2008 14:31 > >> To: Robert Dunkley > >> Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > >> Temporarily unavailable" > >> > >> Robert- > >> > >> Is OpenSM set to start on boot? > >> chkconfig --list | grep opensmd > >> > >> If not: chkconfig opensmd on > >> and: /etc/init.d/opensmd start > >> > >> You can also restart openib without rebooting the machines. > >> /etc/init.d/openibd restart > >> > >> -Eric > >> > >> -----Original Message----- > >> From: general-bounces at lists.openfabrics.org > >> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Robert > >> Dunkley > >> Sent: Tuesday, November 25, 2008 9:21 AM > >> To: general at lists.openfabrics.org > >> Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource > >> Temporarily unavailable" > >> > >> Hi everyone, > >> > >> I'm using a setup of two machines (Lets call them A and B) directly > >> connected by 1 cable. 
Each machine has a Mellanox MT25204 (Gen3 > > Mellanox > >> PCI-E Infiniband card) and uses IPOIB, they run Centos 5.2 with OFED > > 1.3 > >> installed, Machine B runs OpenSM. > >> > >> All was working fine. I shutdown Machine A did some maintenance and > > then > >> powered it on again, everything is OK again. I then shutdown Machine B > >> (The one running OpenSM), this seemed to really upset Machine A. After > >> booting Machine B again, Machine B looks OK with the port down and in > >> polling state. Machine A however gives the following error if I run > >> ibstat: ibpanic: [11406] main: stat of IB device 'mthca0' failed: > >> (Resource temporarily unavailable) > >> > >> I don't want to reboot Machine A as it must synch data with Machine B > >> over the Infiniband link first. Does anyone have any idea how to fix > >> machine A? > >> > >> Thanks, > >> > >> Rob > >> > >> The SAQ Group > >> > >> Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ > >> SEMTEC Limited Trading as SAQ is Registered in England & Wales > >> Company Number: 06481952 > >> > >> > >> > >> http:// www. saqnet.co.uk AS29219 > >> > >> SAQ Group Delivers high quality, honestly priced communication and > > I.T. > >> services to UK Business. > >> > >> DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : > >> Backups : Managed Networks : Remote Support. > >> > >> Find us in http:// www. 
thebestof.co.uk/petersfield > >> > >> _______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit > >> http:// openib.org/mailman/listinfo/openib-general > >> _______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit > > http:// openib.org/mailman/listinfo/openib-general > >> > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Tue Nov 25 14:54:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 25 Nov 2008 14:54:37 -0800 Subject: [ofa-general] Re: [PATCH 1 of 2 V2] libmlx4: Fix race condition in create/destroy QP In-Reply-To: <200811250840.07944.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 25 Nov 2008 08:40:07 +0200") References: <200811250840.07944.jackm@dev.mellanox.co.il> Message-ID: Thanks for double checking and resending... I applied and pushed out. 
Kind of amazing that my bleary eyes noticed the one bug :)

From rdreier at cisco.com Tue Nov 25 14:55:59 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 25 Nov 2008 14:55:59 -0800
Subject: [ofa-general] Re: [PATCH 2 of 2] libmthca: Fix race condition in create/destroy QP
In-Reply-To: <200811221154.02427.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sat, 22 Nov 2008 11:54:01 +0200")
References: <200811221154.02427.jackm@dev.mellanox.co.il>
Message-ID:

thanks, applied

From rdreier at cisco.com Tue Nov 25 14:56:52 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 25 Nov 2008 14:56:52 -0800
Subject: [ofa-general] RE: [PATCH 03/10] RDMA/nes: Remove tx_free_list
In-Reply-To: <60BEFF3FBD4C6047B0F13F205CAFA3830310DC721E@azsmsx501.amr.corp.intel.com> (Chien Tin Tung's message of "Mon, 24 Nov 2008 15:14:30 -0700")
References: <20081121205044.GA7424@ctung-MOBL> <60BEFF3FBD4C6047B0F13F205CAFA3830310DC721E@azsmsx501.amr.corp.intel.com>
Message-ID:

> We were trying to make minimum change to the code. There is no reason left
> For get_free_pkt(). I can rework the patch to remove it.

Please do. Since you changed the signature of get_free_pkt(), you have to touch every call site anyway, so may as well call dev_alloc_skb() directly and delete a few more lines of code.

- R.

From rdreier at cisco.com Tue Nov 25 15:13:51 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 25 Nov 2008 15:13:51 -0800
Subject: [ofa-general] Re: [PATCH] IB/ehca: Change misleading error message
In-Reply-To: <200811251358.06729.fenkes@de.ibm.com> (Joachim Fenkes's message of "Tue, 25 Nov 2008 13:58:06 +0100")
References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811251358.06729.fenkes@de.ibm.com>
Message-ID:

> The error message printed when the eHCA driver prevents memory hotplug is
> misleading -- the user might think that hot-removing the lhca, hotplugging
> memory, then hot-adding the lhca again will work, but it doesn't.
That's too bad... I applied this patch, but out of curiosity, why doesn't the hot-remove/hot-add work? I would have thought that re-registering all of memory after the hot-add would do the right thing.

From rdreier at cisco.com Tue Nov 25 15:15:57 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 25 Nov 2008 15:15:57 -0800
Subject: [ofa-general] [PATCH 3/3] IB/ipath - improve UD loopback performance by allocating temp array once
In-Reply-To: <20081023195017.10020.33878.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Thu, 23 Oct 2008 12:50:17 -0700")
References: <20081023195001.10020.96260.stgit@eng-46.mv.qlogic.com> <20081023195017.10020.33878.stgit@eng-46.mv.qlogic.com>
Message-ID:

thanks, applied

From ddiss at sgi.com Tue Nov 25 21:12:13 2008
From: ddiss at sgi.com (David Disseldorp)
Date: Wed, 26 Nov 2008 16:12:13 +1100
Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion
In-Reply-To: <49292119.9080105@voltaire.com>
References: <1227247845-16023-1-git-send-email-ddiss@sgi.com> <49292119.9080105@voltaire.com>
Message-ID: <20081126161213.000065c3@snort.melbourne.sgi.com>

Thanks for the feedback, Or; comments below.

On Sun, 23 Nov 2008 11:23:37 +0200 Or Gerlitz wrote:

> David Disseldorp wrote:
> > iSCSI/iSER targets may send PDUs without a prior request from the initiator; RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category. Currently when an iSER target sends an "unexpected" PDU, the initiator's recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests.
> Assuming these target initiated NOP-Ins are echoed back by the
> initiator, the current code of iser_send_control would post a receive
> buffer when sending the NOP-Out which will account for the buffer
> consumed by the NOP-In.
> So we are left with the Asynchronous PDUs
> for which your patch indeed seems to fix a hole in the implementation.

Yes, target-initiated "ping" NOP-Ins with a valid TTT do not currently result in receive buffer depletion. However, targets may use a NOP-In PDU with both ITT and TTT set to RESERVED for the sole purpose of advertising the command window counters (ExpCmdSN and MaxCmdSN). These PDUs do not require a NOP-Out PDU from the initiator. Likewise, the initiator may send a NOP-Out with both ITT and TTT set to RESERVED; in this case a recv buf for a target response should not be posted.

> >
> > This patch ensures recv buffers consumed by "unexpected" PDUs are replaced prior to sending the next control-type PDU.
> The practice used by the patch is to account for unexpected receives and refill
> the receive buffer queue whenever possible with as many buffers as unexpected
> receives that took place since the last refill attempt. To ease
> future maintenance and debugging / simplicity of the code, I would
> prefer a patch with zero footprint at the iser_send_xxx functions,
> something like account --async-- receives and when calling
> iser_post_receive_control fill in the missing buffers.

No problem, I'll rework the patch to post "unexpected" buffers along with the response buffer in iser_post_receive_control().

Cheers, Dave

From ddiss at sgi.com Tue Nov 25 21:19:22 2008
From: ddiss at sgi.com (David Disseldorp)
Date: Wed, 26 Nov 2008 16:19:22 +1100
Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion v2
In-Reply-To: <20081126161213.000065c3@snort.melbourne.sgi.com>
References: <20081126161213.000065c3@snort.melbourne.sgi.com>
Message-ID: <1227676762-23505-1-git-send-email-ddiss@sgi.com>

iSCSI/iSER targets may send PDUs without a prior request from the initiator; RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category.
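The classification in the paragraph above can be sketched as a small standalone predicate. This is an illustration under stated assumptions, not the driver's code: the SKETCH_ names and pdu_is_unexpected() are invented for this example, and the numeric values are the NOP-In and Asynchronous Message opcodes, the 6-bit opcode mask, and the reserved ITT as defined for iSCSI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented names for this sketch; values follow the iSCSI opcode table. */
#define SKETCH_OPCODE_MASK	0x3f		/* low 6 bits of BHS byte 0 */
#define SKETCH_OP_NOOP_IN	0x20		/* NOP-In */
#define SKETCH_OP_ASYNC_EVENT	0x32		/* Asynchronous Message */
#define SKETCH_RESERVED_ITT	0xffffffffu	/* ITT = RESERVED */

/*
 * Returns true when a received PDU was not solicited by the initiator,
 * so the receive buffer it consumed was not pre-posted for a response
 * and has to be replaced separately.
 */
static bool pdu_is_unexpected(uint8_t opcode_byte, uint32_t itt)
{
	uint8_t opcode = opcode_byte & SKETCH_OPCODE_MASK;

	/* NOP-In counts only when the target did not echo an initiator tag. */
	if (opcode == SKETCH_OP_NOOP_IN && itt == SKETCH_RESERVED_ITT)
		return true;

	/* Asynchronous Messages are always target-initiated. */
	return opcode == SKETCH_OP_ASYNC_EVENT;
}
```

A NOP-In that answers an initiator NOP-Out carries the initiator's ITT, so it consumes the buffer already posted for that response and is not counted here.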
The number of active "unexpected" PDUs an iSER target may have at any time is governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported.

Currently when an iSER target sends an "unexpected" PDU, the initiator's recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests.

This patch ensures recv buffers consumed by "unexpected" PDUs are replaced in the next iser_post_receive_control() call.

Version 2:
 o replace unexpected recv bufs in iser_post_receive_control, transparent to iser_send_* functions.

Signed-off-by: David Disseldorp
Signed-off-by: Ken Sandars
---
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    3 +
 drivers/infiniband/ulp/iser/iser_initiator.c |  134 ++++++++++++++++++--------
 drivers/infiniband/ulp/iser/iser_verbs.c     |    1 +
 3 files changed, 97 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 81a8262..8611195 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -252,6 +252,9 @@ struct iser_conn {
 	wait_queue_head_t	wait;          /* waitq for conn/disconn  */
 	atomic_t		post_recv_buf_count; /* posted rx count   */
 	atomic_t		post_send_buf_count; /* posted tx count   */
+	atomic_t		unexpected_pdu_count;/* count of received *
+						      * unexpected pdus  *
+						      * not yet retired   */
 	char			name[ISER_OBJECT_NAME_SIZE];
 	struct iser_page_vec	*page_vec;     /* represents SG to fmr maps*
 						* maps serialized as tx is*/
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index cdd2831..a0c56a4 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -183,14 +183,8 @@ static int iser_post_receive_control(struct iscsi_conn *conn)
 	struct iser_regd_buf *regd_data;
 	struct iser_dto *recv_dto = NULL;
 	struct iser_device *device =
iser_conn->ib_conn->device; - int rx_data_size, err = 0; - - rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); - if (rx_desc == NULL) { - iser_err("Failed to alloc desc for post recv\n"); - return -ENOMEM; - } - rx_desc->type = ISCSI_RX; + int rx_data_size, err; + int posts, outstanding_unexp_pdus; /* for the login sequence we must support rx of upto 8K; login is done * after conn create/bind (connect) and conn stop/bind (reconnect), @@ -201,46 +195,80 @@ static int iser_post_receive_control(struct iscsi_conn *conn) else /* FIXME till user space sets conn->max_recv_dlength correctly */ rx_data_size = 128; - rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); - if (rx_desc->data == NULL) { - iser_err("Failed to alloc data buf for post recv\n"); - err = -ENOMEM; - goto post_rx_kmalloc_failure; - } + outstanding_unexp_pdus = + atomic_xchg(&iser_conn->ib_conn->unexpected_pdu_count, 0); - recv_dto = &rx_desc->dto; - recv_dto->ib_conn = iser_conn->ib_conn; - recv_dto->regd_vector_len = 0; + /* + * in addition to the response buffer, replace those consumed by + * unexpected pdus. 
+ */ + for (posts = 0; posts < 1 + outstanding_unexp_pdus; posts++) { + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); + if (rx_desc == NULL) { + iser_err("Failed to alloc desc for post recv %d\n", + posts); + err = -ENOMEM; + goto post_rx_cache_alloc_failure; + } + rx_desc->type = ISCSI_RX; + rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); + if (rx_desc->data == NULL) { + iser_err("Failed to alloc data buf for post recv %d\n", + posts); + err = -ENOMEM; + goto post_rx_kmalloc_failure; + } - regd_hdr = &rx_desc->hdr_regd_buf; - memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); - regd_hdr->device = device; - regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ - regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + recv_dto = &rx_desc->dto; + recv_dto->ib_conn = iser_conn->ib_conn; + recv_dto->regd_vector_len = 0; - iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); + regd_hdr = &rx_desc->hdr_regd_buf; + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = device; + regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; - iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); + iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); - regd_data = &rx_desc->data_regd_buf; - memset(regd_data, 0, sizeof(struct iser_regd_buf)); - regd_data->device = device; - regd_data->virt_addr = rx_desc->data; - regd_data->data_size = rx_data_size; + iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); - iser_reg_single(device, regd_data, DMA_FROM_DEVICE); + regd_data = &rx_desc->data_regd_buf; + memset(regd_data, 0, sizeof(struct iser_regd_buf)); + regd_data->device = device; + regd_data->virt_addr = rx_desc->data; + regd_data->data_size = rx_data_size; - iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); + iser_reg_single(device, regd_data, DMA_FROM_DEVICE); - err = iser_post_recv(rx_desc); - if (!err) - return 0; + iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); - /* iser_post_recv failed */ + err = 
iser_post_recv(rx_desc); + if (err) { + iser_err("Failed iser_post_recv for post %d\n", posts); + goto post_rx_post_recv_failure; + } + } + /* all posts successful */ + return 0; + +post_rx_post_recv_failure: iser_dto_buffs_release(recv_dto); kfree(rx_desc->data); post_rx_kmalloc_failure: kmem_cache_free(ig.desc_cache, rx_desc); +post_rx_cache_alloc_failure: + if (posts > 0) { + /* + * response buffer posted, but did not replace all unexpected + * pdu recv bufs. Ignore error, retry occurs next send + */ + outstanding_unexp_pdus -= (posts - 1); + err = 0; + } + atomic_add(outstanding_unexp_pdus, + &iser_conn->ib_conn->unexpected_pdu_count); + return err; } @@ -274,8 +302,10 @@ int iser_conn_set_full_featured_mode(struct iscsi_conn *conn) struct iscsi_iser_conn *iser_conn = conn->dd_data; int i; - /* no need to keep it in a var, we are after login so if this should - * be negotiated, by now the result should be available here */ + /* + * FIXME this value should be declared to the target during login with + * the MaxOutstandingUnexpectedPDUs key when supported + */ int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS; iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num); @@ -478,6 +508,7 @@ int iser_send_control(struct iscsi_conn *conn, int err = 0; struct iser_regd_buf *regd_buf; struct iser_device *device; + unsigned char opcode; if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); @@ -512,10 +543,16 @@ int iser_send_control(struct iscsi_conn *conn, data_seg_len); } - if (iser_post_receive_control(conn) != 0) { - iser_err("post_rcv_buff failed!\n"); - err = -ENOMEM; - goto send_control_error; + opcode = task->hdr->opcode & ISCSI_OPCODE_MASK; + + /* post recv buffer for response if one is expected */ + if (!((opcode == ISCSI_OP_NOOP_OUT) + && (task->hdr->itt == RESERVED_ITT))) { + if (iser_post_receive_control(conn) != 0) { + iser_err("post_rcv_buff failed!\n"); + err = 
-ENOMEM; + goto send_control_error; + } } err = iser_post_send(mdesc); @@ -586,6 +623,21 @@ void iser_rcv_completion(struct iser_desc *rx_desc, * parallel to the execution of iser_conn_term. So the code that waits * * for the posted rx bufs refcount to become zero handles everything */ atomic_dec(&conn->ib_conn->post_recv_buf_count); + + /* + * if an unexpected PDU was received then the recv wr consumed must + * be replaced, this is done in the next send of a control-type PDU + */ + if ((opcode == ISCSI_OP_NOOP_IN) + && (hdr->itt == RESERVED_ITT)) { + /* nop-in with itt = 0xffffffff */ + atomic_inc(&conn->ib_conn->unexpected_pdu_count); + } + else if (opcode == ISCSI_OP_ASYNC_EVENT) { + /* asyncronous message */ + atomic_inc(&conn->ib_conn->unexpected_pdu_count); + } + /* a reject PDU consumes the recv buf posted for the response */ } void iser_snd_completion(struct iser_desc *tx_desc) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 26ff621..6dc6b17 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -498,6 +498,7 @@ void iser_conn_init(struct iser_conn *ib_conn) init_waitqueue_head(&ib_conn->wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); + atomic_set(&ib_conn->unexpected_pdu_count, 0); atomic_set(&ib_conn->refcount, 1); INIT_LIST_HEAD(&ib_conn->conn_list); spin_lock_init(&ib_conn->lock); -- 1.5.4.5 From jackm at dev.mellanox.co.il Tue Nov 25 23:42:46 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 26 Nov 2008 09:42:46 +0200 Subject: [ofa-general] Re: [PATCH 1 of 2 V2] libmlx4: Fix race condition in create/destroy QP In-Reply-To: References: <200811250840.07944.jackm@dev.mellanox.co.il> Message-ID: <200811260942.47110.jackm@dev.mellanox.co.il> On Wednesday 26 November 2008 00:54, Roland Dreier wrote: > Kind of amazing that my bleary eyes noticed the one bug :) Thank heaven you did! 
- Jack From vlad at dev.mellanox.co.il Wed Nov 26 00:53:28 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 26 Nov 2008 10:53:28 +0200 Subject: [ofa-general] [PATCH] ipoib: do not join broadcast group if interface is brought down In-Reply-To: <49246EB7.3070607@Voltaire.COM> References: <49246EB7.3070607@Voltaire.COM> Message-ID: <492D0E88.6080009@dev.mellanox.co.il> Yossi Etigin wrote: > Because ipoib_workqueue is not flushed when ipoib interface is brought > down, > ipoib_mcast_join() may trigger a join to the broadcast group after > priv->broadcast > was set to NULL (during cleanup). This will cause ipoib to be joined > to the > broadcast group when interface is down. > As a side effect, this breaks the optimization of setting qkey only > when joining > the broadcast group. > > Signed-off-by: Yossi Etigin > > -- > > Fix bugzilla 1370. > > Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > =================================================================== > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 > 21:33:54.000000000 +0200 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 > 21:40:12.000000000 +0200 > @@ -565,7 +565,8 @@ void ipoib_mcast_join_task(struct work_s > ipoib_warn(priv, "ib_query_port failed\n"); > } > > - if (!priv->broadcast) { > + rtnl_lock(); > + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && > !priv->broadcast) { > struct ipoib_mcast *broadcast; > > broadcast = ipoib_mcast_alloc(dev, 1); > @@ -576,6 +577,7 @@ void ipoib_mcast_join_task(struct work_s > queue_delayed_work(ipoib_workqueue, > &priv->mcast_join_task, HZ); > mutex_unlock(&mcast_mutex); > + rtnl_unlock(); > return; > } > > @@ -587,6 +589,7 @@ void ipoib_mcast_join_task(struct work_s > __ipoib_mcast_add(dev, priv->broadcast); > spin_unlock_irq(&priv->lock); > } > + rtnl_unlock(); > > if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) Hi 
Yossi, I got the following kernel oops on SLES 10 (2.6.16.21-0.8-smp) using the patch above. To reproduce, run: rmmod ib_ipoib Unable to handle kernel NULL pointer dereference at virtual address 00000068 printing eip: f8c5e3c4 *pde = 7a0e8067 Oops: 0000 [#1] SMP last sysfs file: /class/infiniband/mthca0/node_desc Modules linked in: ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib mlx4_core ib_mthca ib_mad ib_core memtrack autofs4 nfs lockd nfs_acl sunrpc ipv6 af_packe CPU: 0 EIP: 0060:[] Tainted: G U VLI EFLAGS: 00010202 (2.6.16.21-0.8-smp #1) EIP is at ipoib_mcast_join_task+0x134/0x24d [ib_ipoib] eax: 00000000 ebx: f6a2c3e8 ecx: 00000000 edx: 00000000 esi: f6a2c56c edi: f6a2c12c ebp: f6a2c380 esp: f6a2bf0c ds: 007b es: 007b ss: 0068 Process ipoib (pid: 7858, threadinfo=f6a2a000 task=f7e3c0f0) Stack: <0>f6a2c000 00000004 00000004 00000004 00000020 02510a68 80000000 00000000 00000000 00020040 0400000f 02001200 00000501 f6a2c3e8 f6a2c3ec f73447c0 00000292 c012d85e f8c5e290 f6a2c3e8 f73447cc f73447c0 f73447d4 c012e052 Call Trace: [] run_workqueue+0x7f/0xba [] ipoib_mcast_join_task+0x0/0x24d [ib_ipoib] [] worker_thread+0x0/0x11e [] worker_thread+0xed/0x11e [] default_wake_function+0x0/0xc [] kthread+0x9d/0xc9 [] kthread+0x0/0xc9 [] kernel_thread_helper+0x5/0xb Code: 21 63 c7 8b 75 04 81 c6 3c 01 00 00 a5 a5 a5 a5 89 5d 28 8b 04 24 89 da e8 b3 f5 ff ff b0 01 86 45 00 fb e8 62 92 5e c7 8b 55 28 <8b> 42 68 a8 08 75 Regards, Vladimir From sashak at voltaire.com Wed Nov 26 02:13:05 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 26 Nov 2008 12:13:05 +0200 Subject: [ofa-general] Re: [PATCH] Opensm: main exit codes In-Reply-To: <492B24BA.40303@llnl.gov> References: <4923678D.3080701@llnl.gov> <20081123185836.GU21967@sashak.voltaire.com> <492AF3AE.3060605@llnl.gov> <20081124190251.GT6183@sashak.voltaire.com> <492B24BA.40303@llnl.gov> Message-ID: <20081126101305.GE12270@sashak.voltaire.com> Hi Tim, On 14:03 Mon 24 Nov , Timothy A. 
Meier wrote: > > > > And are there any of such tools? Or any *real* use? > > > > Chicken/Egg? Currently, we depend on only ZERO or non-zero. Although OpenSM returns "other" values > on exit, they aren't really formalized or documented. Hence the patch. ;^) And after this patch it is still not formalized - there are other places in OpenSM where exit(N) is called. For example, what could you do with exit(YY_EXIT_FAILURE)? > Personally, I have (and create) several different versions of opensm with small customizations, > and test them on our cluster testbeds. I often will start/stop them in a variety of configurations > (with and without plugins, more than one sm on a node, etc.) and if and when opensm doesn't > startup normally, it would be nice to have a meaningful exit code. > > Perhaps others might find it useful as well, or for some future use. Maybe, but for this, clear rules should be defined and applied, not just several exit codes. Ideally such work could be done in parallel - OpenSM and an analysis tool (not a Chicken/Egg :)). > But again, I originally considered this more as code cleanup. Converting the exits, returns, and aborts > to provide a more consistent interface to the system. Ok, if that is the only purpose we can do something like this (assuming all exit(), abort(), etc. and not only in main.c are converted), but in this case I would suggest starting with a very limited set of error codes, and not adding OSM_EXIT_TYPE_NORMAL - "0" looks better and it is fine for the system too. And in any case I don't see this as OFED material.
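To illustrate what a very limited set of error codes might look like, here is a minimal sketch. It is not taken from the OpenSM source or from the patch under discussion: the names osm_exit_code, OSM_EXIT_CONFIG, OSM_EXIT_INIT, OSM_EXIT_RUNTIME and osm_exit_str are all hypothetical, and the three failure categories are only an assumption about what a calling script could usefully distinguish. Normal termination stays a plain 0.

```c
#include <stdlib.h>

/*
 * Hypothetical, deliberately small set of failure codes.
 * Normal termination is plain 0 (EXIT_SUCCESS), not a named constant.
 */
enum osm_exit_code {
	OSM_EXIT_CONFIG  = 1,	/* bad command line option or config file */
	OSM_EXIT_INIT    = 2,	/* startup failed, e.g. port bind or plugin load */
	OSM_EXIT_RUNTIME = 3	/* unrecoverable error after startup */
};

/* Map an exit status to a short description, e.g. for a wrapper script's log. */
static const char *osm_exit_str(int code)
{
	switch (code) {
	case EXIT_SUCCESS:
		return "normal exit";
	case OSM_EXIT_CONFIG:
		return "configuration error";
	case OSM_EXIT_INIT:
		return "initialization failure";
	case OSM_EXIT_RUNTIME:
		return "runtime failure";
	default:
		/* codes from other exit(N) call sites are not covered */
		return "unknown exit code";
	}
}
```

Such a table only becomes meaningful once every exit(), abort() and return path uses it. Until then a caller can still rely only on zero versus non-zero, since values such as YY_EXIT_FAILURE would fall into the default branch or collide with one of the named codes.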
Sasha From sashak at voltaire.com Wed Nov 26 02:19:19 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 26 Nov 2008 12:19:19 +0200 Subject: [ofa-general] Re: [PATCH 0/3] ibnetdiscover library "libibnetdisc" In-Reply-To: <20081124134938.61c345e0.weiny2@llnl.gov> References: <20081120163809.26a3c499.weiny2@llnl.gov> <20081123182741.GS21967@sashak.voltaire.com> <20081124094243.4dbcff51.weiny2@llnl.gov> <20081124191050.GU6183@sashak.voltaire.com> <20081124113005.4261cfd1.weiny2@llnl.gov> <20081124200151.GX6183@sashak.voltaire.com> <20081124134938.61c345e0.weiny2@llnl.gov> Message-ID: <20081126101919.GF12270@sashak.voltaire.com> Hi Ira, On 13:49 Mon 24 Nov , Ira Weiny wrote: > > As long as the library exists any dependent package can of course use the > library from whatever package we choose (libibnetdisc or infiniband-diags). We > have some code which is prototyped against ibnetdiscover but we plan on using > this library instead. This would be separate from infiniband-diags. But we > can just as easily put a dependency on infiniband-diags as on libibnetdisc. Yes, it is possible to add a dependency. But I'm getting complaints about too many in-management dependencies even now. > The fact is that it was actually easier to put this in a new package rather > than try and integrate with infiniband-diags. I would disagree. In the latter case we need to deal with only one logical change and not with all the "new package" issues. > I thought it made for a very > clean conversion by putting the library in as a new patch and then we could > convert the diags as appropriate. > > Anyway, I will integrate it as you say and resubmit the patch. Thanks. Sasha From sashak at voltaire.com Wed Nov 26 03:00:49 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 26 Nov 2008 13:00:49 +0200 Subject: [ofa-general] [PATCH] infiniband-diags/grouping: add 10G IP router devid Message-ID: <20081126110049.GJ12270@sashak.voltaire.com> Add 10G IP router device id for grouping.
Signed-off-by: Sasha Khapyorsky --- infiniband-diags/include/grouping.h | 1 + infiniband-diags/src/grouping.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/infiniband-diags/include/grouping.h b/infiniband-diags/include/grouping.h index 3ba872c..e54efef 100644 --- a/infiniband-diags/include/grouping.h +++ b/infiniband-diags/include/grouping.h @@ -91,6 +91,7 @@ struct AllChassisList { #define VTR_DEVID_ISR2012 0x5a39 #define VTR_DEVID_SFB2004 0x5a40 #define VTR_DEVID_ISR2004 0x5a41 +#define VTR_DEVID_SRB2004 0x5a42 enum ChassisType { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; enum ChassisSlot { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; diff --git a/infiniband-diags/src/grouping.c b/infiniband-diags/src/grouping.c index e2b4488..f1a996f 100644 --- a/infiniband-diags/src/grouping.c +++ b/infiniband-diags/src/grouping.c @@ -242,7 +242,8 @@ static int is_spine(Node *node) static int is_line_24(Node *node) { return (node->devid == VTR_DEVID_SLB24 || - node->devid == VTR_DEVID_SLB24_DDR); + node->devid == VTR_DEVID_SLB24_DDR || + node->devid == VTR_DEVID_SRB2004); } static int is_line_8(Node *node) -- 1.6.0.4.766.g6fc4a From vlad at lists.openfabrics.org Wed Nov 26 03:23:50 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 26 Nov 2008 03:23:50 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081126-0200 daily build status Message-ID: <20081126112350.84B44E60B89@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on 
x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From FENKES at de.ibm.com Wed Nov 26 05:44:56 2008 From: FENKES at de.ibm.com (Joachim Fenkes) Date: Wed, 26 Nov 2008 14:44:56 +0100 Subject: [ofa-general] Re: [PATCH] IB/ehca: Change misleading error message In-Reply-To: References: <200806061835.43802.fenkes@de.ibm.com> <48499C11.7030504@gmail.com> <200811251358.06729.fenkes@de.ibm.com> Message-ID: Roland Dreier wrote on 26.11.2008 00:13:51: > That's too bad... I applied this patch but out of curiousity, why > doesn't the hot-remove/hot-add work? 
I would have thought that > re-registering all of memory after the hot-add would do the right thing. That's right, but right now, we simply try to register all of memory from KERNELBASE to high_memory, which works right until we have memory holes in the middle; then the hypervisor will reject our page registrations. Same goes for huge (16GB) pages, by the way. We're working on a solution to this. Cheers, Joachim From marinal at voltaire.com Wed Nov 26 08:07:41 2008 From: marinal at voltaire.com (Marina Lipshteyn) Date: Wed, 26 Nov 2008 18:07:41 +0200 Subject: [ofa-general] documentation on Fat-Tree algorithm in OpenSM Message-ID: Hi, I believe that the Fat-Tree algorithm has very poor documentation in opensm. There is no description of how the port balancing is done. On the other hand, the other algorithms do have an explanation of their balancing concept. I would like to ask if it is possible to add such a description, at least at the level of a general explanation of the concept. This will help in understanding the algorithm. Thanks, Marina. From yosefe at Voltaire.COM Wed Nov 26 08:27:27 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 26 Nov 2008 18:27:27 +0200 Subject: [ofa-general] [PATCH v2] ipoib: do not join broadcast group if interface is brought down In-Reply-To: <492D0E88.6080009@dev.mellanox.co.il> References: <49246EB7.3070607@Voltaire.COM> <492D0E88.6080009@dev.mellanox.co.il> Message-ID: <492D78EF.4010703@Voltaire.COM> Because ipoib_workqueue is not flushed when ipoib interface is brought down, ipoib_mcast_join() may trigger a join to the broadcast group after priv->broadcast was set to NULL (during cleanup). This will cause ipoib to be joined to the broadcast group when interface is down. As a side effect, this breaks the optimization of setting qkey only when joining the broadcast group.
Signed-off-by: Yossi Etigin -- Changes from v1: - Put checks in places where was assumed priv->broadcast != NULL. Fix bugzilla 1370. --- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-19 21:33:54.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-11-26 18:08:48.000000000 +0200 @@ -497,7 +497,7 @@ static void ipoib_mcast_join(struct net_ IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE; - if (create) { + if (create && priv->broadcast) { comp_mask |= IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_MTU_SELECTOR | @@ -565,7 +565,8 @@ void ipoib_mcast_join_task(struct work_s ipoib_warn(priv, "ib_query_port failed\n"); } - if (!priv->broadcast) { + rtnl_lock(); + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && !priv->broadcast) { struct ipoib_mcast *broadcast; broadcast = ipoib_mcast_alloc(dev, 1); @@ -576,6 +577,7 @@ void ipoib_mcast_join_task(struct work_s queue_delayed_work(ipoib_workqueue, &priv->mcast_join_task, HZ); mutex_unlock(&mcast_mutex); + rtnl_unlock(); return; } @@ -587,8 +589,10 @@ void ipoib_mcast_join_task(struct work_s __ipoib_mcast_add(dev, priv->broadcast); spin_unlock_irq(&priv->lock); } + rtnl_unlock(); - if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { + if (priv->broadcast && + !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) ipoib_mcast_join(dev, priv->broadcast, 0); return; @@ -617,7 +621,8 @@ void ipoib_mcast_join_task(struct work_s return; } - priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); + if (priv->broadcast) + priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); if (!ipoib_cm_admin_enabled(dev)) { 
rtnl_lock(); From alekseys at voltaire.com Wed Nov 26 09:51:39 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 19:51:39 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr Message-ID: <1227721899.3121.18.camel@alst60.voltaire.com> Changes from v3: This set of patches based on the latest, 2.6.28-rc4 kernel. >From 3fd066360f33d4083e183c14b991ed6408d68726 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 13 Aug 2008 09:55:33 +0300 Subject: [PATCH] AF_INET6 support for rdma_bind_addr Signed-off-by: Aleksey Senin --- drivers/infiniband/core/cma.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d951896..4728265 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2073,7 +2073,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) struct rdma_id_private *id_priv; int ret; - if (addr->sa_family != AF_INET) + if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); -- 1.5.6.dirty From alekseys at voltaire.com Wed Nov 26 09:55:30 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 17:55:30 +0000 Subject: [ofa-general] [RMDA CM IPv6 support. 
PATCHv4 2/6] AF_INET6 case to cma_format_hdr function In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: <1227722130.3121.20.camel@alst60.voltaire.com> >From 46f9e4ae3fadb174d26816df8932f97561479307 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 13 Aug 2008 10:01:05 +0300 Subject: [PATCH] AF_INET6 case to cma_format_hdr function Signed-off-by: Aleksey Senin --- drivers/infiniband/core/cma.c | 73 ++++++++++++++++++++++++++++------------ 1 files changed, 51 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 4728265..31f2aa2 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2113,32 +2113,61 @@ EXPORT_SYMBOL(rdma_bind_addr); static int cma_format_hdr(void *hdr, enum rdma_port_space ps, struct rdma_route *route) { - struct sockaddr_in *src4, *dst4; struct cma_hdr *cma_hdr; struct sdp_hh *sdp_hdr; - src4 = (struct sockaddr_in *) &route->addr.src_addr; - dst4 = (struct sockaddr_in *) &route->addr.dst_addr; - - switch (ps) { - case RDMA_PS_SDP: - sdp_hdr = hdr; - if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) - return -EINVAL; - sdp_set_ip_ver(sdp_hdr, 4); - sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - sdp_hdr->port = src4->sin_port; - break; - default: - cma_hdr = hdr; - cma_hdr->cma_version = CMA_VERSION; - cma_set_ip_ver(cma_hdr, 4); - cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - cma_hdr->port = src4->sin_port; - break; + if (route->addr.src_addr.ss_family == AF_INET) { + struct sockaddr_in *src4, *dst4; + + src4 = (struct sockaddr_in *) &route->addr.src_addr; + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + 
sdp_set_ip_ver(sdp_hdr, 4); + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + sdp_hdr->port = src4->sin_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 4); + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + cma_hdr->port = src4->sin_port; + break; + } + } else { + struct sockaddr_in6 *src6, *dst6; + + src6 = (struct sockaddr_in6 *) &route->addr.src_addr; + dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 6); + sdp_hdr->src_addr.ip6 = src6->sin6_addr; + sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; + sdp_hdr->port = src6->sin6_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 6); + cma_hdr->src_addr.ip6 = src6->sin6_addr; + cma_hdr->dst_addr.ip6 = dst6->sin6_addr; + cma_hdr->port = src6->sin6_port; + break; + } } + return 0; } -- 1.5.6.dirty From alekseys at voltaire.com Wed Nov 26 09:56:09 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 19:56:09 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. 
PATCHv4 3/6] IPv6 support in cma_bind_any In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: <1227722169.3121.22.camel@alst60.voltaire.com> >From 16579a6bd3da5d2f7fd46bc71261bf87f0baa6ae Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 13 Aug 2008 10:03:16 +0300 Subject: [PATCH] IPv6 support in cma_bind_any Using sockaddr_storage structure instead of sockaddr_in for catching IPv6 protocol Signed-off-by: Aleksey Senin --- drivers/infiniband/core/cma.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 31f2aa2..df22c5c 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1467,10 +1467,10 @@ static void cma_listen_on_all(struct rdma_id_private *id_priv) static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) { - struct sockaddr_in addr_in; + struct sockaddr_storage addr_in; memset(&addr_in, 0, sizeof addr_in); - addr_in.sin_family = af; + addr_in.ss_family = af; return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } -- 1.5.6.dirty From alekseys at voltaire.com Wed Nov 26 09:56:39 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 19:56:39 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. 
PATCHv4 4/6] IPv6 local address resolution In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: <1227722199.3121.25.camel@alst60.voltaire.com> >From 8465a7d33a36cf8a9a92fbeea5d8f3b89f30e632 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 26 Nov 2008 16:16:09 +0200 Subject: [PATCH] IPv6 local address resolution RDMA CM support on local machine Signed-off-by: Aleksey Senin --- drivers/infiniband/core/addr.c | 65 +++++++++++++++++++++++++++++----------- 1 files changed, 47 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index f95d21f..1d785d7 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -279,29 +279,58 @@ static int addr_resolve_local(struct sockaddr *src_in, struct rdma_dev_addr *addr) { struct net_device *dev; - __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; - __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - int ret; + int ret = -EADDRNOTAVAIL; - dev = ip_dev_find(&init_net, dst_ip); - if (!dev) - return -EADDRNOTAVAIL; + if (dst_in->sa_family == AF_INET) { + __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; + __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - if (ipv4_is_zeronet(src_ip)) { - src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; - ret = rdma_copy_addr(addr, dev, dev->dev_addr); - } else if (ipv4_is_loopback(src_ip)) { - ret = rdma_translate_ip(dst_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + dev = ip_dev_find(&init_net, dst_ip); + if (!dev) + return -EADDRNOTAVAIL; + + if (ipv4_is_zeronet(src_ip)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv4_is_loopback(src_ip)) { + ret = 
rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } + dev_put(dev); } else { - ret = rdma_translate_ip(src_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + struct in6_addr *a = &((struct sockaddr_in6 *)dst_in)->sin6_addr; + + for_each_netdev(&init_net, dev) + if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1)) + break; + + if (!dev) + return -EADDRNOTAVAIL; + + a = &((struct sockaddr_in6 *)src_in)->sin6_addr; + + if (ipv6_addr_any(a)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in6 *)src_in)->sin6_addr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv6_addr_loopback(a)) { + ret = rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } } - dev_put(dev); return ret; } -- 1.5.6.dirty From alekseys at voltaire.com Wed Nov 26 09:58:57 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 19:58:57 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. 
PATCHv4 5/6] IPv6 support for network discovery In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: <1227722337.21512.0.camel@alst60.voltaire.com> >From 14290555a2c58906214deb44423277fffd77fc4c Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 13 Aug 2008 10:19:13 +0300 Subject: [PATCH] IPv6 support for network discovery Added support for network discovery in addr_send_arp function Signed-off-by: Aleksey Senin --- drivers/infiniband/core/addr.c | 32 ++++++++++++++++++++++++-------- 1 files changed, 24 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 1d785d7..460dcc2 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -43,6 +43,7 @@ #include #include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("IB Address Translation"); @@ -172,19 +173,34 @@ static void queue_req(struct addr_req *req) mutex_unlock(&lock); } -static void addr_send_arp(struct sockaddr_in *dst_in) +static void addr_send_arp(struct sockaddr *dst_in) { struct rtable *rt; struct flowi fl; - __be32 dst_ip = dst_in->sin_addr.s_addr; + struct dst_entry *dst; memset(&fl, 0, sizeof fl); - fl.nl_u.ip4_u.daddr = dst_ip; - if (ip_route_output_key(&init_net, &rt, &fl)) - return; + if (dst_in->sa_family == AF_INET) { + fl.nl_u.ip4_u.daddr = + ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - neigh_event_send(rt->u.dst.neighbour, NULL); - ip_rt_put(rt); + if (ip_route_output_key(&init_net, &rt, &fl)) + return; + + neigh_event_send(rt->u.dst.neighbour, NULL); + ip_rt_put(rt); + + } else { + fl.nl_u.ip6_u.daddr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return; + + neigh_event_send(dst->neighbour, NULL); + dst_release(dst); + } } static int addr_resolve_remote(struct sockaddr *src_in, @@ -373,7 +389,7 @@ int rdma_resolve_ip(struct rdma_addr_client 
*client, case -ENODATA: req->timeout = msecs_to_jiffies(timeout_ms) + jiffies; queue_req(req); - addr_send_arp((struct sockaddr_in *)dst_in); + addr_send_arp(dst_in); break; default: ret = req->status; -- 1.5.6.dirty From alekseys at voltaire.com Wed Nov 26 09:59:31 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 26 Nov 2008 19:59:31 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 6/6] Remote IPv6 resolution In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: <1227722371.21512.2.camel@alst60.voltaire.com> >From 34464092a263d339c919432b1e4495dce36ee568 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 26 Nov 2008 18:24:35 +0200 Subject: [PATCH] Remote IPv6 resolution Added remote address resolusion for RDMA CM Function addr_resolve_remote used as wrapper for two other functions: addr4_resolve_remote ( original addr_resolve_remote ) addr6_resolve_remote ( new function ) Signed-off-by: Aleksey Senin --- drivers/infiniband/core/addr.c | 53 +++++++++++++++++++++++++++++++++++---- 1 files changed, 47 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 460dcc2..16ffd49 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -203,12 +203,12 @@ static void addr_send_arp(struct sockaddr *dst_in) } } -static int addr_resolve_remote(struct sockaddr *src_in, - struct sockaddr *dst_in, +static int addr4_resolve_remote(struct sockaddr_in *src_in, + struct sockaddr_in *dst_in, struct rdma_dev_addr *addr) { - __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; - __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; + __be32 src_ip = src_in->sin_addr.s_addr; + __be32 dst_ip = dst_in->sin_addr.s_addr; struct flowi fl; struct rtable *rt; struct neighbour *neigh; @@ -239,8 +239,8 @@ static int addr_resolve_remote(struct sockaddr *src_in, } if (!src_ip) { - 
src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in *)src_in)->sin_addr.s_addr = rt->rt_src; + src_in->sin_family = dst_in->sin_family; + src_in->sin_addr.s_addr = rt->rt_src; } ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); @@ -252,6 +252,47 @@ out: return ret; } +static int addr6_resolve_remote(struct sockaddr_in6 *src_in, + struct sockaddr_in6 *dst_in, + struct rdma_dev_addr *addr) +{ + struct flowi fl; + struct neighbour *neigh; + struct dst_entry *dst; + int ret = -ENODATA; + + memset(&fl, 0, sizeof fl); + fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; + fl.nl_u.ip6_u.saddr = src_in->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return ret; + + if (dst->dev->flags & IFF_NOARP) { + ret = rdma_copy_addr(addr, dst->dev, NULL); + } else { + neigh = dst->neighbour; + if (neigh && (neigh->nud_state & NUD_VALID)) + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); + } + + dst_release(dst); + return ret; +} + +static int addr_resolve_remote(struct sockaddr *src_in, + struct sockaddr *dst_in, + struct rdma_dev_addr *addr) +{ + if (src_in->sa_family == AF_INET) { + return addr4_resolve_remote((struct sockaddr_in *)src_in, + (struct sockaddr_in *)dst_in, addr); + } else + return addr6_resolve_remote((struct sockaddr_in6 *)src_in, + (struct sockaddr_in6 *)dst_in, addr); +} + static void process_req(struct work_struct *work) { struct addr_req *req, *temp_req; -- 1.5.6.dirty From YJia at tmriusa.com Wed Nov 26 14:18:48 2008 From: YJia at tmriusa.com (Yicheng Jia) Date: Wed, 26 Nov 2008 16:18:48 -0600 Subject: [ofa-general] set up QPs with different transfer rate Message-ID: Hi Folks, I have two applications which require different IB transfer rates. I am using Mellanox 25204 HCA. Can I achieve it by setting up two QPs with different service levels? Can I set "SL" field in QP context, or it is controlled by SM? Thanks! Best, Yicheng Software Engineer Toshiba Medical Research Institute USA, Inc. 
From gmpc at sanger.ac.uk Thu Nov 27 02:36:15 2008 From: gmpc at sanger.ac.uk (Guy Coates) Date: Thu, 27 Nov 2008 10:36:15 +0000 Subject: [ofa-general] Anyone working on debian packages? Message-ID: <492E781F.40709@sanger.ac.uk> Hi all, We have recently been experimenting with IB on Debian, and in the course of this work I have built a basic set of OFED-1.3.1 Debian packages for our internal use. With a bit more effort the packages could be worked into a state suitable for inclusion into the main Debian archives. Before I start this I would like to ensure that I am not re-inventing the wheel; is anyone else working on this? Cheers, Guy -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From vlad at lists.openfabrics.org Thu Nov 27 03:24:29 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 27 Nov 2008 03:24:29 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081127-0200 daily build status Message-ID: <20081127112430.1844AE60D44@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From olga.shern at gmail.com Thu Nov 27 04:52:44 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Thu, 27 Nov 2008 14:52:44 +0200 Subject: [ofa-general] ***SPAM*** Re: [ewg] OFED Nov 24, 2008 meeting minutes In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD0FE7A8@mtlexch01.mtl.com> References: <458BC6B0F287034F92FE78908BD01CE84EF35EF0@mtlexch01.mtl.com> <5D49E7A8952DC44FB38C38FA0D758EAD0FE7A8@mtlexch01.mtl.com> Message-ID: > > OFED 1.4 release: RC6 on Nov 28, GA on Dec 8 Hi, Are you going to build RC6 today/tomorrow? I see that there are still a lot of major bugs. Maybe we should wait? Olga From jackm at dev.mellanox.co.il Thu Nov 27 04:57:32 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 27 Nov 2008 14:57:32 +0200 Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE In-Reply-To: <492C226D.7040009@Voltaire.COM> References: <492C226D.7040009@Voltaire.COM> Message-ID: <200811271457.32510.jackm@dev.mellanox.co.il> On Tuesday 25 November 2008 18:06, Moni Shoua wrote: > @@ -263,6 +269,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags,     u8 port_num, >         } else >                 return IB_MAD_RESULT_SUCCESS; >   > +       if (!ib_query_port(ibdev, port_num, &pattr)) > +               prev_lid = pattr.lid; > + > Why do ib_query_port for each MAD that is handled? query_port involves a firmware access. Events are generated only for SMP SET packets. 
I think the condition should read:

        if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
             in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
            in_mad->mad_hdr.method == IB_MGMT_METHOD_SET &&
            !ib_query_port(ibdev, port_num, &pattr))
                prev_lid = pattr.lid;

so that the query_port will be performed only for appropriate packets.

- Jack

From monis at Voltaire.COM Thu Nov 27 05:26:23 2008
From: monis at Voltaire.COM (Moni Shoua)
Date: Thu, 27 Nov 2008 15:26:23 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
In-Reply-To: <200811271457.32510.jackm@dev.mellanox.co.il>
References: <492C226D.7040009@Voltaire.COM> <200811271457.32510.jackm@dev.mellanox.co.il>
Message-ID: <492E9FFF.9070107@Voltaire.COM>

Jack Morgenstein wrote:
> On Tuesday 25 November 2008 18:06, Moni Shoua wrote:
>> @@ -263,6 +269,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
>>         } else
>>                 return IB_MAD_RESULT_SUCCESS;
>>
>> +       if (!ib_query_port(ibdev, port_num, &pattr))
>> +               prev_lid = pattr.lid;
>> +
>>
>
> Why do ib_query_port for each MAD that is handled? query_port involves a firmware access.
> Events are generated only for SMP SET packets.

I agree. Thanks.
I'm also changing the action in case ib_query_port() fails.
Instead of ignoring the failure I now assume the worst (i.e., LID_CHANGE).

From monis at Voltaire.COM Thu Nov 27 05:31:18 2008
From: monis at Voltaire.COM (Moni Shoua)
Date: Thu, 27 Nov 2008 15:31:18 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
In-Reply-To: <200811271457.32510.jackm@dev.mellanox.co.il>
References: <492C226D.7040009@Voltaire.COM> <200811271457.32510.jackm@dev.mellanox.co.il>
Message-ID: <492EA126.1060104@Voltaire.COM>

New patch according to Jack's comment and the other change (credit to Yossi E.)
Same change log applies here.
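For readers who want to try the logic of this revision outside the kernel, here is a minimal stand-alone sketch. It is an illustration under assumptions, not the driver code: `struct mad_hdr_sketch`, `should_query_port()`, `events_for_port_info()`, `sketch_self_check()` and the `EV_*` flags are hypothetical names made up for this example. The numeric values of the two subnet management classes and of the Set method are, to the best of my knowledge, those in the kernel's ib_mad.h. The firmware query is replaced by a plain `prev_lid` argument, which stays 0 when the query is skipped or fails.

```c
#include <assert.h>
#include <stdint.h>

/* Values believed to match the kernel's include/rdma/ib_mad.h. */
#define IB_MGMT_CLASS_SUBN_LID_ROUTED      0x01
#define IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE  0x81
#define IB_MGMT_METHOD_SET                 0x02

/* Hypothetical stand-in for the two struct ib_mad_hdr fields used here. */
struct mad_hdr_sketch {
    uint8_t mgmt_class;
    uint8_t method;
};

/* Hypothetical flags naming the events smp_snoop() may dispatch. */
enum {
    EV_CLIENT_REREGISTER = 1 << 0,
    EV_LID_CHANGE        = 1 << 1
};

/*
 * Filter from this revision of the patch: only SMP Set packets can change
 * the LID, so only they justify the firmware access done by query_port.
 */
int should_query_port(const struct mad_hdr_sketch *hdr)
{
    return (hdr->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
            hdr->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
           hdr->method == IB_MGMT_METHOD_SET;
}

/*
 * Event decision for a PortInfo Set: the two events are independent, so
 * both, one, or neither may be raised. With prev_lid left at 0 (query
 * failed), any non-zero new LID is reported as a change.
 */
unsigned events_for_port_info(uint8_t clientrereg_resv_subnetto,
                              uint16_t prev_lid, uint16_t lid)
{
    unsigned events = 0;

    if (clientrereg_resv_subnetto & 0x80)
        events |= EV_CLIENT_REREGISTER;
    if (prev_lid != lid)
        events |= EV_LID_CHANGE;
    return events;
}

/* Small usage example: a directed-route Set with an unchanged LID. */
void sketch_self_check(void)
{
    struct mad_hdr_sketch hdr = { IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE,
                                  IB_MGMT_METHOD_SET };

    assert(should_query_port(&hdr));
    assert(events_for_port_info(0x80, 7, 7) == EV_CLIENT_REREGISTER);
}
```

The point of returning a bit mask rather than a single event is the behavioural change in the patch: the old code dispatched exactly one of the two events, while the new code treats client reregistration and LID change as separate conditions.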
-- mlx4/mad.c | 24 ++++++++++++++++++------ mthca/mthca_mad.c | 22 +++++++++++++++++----- 2 files changed, 35 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 606f1e2..9528459 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_dev *dev, u8 port_num, u16 lid, u8 sl) * Snoop SM MADs for port info and P_Key table sets, so we can * synthesize LID change and P_Key change events. */ -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); update_sm_ah(to_mdev(ibdev), port_num, be16_to_cpu(pinfo->sm_lid), @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { - u16 slid; + u16 slid, prev_lid = 0; int err; + struct ib_port_attr pattr; slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); @@ -263,6 +269,12 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -271,7 +283,7 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, return IB_MAD_RESULT_FAILURE; if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c index 6404495..d872aeb 100644 --- a/drivers/infiniband/hw/mthca/mthca_mad.c +++ b/drivers/infiniband/hw/mthca/mthca_mad.c @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_dev *dev, */ static void smp_snoop(struct ib_device *ibdev, u8 port_num, - struct ib_mad *mad) + struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device *ibdev, if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device *ibdev, event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - 
ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device *ibdev, int err; u8 status; u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); + u16 prev_lid = 0; + struct ib_port_attr pattr; /* Forward locally generated traps to the SM */ if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && @@ -233,6 +240,11 @@ int mthca_process_mad(struct ib_device *ibdev, return IB_MAD_RESULT_SUCCESS; } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; err = mthca_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, @@ -252,7 +264,7 @@ int mthca_process_mad(struct ib_device *ibdev, } if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); }

From jackm at dev.mellanox.co.il Thu Nov 27 05:43:10 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 27 Nov 2008 15:43:10 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
In-Reply-To: <492E9FFF.9070107@Voltaire.COM>
References: <492C226D.7040009@Voltaire.COM> <200811271457.32510.jackm@dev.mellanox.co.il> <492E9FFF.9070107@Voltaire.COM>
Message-ID: <200811271543.10613.jackm@dev.mellanox.co.il>

On Thursday 27 November 2008 15:26, Moni Shoua wrote:
> I'm also changing the action in case ib_query_port() fails.
> Instead of ignoring the failure I now assume the worst (i.e: LID_CHANGE)
>

OK. This will not be worse than the current situation. (Actually, it may be
in certain cases, because you can now generate both a LID_CHANGE event and a
CLIENT_REREGISTER event, where before only one was generated.
However, if query_port fails, we will probably see other failures as well,
so what the heck).

BTW, the condition I sent in my last post is not enough. It should be:

        if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
             in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
            in_mad->mad_hdr.method == IB_MGMT_METHOD_SET &&
            in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO &&
            !ib_query_port(ibdev, port_num, &pattr))
                prev_lid = pattr.lid;

since the query_port response data is only relevant for the
IB_SMP_ATTR_PORT_INFO path in smp_snoop.

- Jack

From jackm at dev.mellanox.co.il Thu Nov 27 05:51:46 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 27 Nov 2008 15:51:46 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
In-Reply-To: <492EA126.1060104@Voltaire.COM>
References: <492C226D.7040009@Voltaire.COM> <200811271457.32510.jackm@dev.mellanox.co.il> <492EA126.1060104@Voltaire.COM>
Message-ID: <200811271551.47076.jackm@dev.mellanox.co.il>

On Thursday 27 November 2008 15:31, Moni Shoua wrote:
> @@ -263,6 +269,12 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags,    u8 port_num,
>         } else
>                 return IB_MAD_RESULT_SUCCESS;
>
> +       if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
> +               in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
> +               in_mad->mad_hdr.method == IB_MGMT_METHOD_SET &&
> +               !ib_query_port(ibdev, port_num, &pattr))
> +                       prev_lid = pattr.lid;
> +
>         err = mlx4_MAD_IFC(to_mdev(ibdev),
>                            mad_flags & IB_MAD_IGNORE_MKEY,
>                            mad_flags & IB_MAD_IGNORE_BKEY,
>

Per my last post, this should be:

@@ -263,6 +269,13 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
        } else
                return IB_MAD_RESULT_SUCCESS;

+       if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED
|| + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, From monis at Voltaire.COM Thu Nov 27 06:14:45 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Thu, 27 Nov 2008 16:14:45 +0200 Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE In-Reply-To: <200811271551.47076.jackm@dev.mellanox.co.il> References: <492C226D.7040009@Voltaire.COM> <200811271457.32510.jackm@dev.mellanox.co.il> <492EA126.1060104@Voltaire.COM> <200811271551.47076.jackm@dev.mellanox.co.il> Message-ID: <492EAB55.7020003@Voltaire.COM> Thanks again. Now with your other fix -- drivers/infiniband/hw/mlx4/mad.c | 25 +++++++++++++++++++------ drivers/infiniband/hw/mthca/mthca_mad.c | 23 ++++++++++++++++++----- 2 files changed, 37 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 606f1e2..d5971a1 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_dev *dev, u8 port_num, u16 lid, u8 sl) * Snoop SM MADs for port info and P_Key table sets, so we can * synthesize LID change and P_Key change events. 
*/ -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); update_sm_ah(to_mdev(ibdev), port_num, be16_to_cpu(pinfo->sm_lid), @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { - u16 slid; + u16 slid, prev_lid = 0; int err; + struct ib_port_attr pattr; slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); @@ -263,6 +269,13 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -271,7 +284,7 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, return IB_MAD_RESULT_FAILURE; if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c index 6404495..45ac68e 100644 --- a/drivers/infiniband/hw/mthca/mthca_mad.c +++ b/drivers/infiniband/hw/mthca/mthca_mad.c @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_dev *dev, */ static void smp_snoop(struct ib_device *ibdev, u8 port_num, - struct ib_mad *mad) + struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device *ibdev, if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device *ibdev, event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + 
ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device *ibdev, int err; u8 status; u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); + u16 prev_lid = 0; + struct ib_port_attr pattr; /* Forward locally generated traps to the SM */ if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && @@ -233,6 +240,12 @@ int mthca_process_mad(struct ib_device *ibdev, return IB_MAD_RESULT_SUCCESS; } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; err = mthca_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, @@ -252,7 +265,7 @@ int mthca_process_mad(struct ib_device *ibdev, } if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); }

From tziporet at dev.mellanox.co.il Thu Nov 27 06:29:08 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Thu, 27 Nov 2008 16:29:08 +0200
Subject: [ofa-general] Re: ***SPAM*** Re: [ewg] OFED Nov 24, 2008 meeting minutes
In-Reply-To:
References: <458BC6B0F287034F92FE78908BD01CE84EF35EF0@mtlexch01.mtl.com> <5D49E7A8952DC44FB38C38FA0D758EAD0FE7A8@mtlexch01.mtl.com>
Message-ID: <492EAEB4.4000301@mellanox.co.il>

Olga Shern (Voltaire) wrote:
>> OFED 1.4 release: RC6 on Nov 28, GA on Dec 8
>>
>
> Hi,
>
> Are you going to build RC6 today/tomorrow?
> I see that there are still a lot of major bugs. Maybe we should wait?
>
>

We have already built it, and I will publish it later today after some sanity
checks that we run here.
We should not wait, since UNH must run their Logo program tests.
We can fix a few more critical bugs next week too.
Tziporet

From tziporet at mellanox.co.il Thu Nov 27 08:38:55 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 27 Nov 2008 18:38:55 +0200
Subject: [ofa-general] OFED-1.4-rc6 release is available
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD010F752D@mtlexch01.mtl.com>

Hi,

OFED-1.4-rc6 release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-rc6.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4

Vladimir & Tziporet

========================================================================

Release information:
------------------------------

Linux Operating Systems:
- RedHat EL4 up4: 2.6.9-42.ELsmp *
- RedHat EL4 up5: 2.6.9-55.ELsmp
- RedHat EL4 up6: 2.6.9-67.ELsmp
- RedHat EL4 up7: 2.6.9-78.ELsmp
- RedHat EL5: 2.6.18-8.el5
- RedHat EL5 up1: 2.6.18-53.el5
- RedHat EL5 up2: 2.6.18-92.el5
- OEL 4.5: 2.6.9-55.ELsmp
- OEL 5.2: 2.6.18-92.el5
- CentOS 5.2: 2.6.18-92.el5
- Fedora C9: 2.6.25-14.fc9 *
- SLES10: 2.6.16.21-0.8-smp
- SLES10 SP1: 2.6.16.46-0.12-smp
- SLES10 SP1 up1: 2.6.16.53-0.16-smp
- SLES10 SP2: 2.6.16.60-0.21-smp
- OpenSuSE 10.3: 2.6.22.5-31 *
- kernel.org: 2.6.26 and 2.6.27

* Minimal QA for these versions

Systems:
* x86_64
* x86
* ia64
* ppc64

Main Changes from OFED-1.4-rc4
==============================
- Updated MPI packages: mvapich-1.1.0-3143
- Updated bonding package: ib-bonding-0.9.0-36
- Updated opensm version to opensm-3.2.4
- Updated diags package version to infiniband-diags-1.4.3
- 19 bugs fixed (see attached for details)
- Attached kernel git tree changes

Tasks that should be completed for the release:
===================================
1. High priority bug fixes
2. UNH Logo program testing
3. Documentation update

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed-1.4-rc6-fixed-bugs.csv Type: application/octet-stream Size: 2083 bytes Desc: ofed-1.4-rc6-fixed-bugs.csv URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rc5_rc6_commits Type: application/octet-stream Size: 10513 bytes Desc: rc5_rc6_commits URL: From vlad at lists.openfabrics.org Fri Nov 28 03:28:01 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 28 Nov 2008 03:28:01 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081128-0200 daily build status Message-ID: <20081128112802.14E7CE60CB2@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on 
ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From dotanba at gmail.com Fri Nov 28 08:01:02 2008 From: dotanba at gmail.com (Dotan Barak) Date: Fri, 28 Nov 2008 18:01:02 +0200 Subject: ***SPAM*** Re: [ofa-general] set up QPs with different transfer rate In-Reply-To: References: Message-ID: <493015BE.8050702@gmail.com> Yicheng Jia wrote: > > Hi Folks, > > I have two applications which require different IB transfer rates. I > am using Mellanox 25204 HCA. Can I achieve it by setting up two QPs > with different service levels? Can I set "SL" field in QP context, or > it is controlled by SM? Thanks! You can set the SL value in the QP, but the SM controls the SL2VL mapping + VL_arbitration table. Dotan > > Best, > > Yicheng > > Software Engineer > Toshiba Medical Research Institute USA, Inc. > > _____________________________________________________________________________ > Scanned by IBM Email Security Management Services powered by > MessageLabs. 
For more information please visit http://www.ers.ibm.com > _____________________________________________________________________________ > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri Nov 28 21:14:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Nov 2008 21:14:09 -0800 Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr In-Reply-To: <1227721899.3121.18.camel@alst60.voltaire.com> (Aleksey Senin's message of "Wed, 26 Nov 2008 19:51:39 +0200") References: <1227721899.3121.18.camel@alst60.voltaire.com> Message-ID: I would like to get some input from Sean before proceeding on this, but one thing does jump out at me: the order of the patches seems strange to me (or maybe it's the way the patches are split). Starting with this change only: > @@ -2073,7 +2073,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > - if (addr->sa_family != AF_INET) > + if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) > return -EAFNOSUPPORT; seems wrong to me. If I just have this patch applied (eg if I'm doing a bisection to track down a bug) then it seems I'll get some very strange results if I try to bind an IPv6 address. It seems to me we would want all the prep work like using sockaddr_storage where needed, etc. before we actually enable IPv6 in the API. - R. From rdreier at cisco.com Fri Nov 28 21:48:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 28 Nov 2008 21:48:29 -0800 Subject: [ofa-general] Re: [PATCH V2] mlx4: save default port ib capabilities, and use when setting port type to IB. 
In-Reply-To: <200811051444.02306.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 5 Nov 2008 14:44:01 +0200")
References: <200811041214.39085.jackm@dev.mellanox.co.il> <200811051444.02306.jackm@dev.mellanox.co.il>
Message-ID:

thanks, applied

From aostvold at platform.com Sat Nov 29 01:00:42 2008
From: aostvold at platform.com (Asmund Ostvold)
Date: Sat, 29 Nov 2008 10:00:42 +0100
Subject: [ofa-general] reviving wrong data after trying to allocation a too large memory chunck
Message-ID: <493104BA.9090607@platform.com>

We discovered a strange problem while running OFED. We are not sure whether
it is an OFED problem, but we are posting it here anyway.

Short description:
We have a program that allocates a set of buffers with valloc, sends them
with ibv_post_send, and frees them. This runs in a loop. We have a "caching"
algorithm so that we register memory only the first time we come across a
buffer address. After a couple of iterations we start getting wrong data for
parts of the sends.

There are a few things worth mentioning:
- The buffers must be allocated with valloc; the test works with malloc
- There must be a malloc of a too-large chunk before the loop starts
  (that malloc fails)

We have modified the "rdma_lat.c" program to show the error (attached).

Regards
Asmund
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bug.c URL: From vlad at lists.openfabrics.org Sat Nov 29 03:20:09 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 29 Nov 2008 03:20:09 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081129-0200 daily build status Message-ID: <20081129112009.CEFB8E60324@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on 
ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From alekseys at voltaire.com Sun Nov 30 00:24:40 2008
From: alekseys at voltaire.com (Aleksey Senin)
Date: Sun, 30 Nov 2008 10:24:40 +0200
Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr
In-Reply-To:
References: <1227721899.3121.18.camel@alst60.voltaire.com>
Message-ID: <1228033480.3621.5.camel@alst60.voltaire.com>

You are right, this one should probably be applied last in the series, and
the first should be this one:

> static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af)
> {
> -       struct sockaddr_in addr_in;
> +       struct sockaddr_storage addr_in;
>
>         memset(&addr_in, 0, sizeof addr_in);
> -       addr_in.sin_family = af;
> +       addr_in.ss_family = af;
>         return rdma_bind_addr(id, (struct sockaddr *) &addr_in);
> }
>

But all the other patches depend on one another, and in my opinion it is
better to apply them all together; otherwise, when separated, all those 'if'
statements make no sense. So I'll be waiting for Sean's input too.

From jackm at dev.mellanox.co.il Sun Nov 30 00:28:18 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 30 Nov 2008 10:28:18 +0200
Subject: [ofa-general] [PATCH] mlx4_ib/mthca: Fix dispatch of IB_EVENT_LID_CHANGE
In-Reply-To: <492EAB55.7020003@Voltaire.COM>
References: <492C226D.7040009@Voltaire.COM> <200811271551.47076.jackm@dev.mellanox.co.il> <492EAB55.7020003@Voltaire.COM>
Message-ID: <200811301028.19171.jackm@dev.mellanox.co.il>

I've split this patch into two separate patches, one for ib_mthca and
another for mlx4_ib (for better trackability).

I've committed both of them to OFED 1.4 (so that they will be in tomorrow's
daily).

I'll post them shortly to the list as a 2-patch sequence.
- Jack On Thursday 27 November 2008 16:14, Moni Shoua wrote: > Thanks again. > Now with your other fix > > -- > drivers/infiniband/hw/mlx4/mad.c | 25 +++++++++++++++++++------ > drivers/infiniband/hw/mthca/mthca_mad.c | 23 ++++++++++++++++++----- > 2 files changed, 37 insertions(+), 11 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c > index 606f1e2..d5971a1 100644 > --- a/drivers/infiniband/hw/mlx4/mad.c > +++ b/drivers/infiniband/hw/mlx4/mad.c > @@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_dev *dev, u8 port_num, u16 lid, u8 sl) > * Snoop SM MADs for port info and P_Key table sets, so we can > * synthesize LID change and P_Key change events. > */ > -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) > +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, > + u16 prev_lid) > { > struct ib_event event; > > @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) > if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { > struct ib_port_info *pinfo = > (struct ib_port_info *) ((struct ib_smp *) mad)->data; > + u16 lid = be16_to_cpu(pinfo->lid); > > update_sm_ah(to_mdev(ibdev), port_num, > be16_to_cpu(pinfo->sm_lid), > @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) > event.device = ibdev; > event.element.port_num = port_num; > > - if (pinfo->clientrereg_resv_subnetto & 0x80) > + if (pinfo->clientrereg_resv_subnetto & 0x80) { > event.event = IB_EVENT_CLIENT_REREGISTER; > - else > + ib_dispatch_event(&event); > + } > + if (prev_lid != lid) { > event.event = IB_EVENT_LID_CHANGE; > + ib_dispatch_event(&event); > + } > > - ib_dispatch_event(&event); > } > > if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { > @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, > struct ib_wc *in_wc, struct ib_grh *in_grh, > struct 
ib_mad *in_mad, struct ib_mad *out_mad) > { > - u16 slid; > + u16 slid, prev_lid = 0; > int err; > + struct ib_port_attr pattr; > > slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); > > @@ -263,6 +269,13 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, > } else > return IB_MAD_RESULT_SUCCESS; > > + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || > + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && > + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && > + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && > + !ib_query_port(ibdev, port_num, &pattr)) > + prev_lid = pattr.lid; > + > err = mlx4_MAD_IFC(to_mdev(ibdev), > mad_flags & IB_MAD_IGNORE_MKEY, > mad_flags & IB_MAD_IGNORE_BKEY, > @@ -271,7 +284,7 @@ int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, > return IB_MAD_RESULT_FAILURE; > > if (!out_mad->mad_hdr.status) { > - smp_snoop(ibdev, port_num, in_mad); > + smp_snoop(ibdev, port_num, in_mad, prev_lid); > node_desc_override(ibdev, out_mad); > } > > diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c > index 6404495..45ac68e 100644 > --- a/drivers/infiniband/hw/mthca/mthca_mad.c > +++ b/drivers/infiniband/hw/mthca/mthca_mad.c > @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_dev *dev, > */ > static void smp_snoop(struct ib_device *ibdev, > u8 port_num, > - struct ib_mad *mad) > + struct ib_mad *mad, > + u16 prev_lid) > { > struct ib_event event; > > @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device *ibdev, > if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { > struct ib_port_info *pinfo = > (struct ib_port_info *) ((struct ib_smp *) mad)->data; > + u16 lid = be16_to_cpu(pinfo->lid); > > mthca_update_rate(to_mdev(ibdev), port_num); > update_sm_ah(to_mdev(ibdev), port_num, > @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device *ibdev, > event.device = ibdev; > event.element.port_num = 
port_num; > > - if (pinfo->clientrereg_resv_subnetto & 0x80) > + if (pinfo->clientrereg_resv_subnetto & 0x80) { > event.event = IB_EVENT_CLIENT_REREGISTER; > - else > + ib_dispatch_event(&event); > + } > + if (prev_lid != lid) { > event.event = IB_EVENT_LID_CHANGE; > + ib_dispatch_event(&event); > + } > > - ib_dispatch_event(&event); > } > > if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { > @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device *ibdev, > int err; > u8 status; > u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); > + u16 prev_lid = 0; > + struct ib_port_attr pattr; > > /* Forward locally generated traps to the SM */ > if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && > @@ -233,6 +240,12 @@ int mthca_process_mad(struct ib_device *ibdev, > return IB_MAD_RESULT_SUCCESS; > } else > return IB_MAD_RESULT_SUCCESS; > + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || > + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && > + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && > + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && > + !ib_query_port(ibdev, port_num, &pattr)) > + prev_lid = pattr.lid; > > err = mthca_MAD_IFC(to_mdev(ibdev), > mad_flags & IB_MAD_IGNORE_MKEY, > @@ -252,7 +265,7 @@ int mthca_process_mad(struct ib_device *ibdev, > } > > if (!out_mad->mad_hdr.status) { > - smp_snoop(ibdev, port_num, in_mad); > + smp_snoop(ibdev, port_num, in_mad, prev_lid); > node_desc_override(ibdev, out_mad); > } > > > From jackm at dev.mellanox.co.il Sun Nov 30 00:28:59 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 30 Nov 2008 10:28:59 +0200 Subject: [ofa-general] [PATCH 1 of 2] mlx4_ib: Fix dispatch of IB_EVENT_LID_CHANGE In-Reply-To: <492EAB55.7020003@Voltaire.COM> References: <492C226D.7040009@Voltaire.COM> <200811271551.47076.jackm@dev.mellanox.co.il> <492EAB55.7020003@Voltaire.COM> Message-ID: <200811301028.59757.jackm@dev.mellanox.co.il> mlx4_ib: Fix dispatch of 
IB_EVENT_LID_CHANGE When snooping a portinfo MAD, its client_reregister bit is checked. If the bit is ON then a CLIENT_REREGISTER event is dispatched, otherwise a LID_CHANGE event is dispatched. This ignores the cases where the MAD changes the LID along with an instruction to reregister (so a necessary LID_CHANGE event won't be dispatched), or the MAD is neither of these (and an unnecessary LID_CHANGE event is dispatched). This patch dispatches an event if the client_reregister bit is set. In addition, the patch compares the LID in the MAD to the current LID. If and only if they are not identical, a LID_CHANGE event is dispatched. From: Moni Shoua Signed-off-by: Moni Shoua Signed-off-by: Jack Morgenstein Signed-off-by: Yossi Etigin --- Roland, Here is Moni's patch separated into two patches, one for mlx4_ib and one for ib_mthca. Jack Index: infiniband/drivers/infiniband/hw/mlx4/mad.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/mad.c 2008-11-04 10:21:02.000000000 +0200 +++ infiniband/drivers/infiniband/hw/mlx4/mad.c 2008-11-30 09:47:39.000000000 +0200 @@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_ * Snoop SM MADs for port info and P_Key table sets, so we can * synthesize LID change and P_Key change events. 
*/ -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device * if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); update_sm_ah(to_mdev(ibdev), port_num, be16_to_cpu(pinfo->sm_lid), @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device * event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { - u16 slid; + u16 slid, prev_lid = 0; int err; + struct ib_port_attr pattr; slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); @@ -263,6 +269,13 @@ int mlx4_ib_process_mad(struct ib_device } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -271,7 +284,7 @@ int mlx4_ib_process_mad(struct ib_device return IB_MAD_RESULT_FAILURE; if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } From jackm at dev.mellanox.co.il Sun Nov 30 00:29:01 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 30 Nov 2008 10:29:01 +0200 Subject: [ofa-general] [PATCH 2 of 2] ib_mthca: Fix dispatch of IB_EVENT_LID_CHANGE Message-ID: <200811301029.02196.jackm@dev.mellanox.co.il> ib_mthca: Fix dispatch of IB_EVENT_LID_CHANGE When snooping a portinfo MAD, its client_reregister bit is checked. If the bit is ON then a CLIENT_REREGISTER event is dispatched, otherwise a LID_CHANGE event is dispatched. This ignores the cases where the MAD changes the LID along with an instruction to reregister (so a necessary LID_CHANGE event won't be dispatched), or the MAD is neither of these (and an unnecessary LID_CHANGE event is dispatched). This patch dispatches an event if the client_reregister bit is set. In addition, the patch compares the LID in the MAD to the current LID. If and only if they are not identical, a LID_CHANGE event is dispatched. From: Moni Shoua Signed-off-by: Moni Shoua Signed-off-by: Jack Morgenstein Signed-off-by: Yossi Etigin --- Roland, Here is Moni's patch separated into two patches, one for mlx4_ib and one for ib_mthca. 
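As an illustration only (not part of the patch), the dispatch rule described in the changelog above can be sketched as a small user-space function. The names `sketch_event`, `sketch_events_for_portinfo` and `sketch_self_check` are hypothetical stand-ins invented for this sketch; the real code calls `ib_dispatch_event()` on a `struct ib_event` inside `smp_snoop()`. The sketch assumes only what the changelog states: the reregister bit is `0x80` in `clientrereg_resv_subnetto`, and a LID change is detected by comparing the previous LID with the LID carried in the MAD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event flags for this sketch; the kernel uses
 * IB_EVENT_CLIENT_REREGISTER and IB_EVENT_LID_CHANGE instead. */
enum sketch_event {
	SKETCH_CLIENT_REREGISTER = 1 << 0,
	SKETCH_LID_CHANGE        = 1 << 1
};

/* Return the set of events the patched smp_snoop() would dispatch for a
 * PortInfo Set: the two checks are independent, so zero, one or both
 * events may result. */
static unsigned sketch_events_for_portinfo(uint8_t clientrereg_resv_subnetto,
					   uint16_t prev_lid, uint16_t new_lid)
{
	unsigned events = 0;

	if (clientrereg_resv_subnetto & 0x80)
		events |= SKETCH_CLIENT_REREGISTER;
	if (prev_lid != new_lid)
		events |= SKETCH_LID_CHANGE;

	return events;
}

/* The cases named in the changelog, written as assertions. */
static void sketch_self_check(void)
{
	/* reregister only */
	assert(sketch_events_for_portinfo(0x80, 4, 4) == SKETCH_CLIENT_REREGISTER);
	/* LID change only */
	assert(sketch_events_for_portinfo(0x00, 4, 7) == SKETCH_LID_CHANGE);
	/* both: the case the old if/else lost */
	assert(sketch_events_for_portinfo(0x80, 4, 7) ==
	       (SKETCH_CLIENT_REREGISTER | SKETCH_LID_CHANGE));
	/* neither: the old code sent a spurious LID_CHANGE here */
	assert(sketch_events_for_portinfo(0x00, 4, 4) == 0);
}
```

Returning a bitmask rather than a single value is a choice made for the sketch, so that the "both" and "neither" cases, which the original if/else could not express, are visible in one return value.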
Jack Index: infiniband/drivers/infiniband/hw/mthca/mthca_mad.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mthca/mthca_mad.c 2008-11-04 10:21:02.000000000 +0200 +++ infiniband/drivers/infiniband/hw/mthca/mthca_mad.c 2008-11-30 09:48:35.000000000 +0200 @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_de */ static void smp_snoop(struct ib_device *ibdev, u8 port_num, - struct ib_mad *mad) + struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device * if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device * event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device * int err; u8 status; u16 slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); + u16 prev_lid = 0; + struct ib_port_attr pattr; /* Forward locally generated traps to the SM */ if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && @@ -233,6 +240,12 @@ int mthca_process_mad(struct ib_device * return IB_MAD_RESULT_SUCCESS; } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; err = mthca_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, @@ -252,7 +265,7 @@ int mthca_process_mad(struct ib_device * } if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } From vlad at lists.openfabrics.org Sun Nov 30 03:20:52 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 30 Nov 2008 03:20:52 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081130-0200 daily build status Message-ID: <20081130112052.7C30AE60D46@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 
with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Sun Nov 30 05:30:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 15:30:26 +0200 Subject: [ofa-general] Re: [PATCH] opensm: skeleton for toroidal mesh analysis In-Reply-To: <000001c943c8$fef921f0$fceb65d0$@com> References: <000001c943c8$fef921f0$fceb65d0$@com> Message-ID: <20081130133026.GE9338@sashak.voltaire.com> Hi Bob, On 00:44 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > Here is the first patch in a series to implement the algorithm described in > the file lash_changes.doc. > > This patch > - creates a new command line flag --do_mesh_analysis and a new Boolean > that is set if the flag is used. > - adds code to main to implement the flag and option. 
This also requires addition in OpenSM man page and ideally some explanations in opensm/doc/current-routing.txt document. This can be done as separate patch if you like. > - creates a new file osm_mesh.c to hold the algorithm code > - moves declarations from osm_ucast_lash.c and osm_mesh.c into header > files > - adds these files to Makefile.am > - adds a stub do_mesh_analysis() that is called from lash_core. > > Signed-off-by: Bob Pearson > > ----- > > diff --git a/opensm/include/opensm/osm_mesh.h > b/opensm/include/opensm/osm_mesh.h > new file mode 100644 > index 0000000..1467440 > --- /dev/null > +++ b/opensm/include/opensm/osm_mesh.h > @@ -0,0 +1,46 @@ > +/* > + * Copyright (c) 2088 System Fabric Works, Inc. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +/* > + * Abstract: > + * Declarations for mesh analysis > + */ > + > +#ifndef OSM_UCAST_MESH_H > +#define OSM_UCAST_MESH_H > + > +struct _lash; > + > +int do_mesh_analysis(struct _lash *p_lash); > + > +#endif > diff --git a/opensm/include/opensm/osm_subnet.h > b/opensm/include/opensm/osm_subnet.h > index 7259587..2abe36d 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -215,6 +215,7 @@ typedef struct osm_subn_opt { > char *node_name_map_name; > char *prefix_routes_file; > boolean_t consolidate_ipv6_snm_req; > + boolean_t do_mesh_analysis; > } osm_subn_opt_t; > /* > * FIELDS > diff --git a/opensm/include/opensm/osm_ucast_lash.h > b/opensm/include/opensm/osm_ucast_lash.h > new file mode 100644 > index 0000000..646e9a3 > --- /dev/null > +++ b/opensm/include/opensm/osm_ucast_lash.h > @@ -0,0 +1,100 @@ > +/* > + * Copyright (c) 2008 System Fabric Works, Inc. > + * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved. > + * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved. > + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > + * Copyright (c) 2007 Simula Research Laboratory. All rights reserved. > + * Copyright (c) 2007 Silicon Graphics Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +/* > + * Abstract: > + * Declarations for LASH algorithm > + */ > + > +#ifndef OSM_UCAST_LASH_H > +#define OSM_UCAST_LASH_H > + > +enum { > + UNQUEUED, > + Q_MEMBER, > + MST_MEMBER, > + MAX_INT = 9999, > + NONE = MAX_INT > +}; > + > +typedef struct _cdg_vertex { > + int num_dependencies; > + struct _cdg_vertex **dependency; > + int from; > + int to; > + int seen; > + int temp; > + int visiting_number; > + struct _cdg_vertex *next; > + int num_temp_depend; > + int num_using_vertex; > + int *num_using_this_depend; > +} cdg_vertex_t; > + > +typedef struct _reachable_dest { > + int switch_id; > + struct _reachable_dest *next; > +} reachable_dest_t; > + > +typedef struct _switch { > + osm_switch_t *p_sw; > + int *dij_channels; > + int id; > + int used_channels; > + int q_state; > + struct routing_table { > + unsigned out_link; > + unsigned lane; > + } *routing_table; > + unsigned int num_connections; > + int *virtual_physical_port_table; > + int *phys_connections; > +} switch_t; > + > +typedef struct _lash { > + osm_opensm_t *p_osm; > + int num_switches; > + uint8_t vl_min; > + int balance_limit; > + switch_t **switches; > + cdg_vertex_t ****cdg_vertex_matrix; > + int *num_mst_in_lane; > + int ***virtual_location; > +} lash_t; > + > +#endif > diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am > index 01573d2..7b9da18 100644 > --- a/opensm/opensm/Makefile.am > +++ b/opensm/opensm/Makefile.am > @@ -31,7 +31,7 @@ opensm_SOURCES = main.c osm_console_io.c osm_console.c > osm_db_files.c \ > osm_inform.c osm_lid_mgr.c osm_lin_fwd_rcv.c \ > osm_link_mgr.c osm_mcast_fwd_rcv.c \ > osm_mcast_mgr.c osm_mcast_tbl.c osm_mcm_info.c \ > - osm_mcm_port.c osm_mtree.c osm_multicast.c osm_node.c \ > + osm_mcm_port.c osm_mesh.c osm_mtree.c osm_multicast.c > osm_node.c \ > osm_node_desc_rcv.c osm_node_info_rcv.c \ > osm_opensm.c osm_pkey.c osm_pkey_mgr.c osm_pkey_rcv.c \ > osm_port.c osm_port_info_rcv.c \ > @@ -76,6 +76,7 @@ opensminclude_HEADERS = \ > 
$(srcdir)/../include/opensm/osm_errors.h \ > $(srcdir)/../include/opensm/osm_helper.h \ > $(srcdir)/../include/opensm/osm_inform.h \ > + $(srcdir)/../include/opensm/osm_ucast_lash.h \ > $(srcdir)/../include/opensm/osm_lid_mgr.h \ > $(srcdir)/../include/opensm/osm_log.h \ > $(srcdir)/../include/opensm/osm_mad_pool.h \ > @@ -83,6 +84,7 @@ opensminclude_HEADERS = \ > $(srcdir)/../include/opensm/osm_mcast_tbl.h \ > $(srcdir)/../include/opensm/osm_mcm_info.h \ > $(srcdir)/../include/opensm/osm_mcm_port.h \ > + $(srcdir)/../include/opensm/osm_mesh.h \ > $(srcdir)/../include/opensm/osm_mtree.h \ > $(srcdir)/../include/opensm/osm_multicast.h \ > $(srcdir)/../include/opensm/osm_msgdef.h \ > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index 53648d6..63bd5a6 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -585,6 +585,7 @@ int main(int argc, char *argv[]) > #endif > {"prefix_routes_file", 1, NULL, 3}, > {"consolidate_ipv6_snm_req", 0, NULL, 4}, > + {"do_mesh_analysis", 0, NULL, 5}, A new command line option requires addition (and some short explanation) in usage() function (invoked on 'opensm --help') and in OpenSM man page. Also I suppose this option should be added to OpenSM config file and not to be "command line only". > {NULL, 0, NULL, 0} /* Required at the end of the array > */ > }; > > @@ -922,6 +923,9 @@ int main(int argc, char *argv[]) > case 4: > opt.consolidate_ipv6_snm_req = TRUE; > break; > + case 5: > + opt.do_mesh_analysis = TRUE; > + break; > case 'h': > case '?': > case ':': > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > new file mode 100644 > index 0000000..7943274 > --- /dev/null > +++ b/opensm/opensm/osm_mesh.c > @@ -0,0 +1,65 @@ > +/* > + * Copyright (c) 2008 System Fabric Works, Inc. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +/* > + * Abstract: > + * routines to analyze certain meshes > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* > + * do_mesh_analysis > + */ > +int do_mesh_analysis(lash_t *p_lash) > +{ > + int ret = 0; > + osm_log_t *p_log = &p_lash->p_osm->log; > + > + OSM_LOG_ENTER(p_log); > + > + printf("lash: do_mesh_analysis stub called\n"); > + > + OSM_LOG_EXIT(p_log); > + > + return ret; > +} > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index c082798..e10371c 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -52,64 +52,13 @@ > #include > #include > #include > +#include > +#include > > /* //////////////////////////// */ > /* Local types */ > /* //////////////////////////// */ > > -enum { > - UNQUEUED, > - Q_MEMBER, > - MST_MEMBER, > - MAX_INT = 9999, > - NONE = MAX_INT > -}; > - > -typedef struct _cdg_vertex { > - int num_dependencies; > - struct _cdg_vertex **dependency; > - int from; > - int to; > - int seen; > - int temp; > - int visiting_number; > - struct _cdg_vertex *next; > - int num_temp_depend; > - int num_using_vertex; > - int *num_using_this_depend; > -} cdg_vertex_t; > - > -typedef struct _reachable_dest { > - int switch_id; > - struct _reachable_dest *next; > -} reachable_dest_t; > - > -typedef struct _switch { > - osm_switch_t *p_sw; > - int *dij_channels; > - int id; > - int used_channels; > - int q_state; > - struct routing_table { > - unsigned out_link; > - unsigned lane; > - } *routing_table; > - unsigned int num_connections; > - int *virtual_physical_port_table; > - int *phys_connections; > -} switch_t; > - > -typedef struct _lash { > - osm_opensm_t *p_osm; > - int num_switches; > - uint8_t vl_min; > - int balance_limit; > - switch_t **switches; > - cdg_vertex_t ****cdg_vertex_matrix; > - int *num_mst_in_lane; > - int ***virtual_location; > -} lash_t; > 
- > static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) > { > cdg_vertex_t *cdg_vertex = (cdg_vertex_t *) > malloc(sizeof(cdg_vertex_t)); > @@ -872,10 +821,15 @@ static int lash_core(lash_t * p_lash) > int output_link2, i_next_switch2; > int cycle_found2 = 0; > int status = 0; > - int *switch_bitmap; /* Bitmap to check if we have processed this > pair */ > + int *switch_bitmap = NULL; /* Bitmap to check if we have > processed this pair */ Why this initialization is needed? > > OSM_LOG_ENTER(p_log); > > + if (p_lash->p_osm->subn.opt.do_mesh_analysis && > do_mesh_analysis(p_lash)) { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Mesh analysis failed\n"); > + goto Exit; > + } > + > for (i = 0; i < num_switches; i++) { > > shortest_path(p_lash, i); From sashak at voltaire.com Sun Nov 30 05:48:57 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 15:48:57 +0200 Subject: [ofa-general] Re: [PATCH][3] opensm: per mesh node information In-Reply-To: <000501c943d4$57b3f8f0$071bead0$@com> References: <000501c943d4$57b3f8f0$071bead0$@com> Message-ID: <20081130134857.GF9338@sashak.voltaire.com> On 02:06 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > This is the third patch implementing the mesh analysis algorithm > > This patch > - creates per mesh node (e.g. 
switch) data structure mesh_node_t > - adds a pointer to mesh_node_t in the switch_t structure > - implements create and cleanup methods for node_t > - calls these in switch_create and swich_delete in *lash.c > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/include/opensm/osm_mesh.h > b/opensm/include/opensm/osm_mesh.h > index 8313614..78af086 100644 > --- a/opensm/include/opensm/osm_mesh.h > +++ b/opensm/include/opensm/osm_mesh.h > @@ -40,6 +40,39 @@ > #define OSM_UCAST_MESH_H > > struct _lash; > +struct _switch; > + > +enum mesh_node_type { > + mesh_type_none, > + mesh_type_cartesian, > +}; > + > +/* > + * per switch to switch link info > + */ > +typedef struct _link { > + int switch_id; > + int link_id; > + int *ports; > + int num_ports; > + int next_port; > +} link_t; > + > +/* > + * per switch node mesh info > + */ > +typedef struct _mesh_node { > + unsigned int num_links; /* number of 'links' to adjacent > switches */ > + link_t **links; /* per link information */ > + int *axes; /* used to hold and reorder assigned > axes */ > + int *coord; /* mesh coordinates of switch */ > + int **matrix; /* distances between adjacent > switches */ > + int *poly; /* characteristic polynomial of > matrix */ > + /* used as an invariant > classification */ > + enum mesh_node_type type; > + int dimension; /* apparent dimension of mesh around > node */ > + int temp; /* temporary holder for distance > info */ > +} mesh_node_t; > > /* > * per fabric mesh info > @@ -55,4 +88,7 @@ typedef struct _mesh { > void osm_mesh_cleanup(struct _lash *p_lash); > int osm_do_mesh_analysis(struct _lash *p_lash); > > +void osm_mesh_node_cleanup(struct _switch *sw); > +int osm_mesh_node_create(struct _lash *p_lash, struct _switch *sw); > + > #endif > diff --git a/opensm/include/opensm/osm_ucast_lash.h > b/opensm/include/opensm/osm_ucast_lash.h > index 1ae3bb6..c037571 100644 > --- a/opensm/include/opensm/osm_ucast_lash.h > +++ 
b/opensm/include/opensm/osm_ucast_lash.h > @@ -81,6 +81,7 @@ typedef struct _switch { > unsigned out_link; > unsigned lane; > } *routing_table; > + mesh_node_t *node; > unsigned int num_connections; > int *virtual_physical_port_table; > int *phys_connections; > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index c97925b..6ef397c 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -98,7 +98,7 @@ static int mesh_create(lash_t *p_lash) > } > > /* > - * do_mesh_analysis > + * osm_do_mesh_analysis > */ > int osm_do_mesh_analysis(lash_t *p_lash) > { > @@ -121,3 +121,83 @@ int osm_do_mesh_analysis(lash_t *p_lash) > > return ret; > } > + > +/* > + * osm_mesh_node_cleanup - cleanup per switch resources > + */ > +void osm_mesh_node_cleanup(switch_t *sw) > +{ > + int i; > + mesh_node_t *node = sw->node; > + unsigned num_ports = sw->p_sw->num_ports; > + > + if (node) { > + if (node->links) { > + for (i = 0; i < num_ports; i++) { > + if (node->links[i]) { > + if (node->links[i]->ports) > + free(node->links[i]->ports); > + free(node->links[i]); > + } > + } > + free(node->links); > + } > + > + if (node->poly) > + free(node->poly); > + > + if (node->matrix) { > + for (i = 0; i < node->num_links; i++) { > + if (node->matrix[i]) > + free(node->matrix[i]); > + } > + free(node->matrix); > + } > + > + if (node->axes) > + free(node->axes); > + > + free(node); > + > + sw->node = NULL; > + } > +} > + > +/* > + * osm_mesh_node_create - allocate per switch resources > + */ > +int osm_mesh_node_create(lash_t *p_lash, switch_t *sw) > +{ > + osm_log_t *p_log = &p_lash->p_osm->log; > + int i; > + mesh_node_t *node; > + unsigned num_ports = sw->p_sw->num_ports; > + > + if (!(node = sw->node = calloc(1, sizeof(mesh_node_t)))) { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh node - > out of memory\n"); > + return -1; > + } > + > + if (!(node->links = calloc(num_ports, sizeof(link_t *)))) > + goto err; > + > + for (i = 0; i < num_ports; i++) 
{ > + if (!(node->links[i] = calloc(1, sizeof(link_t))) || > + !(node->links[i]->ports = calloc(num_ports, > sizeof(int)))) > + goto err; > + } Assuming that ports array is preallocated, wouldn't it be simpler to define link as: typedef struct _link { int switch_id; int link_id; int num_ports; int next_port; int ports[0]; } link_t; , and then: node->links[i] = calloc(1, sizeof(link_t *) + num_ports * sizeof(int)))) ? (Similar optimizations are probably relevant in other places). Sasha > + > + if (!(node->axes = calloc(num_ports, sizeof(int)))) > + goto err; > + > + for (i = 0; i < num_ports; i++) { > + node->links[i]->switch_id = NONE; > + } > + > + return 0; > + > +err: > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh node - out of > memory\n"); > + osm_mesh_node_cleanup(sw); > + return -1; > +} > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index 3577cca..b9394af 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -651,6 +651,9 @@ static switch_t *switch_create(lash_t * p_lash, unsigned > id, osm_switch_t * p_sw > sw->phys_connections[i] = NONE; > } > > + if (osm_mesh_node_create(p_lash, sw)) > + return -1; > + > sw->p_sw = p_sw; > if (p_sw) > p_sw->priv = sw; > @@ -660,6 +663,8 @@ static switch_t *switch_create(lash_t * p_lash, unsigned > id, osm_switch_t * p_sw > > static void switch_delete(switch_t * sw) > { > + osm_mesh_node_cleanup(sw); > + > if (sw->dij_channels) > free(sw->dij_channels); > if (sw->virtual_physical_port_table) > > From sashak at voltaire.com Sun Nov 30 05:50:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 15:50:04 +0200 Subject: [ofa-general] opensm support for toroidal meshes In-Reply-To: <000501c9437d$ffa7cd90$fef768b0$@com> References: <000501c9437d$ffa7cd90$fef768b0$@com> Message-ID: <20081130135004.GG9338@sashak.voltaire.com> Hi Bob, On 15:47 Mon 10 Nov , Robert Pearson wrote: > We have been involved in a project to deliver a 
large system based on a > toroidal mesh fabric. One of the requirements for this system is to be able > to guarantee a deadlock free routing of the fabric. The lash routing engine > in opensm did not work in this case because required number of VLs for the > machine as configured was 12 which exceeded the number of VLs supported by > Mellanox switch ASICs. It turns out that if one has the freedom to reorder > the order of the port assignments used by lash optimally that lash can > successfully route the fabric but that is impractical in the hardware. The > attached note describes an algorithm for automatically recognizing when a > Cartesian mesh fabric is a torus, determining its size and optimally > reordering the ports in opensm so that lash can generate a route with the > smallest number of VLs. > > We have implemented a set of changes to opensm that implement this algorithm > and will submit the changes as patches. This note will help to understand > the code. Thanks for the great work! I'm sending some initial comments (still learning the code). Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Sun Nov 30 05:54:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 15:54:40 +0200 Subject: [ofa-general] Re: [PATCH][4] opensm: vector and matrix utilities In-Reply-To: <003201c9441c$d23ce8f0$76b6bad0$@com> References: <003201c9441c$d23ce8f0$76b6bad0$@com> Message-ID: <20081130135440.GH9338@sashak.voltaire.com> On 10:44 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > Here is the fourth patch in a series implementing the mesh analysis > algorithm. 
> > This patch implements > - create and cleanup methods for polynomial with integer coefficients > - create and cleanup methods for square matrix with integer > coefficients > - create and cleanup methods for square matrix with polynomial > coefficients > - routine to compute the determinant of a matrix with polynomial > coefficients > > (Note the determinant is restricted to computing the characteristic > polynomial) > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index 6ef397c..5dee1d0 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -49,6 +49,295 @@ > #include > > /* > + * poly_alloc > + * > + * allocate a polynomial of degree n > + */ > +static int *poly_alloc(lash_t *p_lash, int n) > +{ > + osm_log_t *p_log = &p_lash->p_osm->log; > + int *p; > + > + if (!(p = calloc(n+1, sizeof(int)))) { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating poly - out > of memory\n"); > + } > + > + return p; > +} > + > +/* > + * poly_diff > + * > + * return a nonzero value if polynomials differ else 0 > + */ > +static int poly_diff(int n, int *p, switch_t *s) > +{ > + int i; > + > + if (s->node->num_links != n) > + return 1; > + > + for (i = 0; i <= n; i++) { > + if (s->node->poly[i] != p[i]) > + return 1; > + } memcmp(s->node->poly, p, n)? 
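A note on the memcmp() idea: the length argument is a byte count, and a polynomial of degree n holds n + 1 coefficients, so the call would need roughly the form below. This is only a sketch; poly_coeffs_differ is a made-up name that is not in the patch, and the num_links check from poly_diff would still be done by the caller.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns nonzero if the two
 * degree-n polynomials differ. Each array holds n + 1 int coefficients,
 * so memcmp() is given (n + 1) * sizeof(int) bytes rather than n.
 */
static int poly_coeffs_differ(const int *a, const int *b, int n)
{
	return memcmp(a, b, ((size_t)n + 1) * sizeof(int)) != 0;
}
```

Passing plain n would compare only the first n bytes, so differences in the higher coefficients would go unnoticed.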
> + > + return 0; > +} > + > +/* > + * m_free > + * > + * free a square matrix of rank l > + */ > +static void m_free(int **m, int l) > +{ > + int i; > + > + if (m) { > + for (i = 0; i < l; i++) { > + if (m[i]) > + free(m[i]); > + } > + free(m); > + } > +} > + > +/* > + * m_alloc > + * > + * allocate a square matrix of rank l > + */ > +static int **m_alloc(lash_t *p_lash, int l) > +{ > + osm_log_t *p_log = &p_lash->p_osm->log; > + int i; > + int **m = NULL; > + > + do { > + if (!(m = calloc(l, sizeof(int *)))) > + break; > + > + for (i = 0; i < l; i++) { > + if (!(m[i] = calloc(l, sizeof(int)))) > + break; > + } > + if (i != l) > + break; > + > + return m; > + } while(0); Maybe just m = calloc(l*l, sizeof(int))? > + > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating matrix - out of > memory\n"); > + > + m_free(m, l); > + return NULL; > +} > + > +/* > + * pm_free > + * > + * free a square matrix of rank l of polynomials > + */ > +static void pm_free(int ***m, int l) > +{ > + int i, j; > + > + if (m) { > + for (i = 0; i < l; i++) { > + if (m[i]) { > + for (j = 0; j < l; j++) { > + if (m[i][j]) > + free(m[i][j]); > + } > + free(m[i]); > + } > + } > + free(m); > + } > +} > + > +/* > + * pm_alloc > + * > + * allocate a square matrix of rank l of polynomials of degree n > + */ > +static int ***pm_alloc(lash_t *p_lash, int l, int n) > +{ > + osm_log_t *p_log = &p_lash->p_osm->log; > + int i, j; > + int ***m = NULL; > + > + do { > + if (!(m = calloc(l, sizeof(int **)))) > + break; > + > + for (i = 0; i < l; i++) { > + if (!(m[i] = calloc(l, sizeof(int *)))) > + break; > + > + for (j = 0; j < l; j++) { > + if (!(m[i][j] = calloc(n+1, sizeof(int)))) > + break; > + } > + if (j != l) > + break; > + } > + if (i != l) > + break; > + > + return m; > + } while(0); Ditto. 
> + > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating matrix - out of > memory\n"); > + > + pm_free(m, l); > + return NULL; > +} > + > +static int determinant(lash_t *p_lash, int n, int rank, int ***m, int *p); > + > +/* > + * sub_determinant > + * > + * compute the determinant of a submatrix of matrix of rank l of > polynomials of degree n > + * with row and col removed in poly. caller must free poly > + */ > +static int sub_determinant(lash_t *p_lash, int n, int l, int row, int col, > int ***matrix, int **poly) > +{ > + int ret = -1; > + int ***m = NULL; > + int *p = NULL; > + int i, j, k, x, y; > + int rank = l - 1; > + > + do { > + if (!(p = poly_alloc(p_lash, n))) { > + break; > + } > + > + if (rank <= 0) { > + p[0] = 1; > + ret = 0; > + break; > + } > + > + if (!(m = pm_alloc(p_lash, rank, n))) { > + free(p); > + p = NULL; > + break; > + } > + > + x = 0; > + for (i = 0; i < l; i++) { > + if (i == row) > + continue; > + > + y = 0; > + for (j = 0; j < l; j++) { > + if (j == col) > + continue; > + > + for (k = 0; k <= n; k++) > + m[x][y][k] = matrix[i][j][k]; > + > + y++; > + } > + x++; > + } > + > + if (determinant(p_lash, n, rank, m, p)) { > + free(p); > + p = NULL; > + break; > + } > + > + ret = 0; > + } while(0); > + > + pm_free(m, rank); > + *poly = p; > + return ret; > +} > + > +/* > + * determinant > + * > + * compute the determinant of matrix m of rank of polynomials of degree deg > + * and add the result to polynomial p allocated by caller > + */ > +static int determinant(lash_t *p_lash, int deg, int rank, int ***m, int *p) > +{ > + int i, j, k; > + int *q; > + int sign = 1; > + > + /* > + * handle simple case of 1x1 matrix > + */ > + if (rank == 1) { > + for (i = 0; i <= deg; i++) > + p[i] += m[0][0][i]; > + } > + > + /* > + * handle simple case of 2x2 matrix > + */ > + else if (rank == 2) { > + for (i = 0; i <= deg; i++) { > + if (m[0][0][i] == 0) > + continue; > + > + for (j = 0; j <= deg; j++) { > + if (m[1][1][j] == 0) > + continue; > + > + 
p[i+j] += m[0][0][i]*m[1][1][j]; > + } > + } > + > + for (i = 0; i <= deg; i++) { > + if (m[0][1][i] == 0) > + continue; > + > + for (j = 0; j <= deg; j++) { > + if (m[1][0][j] == 0) > + continue; > + > + p[i+j] -= m[0][1][i]*m[1][0][j]; > + } > + } > + } > + > + /* > + * handle the general case > + */ > + else { > + for (i = 0; i < rank; i++) { > + if (sub_determinant(p_lash, deg, rank, 0, i, m, &q)) > + return -1; > + > + for (j = 0; j <= deg; j++) { > + if (m[0][i][j] == 0) > + continue; > + > + for (k = 0; k <= deg; k++) { > + if (q[k] == 0) > + continue; > + > + p[j+k] += sign*m[0][i][j]*q[k]; > + } > + } > + > + free(q); > + sign = -sign; > + } > + } > + > + return 0; > +} > + > +/* > * osm_mesh_cleanup - free per mesh resources > */ > void osm_mesh_cleanup(lash_t *p_lash) > > From sashak at voltaire.com Sun Nov 30 07:28:18 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 17:28:18 +0200 Subject: [ofa-general] Re: [PATCH][5] opensm: compute local geometry In-Reply-To: <003301c9441e$eed2f480$cc78dd80$@com> References: <003301c9441e$eed2f480$cc78dd80$@com> Message-ID: <20081130152818.GI9338@sashak.voltaire.com> On 10:59 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > Here is the fifth patch implementing the mesh analysis algorithm. 
> > This patch implements > - routine to compute characteristics polynomial of a matrix > - routine to compute the local 'metric' around each switch > - routine to classify switches into a histogram of local geometry > classes > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index 7434fee..9254de3 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -338,6 +338,172 @@ static int determinant(lash_t *p_lash, int deg, int > rank, int ***m, int *p) > } > > /* > + * char_poly > + * > + * compute the characteristic polynomial of matrix of rank > + * by computing the determinant of m-x*I and return in poly > + * as an array. caller must free poly > + */ > +static int char_poly(lash_t *p_lash, int rank, int **matrix, int **poly) > +{ > + int ret = -1; > + int i, j; > + int ***m = NULL; > + int *p = NULL; > + int deg = rank; > + > + do { > + if (!(p = poly_alloc(p_lash, deg))) { > + break; > + } > + > + if (!(m = pm_alloc(p_lash, rank, deg))) { > + free(p); > + p = NULL; > + break; > + } > + > + for (i = 0; i < rank; i++) { > + for (j = 0; j < rank; j++) { > + m[i][j][0] = matrix[i][j]; > + } > + m[i][i][1] = -1; > + } > + > + if (determinant(p_lash, deg, rank, m, p)) { > + free(p); > + p = NULL; > + break; > + } > + > + ret = 0; > + } while(0); > + > + pm_free(m, rank); > + *poly = p; > + return ret; > +} > + > +/* > + * get_switch_metric > + * > + * compute the matrix of minimum distances between each of > + * the adjacent switch nodes to sw along paths > + * that do not go through sw. 
do calculation by > + * relaxation method > + * allocate space for the matrix and save in node_t structure > + */ > +static int get_switch_metric(lash_t *p_lash, int sw) > +{ > + int ret = -1; > + int i, j, change; > + int sw1, sw2, sw3; > + switch_t *s = p_lash->switches[sw]; > + switch_t *s1, *s2, *s3; > + int **m; > + mesh_node_t *node = s->node; > + int num_links = node->num_links; > + > + do { > + if (!(m = m_alloc(p_lash, num_links))) > + break; > + > + for (i = 0; i < num_links; i++) { > + sw1 = node->links[i]->switch_id; > + s1 = p_lash->switches[sw1]; > + > + /* make all distances big except s1 to itself */ > + for (sw2 = 0; sw2 < p_lash->num_switches; sw2++) > + p_lash->switches[sw2]->node->temp = > 0x7fffffff; > + > + s1->node->temp = 0; > + > + do { > + change = 0; > + > + for (sw2 = 0; sw2 < p_lash->num_switches; > sw2++) { > + s2 = p_lash->switches[sw2]; > + if (s2->node->temp == 0x7fffffff) > + continue; > + for (j = 0; j < s2->node->num_links; > j++) { > + sw3 = > s2->node->links[j]->switch_id; > + s3 = p_lash->switches[sw3]; > + > + if (sw3 == sw) > + continue; > + > + if ((s2->node->temp + 1) < > s3->node->temp) { > + s3->node->temp = > s2->node->temp + 1; > + change++; > + } > + } > + } > + } while(change); As far as I can understand it is minimal hops calculation. We already have this information in OpenSM switches lmx mtrices. 
Using this matrix 'm' could be created as: for (i = 0; i < num_links; i++) { sw1 = node->links[i]->switch_id; s1 = p_lash->switches[sw1]; for (i = 0; i < num_links; i++) { unsigned lid; sw2 = node->links[i]->switch_id; s2 = p_lash->switches[sw2]; lid = cl_ntoh16(osm_node_get_base_lid(s2->p_sw->p_node, 0)); m[i][j] = osm_switch_get_least_hops(s1->p_sw, lid); } } > + > + for (j = 0; j < num_links; j++) { > + sw2 = node->links[j]->switch_id; > + s2 = p_lash->switches[sw2]; > + m[i][j] = s2->node->temp; > + } > + } > + > + if (char_poly(p_lash, num_links, m, &node->poly)) { > + m_free(m, num_links); > + m = NULL; > + break; > + } > + > + ret = 0; > + } while(0); > + > + node->matrix = m; > + return ret; > +} > + > +/* > + * classify_switch > + * > + * add switch to histogram of switch types > + */ > +static void classify_switch(lash_t *p_lash, int sw) > +{ > + int i; > + switch_t *s = p_lash->switches[sw]; > + switch_t *s1; > + mesh_t *mesh = p_lash->mesh; > + > + for (i = 0; i < mesh->num_class; i++) { > + s1 = p_lash->switches[mesh->class_type[i]]; > + > + if (poly_diff(s->node->num_links, s->node->poly, s1)) > + continue; > + > + mesh->class_count[i]++; > + return; > + } > + > + mesh->class_type[mesh->num_class] = sw; > + mesh->class_count[mesh->num_class] = 1; > + mesh->num_class++; > + return; > +} > + > +/* > + * get_local_geometry > + * > + * analyze the local geometry around each switch > + */ > +static void get_local_geometry(lash_t *p_lash) > +{ > + int sw; > + > + for (sw = 0; sw < p_lash->num_switches; sw++) { > + get_switch_metric(p_lash, sw); > + classify_switch(p_lash, sw); > + } > +} > + > +/* > * osm_mesh_cleanup - free per mesh resources > */ > void osm_mesh_cleanup(lash_t *p_lash) > @@ -404,6 +570,12 @@ int osm_do_mesh_analysis(lash_t *p_lash) > return -1; > } > > + /* > + * get local metric and invariant for each switch > + * also classify each switch > + */ > + get_local_geometry(p_lash); > + > printf("lash: do_mesh_analysis stub called\n"); > > 
OSM_LOG_EXIT(p_log);
>
>

From sashak at voltaire.com Sun Nov 30 08:36:37 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 18:36:37 +0200
Subject: [ofa-general] Re: [PATCH][5] opensm: compute local geometry
In-Reply-To: <003301c9441e$eed2f480$cc78dd80$@com>
References: <003301c9441e$eed2f480$cc78dd80$@com>
Message-ID: <20081130163637.GJ9338@sashak.voltaire.com>

On 10:59 Tue 11 Nov , Robert Pearson wrote:
> Sasha,
>
> Here is the fifth patch implementing the mesh analysis algorithm.
>
> This patch implements
> - routine to compute characteristics polynomial of a matrix
> - routine to compute the local 'metric' around each switch

I checked performance of determinant calculation - when switch has 8
links it takes 11-12 seconds per switch, with 10 links - 2177 seconds.

Is it possible to improve performance there?

Sasha

From sashak at voltaire.com Sun Nov 30 08:39:38 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 18:39:38 +0200
Subject: [ofa-general] Re: [PATCH][5] opensm: compute local geometry
In-Reply-To: <20081130163637.GJ9338@sashak.voltaire.com>
References: <003301c9441e$eed2f480$cc78dd80$@com>
 <20081130163637.GJ9338@sashak.voltaire.com>
Message-ID: <20081130163938.GK9338@sashak.voltaire.com>

On 18:36 Sun 30 Nov , Sasha Khapyorsky wrote:
> On 10:59 Tue 11 Nov , Robert Pearson wrote:
> > Sasha,
> >
> > Here is the fifth patch implementing the mesh analysis algorithm.
> >
> > This patch implements
> > - routine to compute characteristics polynomial of a matrix
> > - routine to compute the local 'metric' around each switch
>
> I checked performance of determinant calculation - when switch has 8
> links it takes 11-12 seconds per switch, with 10 links - 2177 seconds.

Oops, sorry. The results above are for 10 and 12 links.
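
For what it's worth, the recursive cofactor expansion in determinant() visits on the order of n! sub-determinants, which would explain the steep jump between 10 and 12 links. One possible alternative, sketched below and not what the patch does, is the Faddeev-LeVerrier recurrence, which needs about n matrix multiplications. The name char_poly_fl, the flat row-major layout and the use of long long to delay overflow are all assumptions made for this sketch. It returns the coefficients of det(x*I - A), which differs from the patch's det(A - x*I) by a factor of (-1)^n.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only, not the patch's method: Faddeev-LeVerrier recurrence.
 * a is an n x n matrix stored row-major. On success c[0..n] receives the
 * coefficients of det(x*I - A), with c[k] multiplying x^k and c[n] == 1.
 * Returns 0 on success, -1 if out of memory.
 */
static int char_poly_fl(int n, const long long *a, long long *c)
{
	size_t cells = (size_t)n * (size_t)n;
	long long *m = calloc(cells ? cells : 1, sizeof(*m));	/* M_k */
	long long *t = calloc(cells ? cells : 1, sizeof(*t));	/* A * M_k */
	int i, j, l, k;

	if (!m || !t) {
		free(m);
		free(t);
		return -1;
	}

	c[n] = 1;
	for (i = 0; i < n; i++)
		m[i * n + i] = 1;	/* M_1 = I */

	for (k = 1; k <= n; k++) {
		long long trace = 0;

		/* t = A * M_k, accumulating its trace row by row */
		for (i = 0; i < n; i++) {
			for (j = 0; j < n; j++) {
				long long sum = 0;

				for (l = 0; l < n; l++)
					sum += a[i * n + l] * m[l * n + j];
				t[i * n + j] = sum;
			}
			trace += t[i * n + i];
		}

		/* the division is exact when A has integer entries */
		c[n - k] = -trace / k;

		/* M_{k+1} = A * M_k + c[n-k] * I */
		memcpy(m, t, cells * sizeof(*m));
		for (i = 0; i < n; i++)
			m[i * n + i] += c[n - k];
	}

	free(m);
	free(t);
	return 0;
}
```

The work is roughly n^4 multiplications in total, so a 12-link switch would be a matter of tens of thousands of operations rather than minutes. Whether the coefficients stay within 64 bits for the matrices of interest has not been checked here.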
Sasha

From rpearson at systemfabricworks.com Sun Nov 30 08:56:36 2008
From: rpearson at systemfabricworks.com (Robert Pearson)
Date: Sun, 30 Nov 2008 10:56:36 -0600
Subject: [ofa-general] RE: [PATCH][5] opensm: compute local geometry
In-Reply-To: <20081130163938.GK9338@sashak.voltaire.com>
References: <003301c9441e$eed2f480$cc78dd80$@com>
 <20081130163637.GJ9338@sashak.voltaire.com>
 <20081130163938.GK9338@sashak.voltaire.com>
Message-ID: <00f401c9530c$9bb4a530$d31def90$@com>

I am looking at the earlier posts.

I had thought about this one before. All the cases where this algorithm
applies have low port counts. I can fix this by just not doing the
determinant if the port count is larger than the highest order polynomial in
the table since none of them will match.

-----Original Message-----
From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
Sent: Sunday, November 30, 2008 10:40 AM
To: Robert Pearson
Cc: general at lists.openfabrics.org
Subject: Re: [PATCH][5] opensm: compute local geometry

On 18:36 Sun 30 Nov , Sasha Khapyorsky wrote:
> On 10:59 Tue 11 Nov , Robert Pearson wrote:
> > Sasha,
> >
> > Here is the fifth patch implementing the mesh analysis algorithm.
> >
> > This patch implements
> > - routine to compute characteristics polynomial of a matrix
> > - routine to compute the local 'metric' around each switch
>
> I checked performance of determinant calculation - when switch has 8
> links it takes 11-12 seconds per switch, with 10 links - 2177 seconds.

Oops, sorry. The results above are for 10 and 12 links.

Sasha

From rdreier at cisco.com Sun Nov 30 09:28:38 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 30 Nov 2008 09:28:38 -0800
Subject: [ofa-general] [RMDA CM IPv6 support.
 PATCHv4 1/6] AF_INET6 support for rdma_bind_addr
In-Reply-To: <1228033480.3621.5.camel@alst60.voltaire.com> (Aleksey Senin's
 message of "Sun, 30 Nov 2008 10:24:40 +0200")
References: <1227721899.3121.18.camel@alst60.voltaire.com>
 <1228033480.3621.5.camel@alst60.voltaire.com>
Message-ID:

> But.. All other patches depends one on another, and in my opinion better
> to apply it all together, otherwise, when separated, all those 'if'
> statements have no sense.

Umm, OK. So why did you send 6 separate patches?

From sashak at voltaire.com Sun Nov 30 09:59:29 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 19:59:29 +0200
Subject: [ofa-general] Re: [PATCH][5] opensm: compute local geometry
In-Reply-To: <20081130152818.GI9338@sashak.voltaire.com>
References: <003301c9441e$eed2f480$cc78dd80$@com>
 <20081130152818.GI9338@sashak.voltaire.com>
Message-ID: <20081130175929.GL9338@sashak.voltaire.com>

On 17:28 Sun 30 Nov , Sasha Khapyorsky wrote:
> > +
> > +	do {
> > +		if (!(m = m_alloc(p_lash, num_links)))
> > +			break;
> > +
> > +		for (i = 0; i < num_links; i++) {
> > +			sw1 = node->links[i]->switch_id;
> > +			s1 = p_lash->switches[sw1];
> > +
> > +			/* make all distances big except s1 to itself */
> > +			for (sw2 = 0; sw2 < p_lash->num_switches; sw2++)
> > +				p_lash->switches[sw2]->node->temp = 0x7fffffff;
> > +
> > +			s1->node->temp = 0;
> > +
> > +			do {
> > +				change = 0;
> > +
> > +				for (sw2 = 0; sw2 < p_lash->num_switches; sw2++) {
> > +					s2 = p_lash->switches[sw2];
> > +					if (s2->node->temp == 0x7fffffff)
> > +						continue;
> > +					for (j = 0; j < s2->node->num_links; j++) {
> > +						sw3 = s2->node->links[j]->switch_id;
> > +						s3 = p_lash->switches[sw3];
> > +
> > +						if (sw3 == sw)
> > +							continue;
> > +
> > +						if ((s2->node->temp + 1) < s3->node->temp) {
> > +							s3->node->temp = s2->node->temp + 1;
> > +							change++;
> > +						}
> > +					}
> > +				}
> > +			} while(change);
>
> As far as I can understand it is minimal hops calculation.
>
> We already have this information in OpenSM switches lmx mtrices. Using
> this matrix 'm' could be created as:
>
> 	for (i = 0; i < num_links; i++) {
> 		sw1 = node->links[i]->switch_id;
> 		s1 = p_lash->switches[sw1];
>
> 		for (i = 0; i < num_links; i++) {
> 			unsigned lid;
> 			sw2 = node->links[i]->switch_id;
> 			s2 = p_lash->switches[sw2];
> 			lid = cl_ntoh16(osm_node_get_base_lid(s2->p_sw->p_node, 0));
>
> 			m[i][j] = osm_switch_get_least_hops(s1->p_sw, lid);
> 		}
> 	}

Actually, this assumption of mine is wrong. The 'm' matrix contains min hops
excluding paths that cross the original switch. So it should be done
differently, maybe something like this:

	for (i = 0; i < num_links; i++) {
		sw1 = node->links[i]->switch_id;
		s1 = p_lash->switches[sw1];

		for (j = 0; j < num_links; j++) {
			unsigned lid, p, h, hops = 0xff;

			sw2 = node->links[j]->switch_id;
			if (sw1 == sw2) {
				m1[i][j] = 0;
				continue;
			}
			s2 = p_lash->switches[sw2];
			lid = cl_ntoh16(osm_node_get_base_lid(s2->p_sw->p_node, 0));

			for (p = 1 ; p < s1->p_sw->num_ports; p++) {
				h = osm_switch_get_hop_count(s1->p_sw, lid, p);
				osm_physp_t *physp = osm_node_get_physp_ptr(s1->p_sw->p_node, p);
				if (h < hops && physp->p_remote_physp->p_node->sw != s->p_sw)
					hops = h;
			}
			m1[i][j] = hops;
		}
	}

Sasha

From sashak at voltaire.com Sun Nov 30 10:03:07 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 20:03:07 +0200
Subject: [ofa-general] Re: [PATCH][5] opensm: compute local geometry
In-Reply-To: <00f401c9530c$9bb4a530$d31def90$@com>
References: <003301c9441e$eed2f480$cc78dd80$@com>
 <20081130163637.GJ9338@sashak.voltaire.com>
 <20081130163938.GK9338@sashak.voltaire.com>
 <00f401c9530c$9bb4a530$d31def90$@com>
Message-ID: <20081130180307.GM9338@sashak.voltaire.com>

On 10:56 Sun 30 Nov , Robert Pearson wrote:
>
> I had thought about this one before. All the cases where this algorithm
> applies have low port counts.
I can fix this by just not doing the
> determinant if the port count is larger than the highest order polynomial in
> the table since none of them will match.

I think it would be a nice fix.

Sasha

From rpearson at systemfabricworks.com Sun Nov 30 10:39:53 2008
From: rpearson at systemfabricworks.com (Robert Pearson)
Date: Sun, 30 Nov 2008 12:39:53 -0600
Subject: [ofa-general] [PATCH][11] opensm: add descriptions to docs and man page
Message-ID: <00f601c9531b$096c88a0$1c4599e0$@com>

Sasha,

This patch adds some descriptive language to current_routing.txt and
opensm.8.in.

Regards,

Bob Pearson

Signed-off-by: Bob Pearson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch11
Type: application/octet-stream
Size: 2680 bytes
Desc: not available
URL:

From rpearson at systemfabricworks.com Sun Nov 30 10:51:53 2008
From: rpearson at systemfabricworks.com (Robert Pearson)
Date: Sun, 30 Nov 2008 12:51:53 -0600
Subject: [ofa-general] [PATCH][12] opensm: add descriptions to show_usage
Message-ID: <010001c9531c$b636e390$22a4aab0$@com>

Sasha,

This patch adds language to show_usage for the --do_mesh_analysis flag.

Regards,

Bob Pearson

Signed-off-by: Bob Pearson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch12
Type: application/octet-stream
Size: 952 bytes
Desc: not available
URL:

From rpearson at systemfabricworks.com Sun Nov 30 10:58:49 2008
From: rpearson at systemfabricworks.com (Robert Pearson)
Date: Sun, 30 Nov 2008 12:58:49 -0600
Subject: [ofa-general] RE: [PATCH] opensm: skeleton for toroidal mesh analysis
In-Reply-To: <20081130133026.GE9338@sashak.voltaire.com>
References: <000001c943c8$fef921f0$fceb65d0$@com>
 <20081130133026.GE9338@sashak.voltaire.com>
Message-ID: <010a01c9531d$ae434150$0ac9c3f0$@com>

You wrote:

> @@ -872,10 +821,15 @@ static int lash_core(lash_t * p_lash)
> 	int output_link2, i_next_switch2;
> 	int cycle_found2 = 0;
> 	int status = 0;
> -	int *switch_bitmap;	/* Bitmap to check if we have processed this pair */
> +	int *switch_bitmap = NULL;	/* Bitmap to check if we have processed this pair */

Why this initialization is needed?

The added code can fail which will cause a goto to Exit. At Exit
switch_bitmap is freed if it is not zero. The added initialization makes
sure it is zero.

From sashak at voltaire.com Sun Nov 30 11:07:52 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 21:07:52 +0200
Subject: [ofa-general] Re: [PATCH] opensm: skeleton for toroidal mesh analysis
In-Reply-To: <010a01c9531d$ae434150$0ac9c3f0$@com>
References: <000001c943c8$fef921f0$fceb65d0$@com>
 <20081130133026.GE9338@sashak.voltaire.com>
 <010a01c9531d$ae434150$0ac9c3f0$@com>
Message-ID: <20081130190752.GN9338@sashak.voltaire.com>

On 12:58 Sun 30 Nov , Robert Pearson wrote:
> You wrote:
> > @@ -872,10 +821,15 @@ static int lash_core(lash_t * p_lash)
> > 	int output_link2, i_next_switch2;
> > 	int cycle_found2 = 0;
> > 	int status = 0;
> > -	int *switch_bitmap;	/* Bitmap to check if we have processed this pair */
> > +	int *switch_bitmap = NULL;	/* Bitmap to check if we have processed this pair */
>
> Why this initialization is needed?
>
> The added code can fail which will cause a goto to Exit.
At Exit
> switch_bitmap is freed if it is not zero. The added initialization makes
> sure it is zero.

Ok. I missed that.

Sasha

From rpearson at systemfabricworks.com Sun Nov 30 11:24:45 2008
From: rpearson at systemfabricworks.com (Robert Pearson)
Date: Sun, 30 Nov 2008 13:24:45 -0600
Subject: [ofa-general] RE: [PATCH][3] opensm: per mesh node information
In-Reply-To: <20081130134857.GF9338@sashak.voltaire.com>
References: <000501c943d4$57b3f8f0$071bead0$@com>
 <20081130134857.GF9338@sashak.voltaire.com>
Message-ID: <010b01c95321$4e962a20$ebc27e60$@com>

Hi Sasha

You wrote:

> +	if (!(node->links = calloc(num_ports, sizeof(link_t *))))
> +		goto err;
> +
> +	for (i = 0; i < num_ports; i++) {
> +		if (!(node->links[i] = calloc(1, sizeof(link_t))) ||
> +		    !(node->links[i]->ports = calloc(num_ports, sizeof(int))))
> +			goto err;
> +	}

Assuming that ports array is preallocated, wouldn't it be simpler to
define link as:

	typedef struct _link {
		int switch_id;
		int link_id;
		int num_ports;
		int next_port;
		int ports[0];
	} link_t;

, and then:

	node->links[i] = calloc(1, sizeof(link_t *) + num_ports * sizeof(int))))

? (Similar optimizations are probably relevant in other places).

I agree they accomplish the same goal. It is a tradeoff between code that is
a little shorter and faster and ease of understanding. I don't have strong
feelings. (For the same reason I tend to use 'x = calloc(1, foo)' instead of
'x = malloc(foo); memset(x, 0, foo);' which is a very common usage pattern.)

The same applies to your later note. We can represent a two dimensional
array as

	int **array;

followed by

	array = calloc(1, n*sizeof(int *));
	array[i] = calloc(1, m*sizeof(int));
	...

and then you get to type

	array[i][j] = xxx;

vs

	int *array;
	array = calloc(1, m*n*sizeof(int));

and then

	array[i*m+j] = xxx;

You can't use array[i][j] here because the compiler doesn't know the size of
the array until run time. If the code is at all complex I prefer the [][]
notation because it is easier to read and understand.
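
As an illustration of the two layouts being weighed here, the sketch below shows one way to get a single allocation in each case. It is not taken from the patches: struct link_fam, link_fam_alloc and matrix_alloc are hypothetical names, and the struct uses the C99 flexible array member spelling ports[] in place of ports[0]. Note that the size passed to calloc() is that of the struct itself, not of a pointer to it.

```c
#include <stdlib.h>

/* Hypothetical variant of link_t with a C99 flexible array member. */
struct link_fam {
	int switch_id;
	int link_id;
	int num_ports;
	int next_port;
	int ports[];	/* storage follows the struct in the same block */
};

/* One zeroed allocation for the struct plus num_ports port slots. */
static struct link_fam *link_fam_alloc(size_t num_ports)
{
	return calloc(1, sizeof(struct link_fam) + num_ports * sizeof(int));
}

/*
 * One zeroed allocation holding the row pointers followed by the data,
 * so callers still write m[i][j] and release everything with free(m).
 */
static int **matrix_alloc(size_t rows, size_t cols)
{
	int **m = calloc(1, rows * sizeof(int *) + rows * cols * sizeof(int));
	int *data;
	size_t i;

	if (!m)
		return NULL;

	/* the int data starts right after the table of row pointers */
	data = (int *)(m + rows);
	for (i = 0; i < rows; i++)
		m[i] = data + i * cols;

	return m;
}
```

With the row-pointer table the [][] notation is kept, and the error path shrinks to a single NULL check because there is only one block to free.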
The optimizer in the compiler will take the pointer dereference or the multiply out of inner loops so there is not normally a big performance difference. I guess that this code is complex enough that at least for now it is preferable to err on the side of keeping everything as straight forward as possible until we are sure that it is correct. Then if performance is an issue we can optimize it. I am happy either way. Let me know what you want me to do. Regards, Bob From sashak at voltaire.com Sun Nov 30 12:57:53 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 22:57:53 +0200 Subject: [ofa-general] Re: [PATCH][8] opensm: measure size and reorder links In-Reply-To: <004501c94424$23551620$69ff4260$@com> References: <004501c94424$23551620$69ff4260$@com> Message-ID: <20081130205753.GO9338@sashak.voltaire.com> On 11:37 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > > > Here is the eighth patch implementing the mesh analysis algorithm. > > > > This patch implements > > - routine to reorder links and measure the size of the mesh > > > > Regards, > > > > Bob Pearson > > > > Signed-off-by: Bob Pearson > > ---- > > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > > index 65afae6..a248522 100644 > > --- a/opensm/opensm/osm_mesh.c > > +++ b/opensm/opensm/osm_mesh.c > > @@ -832,6 +832,183 @@ next_j: > > } > > > > /* > > + * return |a| < |b| > > + */ > > +static inline int ltmag(int a, int b) > > +{ > > + int a1 = (a >= 0)? a : -a; > > + int b1 = (b >= 0)? 
b : -b; > > + > > + return (a1 < b1) || (a1 == b1 && a > b); > > +} > > + > > +/* > > + * reorder_links > > + * > > + * reorder the links out of a switch in sign/dimension order > > + */ > > +static int reorder_links(lash_t *p_lash, int sw) > > +{ > > + osm_log_t *p_log = &p_lash->p_osm->log; > > + switch_t *s = p_lash->switches[sw]; > > + mesh_node_t *node = s->node; > > + int n = node->num_links; > > + link_t **links; > > + int *axes; > > + int i, j; > > + int c; > > + int next = 0; > > + > > + if (!(links = calloc(n, sizeof(link_t *)))) { > > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - > out of memory\n"); > > + return -1; > > + } > > + > > + if (!(axes = calloc(n, sizeof(int)))) { > > + free(links); > > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating temp array - > out of memory\n"); > > + return -1; > > + } > > + > > + /* > > + * find the links with axes > > + */ > > + for (j = 1; j <= 2*node->dimension; j++) { > > + c = j; > > + if (node->coord[(c-1)/2] > 0) > > + c = opposite(s, c); > > + > > + for (i = 0; i < n; i++) { > > + if (!node->links[i]) > > + continue; > > + if (node->axes[i] == c) { > > + links[next] = node->links[i]; > > + axes[next] = node->axes[i]; > > + node->links[i] = NULL; > > + next++; > > + } > > + } > > + } > > + > > + /* > > + * get the rest > > + */ > > + for (i = 0; i < n; i++) { > > + if (!node->links[i]) > > + continue; > > + > > + links[next] = node->links[i]; > > + axes[next] = node->axes[i]; > > + node->links[i] = NULL; > > + next++; > > + } > > + > > + for (i = 0; i < n; i++) { > > + node->links[i] = links[i]; > > + node->axes[i] = axes[i]; > > + } > > + > > + free(links); > > + free(axes); > > + > > + return 0; > > +} > > + > > +/* > > + * measure geometry > > + */ > > +static int measure_geometry(lash_t *p_lash, int seed) > > +{ > > + int i, j, k; > > + int sw; > > + switch_t *s, *s1; > > + int change; > > + int dimension = p_lash->mesh->dimension; > > + int num_switches = p_lash->num_switches; > > + int 
assigned_axes = 0, unassigned_axes = 0; > > + int *max, *min; > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + s->node->coord = calloc(dimension, sizeof(int)); Is there free() anywhere? I cannot find. > > + for (i = 0; i < dimension; i++) > > + s->node->coord[i] = (sw == seed)? 0 : 0x7fffffff; > > + > > + for (i = 0; i < s->node->num_links; i++) > > + if (s->node->axes[i] == 0) > > + unassigned_axes++; > > + else > > + assigned_axes++; > > + } > > + > > + printf("lash: %d/%d unassigned/assigned axes\n", unassigned_axes, > assigned_axes); > > + > > + do { > > + change = 0; > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + if (s->node->coord[0] == 0x7fffffff) > > + continue; > > + > > + for (j = 0; j < s->node->num_links; j++) { > > + if (!s->node->axes[j]) > > + continue; > > + > > + s1 = p_lash->switches[s->node->links[j]->switch_id]; > > + > > + for (k = 0; k < dimension; k++) { > > + int coord = s->node->coord[k]; > > + int axis = s->node->axes[j] - 1; > > + > > + if (k == axis/2) > > + coord += (axis & 1)? -1 : +1; > > + > > + if (ltmag(coord, s1->node->coord[k])) { > > + s1->node->coord[k] = coord; > > + change++; > > + } > > + } > > + } > > + } > > + } while (change); > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + if (reorder_links(p_lash, sw)) > > + return -1; > > + } > > + > > + max = calloc(dimension, sizeof(int)); > > + min = calloc(dimension, sizeof(int)); Are min and max freed? 
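
To make the concern about min and max concrete, here is a sketch of the size computation with the scratch arrays released on every path and the allocations checked. It is a stand-alone illustration, not the patch's code: mesh_extent, COORD_UNSET and the coord[sw][i] layout are assumptions made for the example.

```c
#include <limits.h>
#include <stdlib.h>

#define COORD_UNSET INT_MAX	/* stands in for the patch's 0x7fffffff marker */

/*
 * Hypothetical stand-alone helper: returns a newly allocated array with the
 * extent of the mesh in each dimension, or NULL if out of memory.
 * coord[sw] points at the coordinates of switch sw. The caller frees the
 * result; the min/max scratch arrays are released on every path.
 */
static int *mesh_extent(int *const *coord, int num_switches, int dimension)
{
	size_t n = dimension > 0 ? (size_t)dimension : 1;
	int *min = malloc(n * sizeof(int));
	int *max = malloc(n * sizeof(int));
	int *size = calloc(n, sizeof(int));
	int sw, i;

	if (!min || !max || !size) {
		free(size);
		size = NULL;
		goto out;
	}

	for (i = 0; i < dimension; i++) {
		min[i] = INT_MAX;
		max[i] = INT_MIN;
	}

	for (sw = 0; sw < num_switches; sw++) {
		for (i = 0; i < dimension; i++) {
			int c = coord[sw][i];

			if (c == COORD_UNSET)
				continue;
			if (c < min[i])
				min[i] = c;
			if (c > max[i])
				max[i] = c;
		}
	}

	/* leave size[i] at 0 when no switch has a coordinate on axis i */
	for (i = 0; i < dimension; i++)
		if (min[i] <= max[i])
			size[i] = max[i] - min[i] + 1;

out:
	free(min);
	free(max);
	return size;
}
```

The per-switch coord arrays and mesh->size would presumably be released in the node and mesh cleanup routines; that part is not shown in the quoted hunk, so it is left out here.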
Sasha > > + p_lash->mesh->size = calloc(dimension, sizeof(int)); > > + > > + for (i = 0; i < dimension; i++) { > > + max[i] = -0x7fffffff; > > + min[i] = 0x7fffffff; > > + } > > + > > + for (sw = 0; sw < num_switches; sw++) { > > + s = p_lash->switches[sw]; > > + > > + for (i = 0; i < dimension; i++) { > > + if (s->node->coord[i] == 0x7fffffff) > > + continue; > > + if (s->node->coord[i] > max[i]) > > + max[i] = s->node->coord[i]; > > + if (s->node->coord[i] < min[i]) > > + min[i] = s->node->coord[i]; > > + } > > + } > > + > > + for (i = 0; i < dimension; i++) > > + p_lash->mesh->size[i] = max[i] - min[i] + 1; > > + > > + return 0; > > +} > > + > > +/* > > * osm_mesh_cleanup - free per mesh resources > > */ > > void osm_mesh_cleanup(lash_t *p_lash) > > @@ -941,6 +1118,14 @@ int osm_do_mesh_analysis(lash_t *p_lash) > > > > if (s->node->type) { > > make_geometry(p_lash, max_class_type); > > + > > + if (measure_geometry(p_lash, max_class_type)) > > + return -1; > > + > > + printf("lash: found "); > > + for (i = 0; i < mesh->dimension; i++) > > + printf("%s%d", i? "X" : "", mesh->size[i]); > > + printf(" mesh\n"); > > } > > > > OSM_LOG_EXIT(p_log); > > > > > From sashak at voltaire.com Sun Nov 30 13:09:11 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 30 Nov 2008 23:09:11 +0200 Subject: [ofa-general] Re: [PATCH][9] opensm: lash preparation In-Reply-To: <008701c9443c$cfc1f050$6f45d0f0$@com> References: <008701c9443c$cfc1f050$6f45d0f0$@com> Message-ID: <20081130210911.GP9338@sashak.voltaire.com> On 14:33 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > Here is the ninth patch implementing the mesh analysis algorithm. > > This patch makes some minor cleanups in osm_ucast_lash.c in preparation for > next steps. > The main change is to minimize the occurrences of phys_connections. 
> Also there are a few nits: > - delete banner for local variables that moved to ...lash.h > - fix bad return value of osm_mesh_node_create fails I think it should be fixed in related patches (v2), so we will not have broken code in our history. > - clear sw->p_sw->priv on switch cleanup > - fix spelling error in comment > - discover_network_properties returns an error which was not checked Actually most of those (and maybe also get_next_port() function) are not really part of the mesh changes. I'm fine to get separately and to apply even before GA, since it fixes something. Sasha > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index b9394af..95dbcc2 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -55,10 +55,6 @@ > #include > #include > > -/* //////////////////////////// */ > -/* Local types */ > -/* //////////////////////////// */ > - > static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) > { > cdg_vertex_t *cdg_vertex = (cdg_vertex_t *) > malloc(sizeof(cdg_vertex_t)); > @@ -150,6 +146,11 @@ static int cycle_exists(cdg_vertex_t * start, > cdg_vertex_t * current, > return cycle_found; > } > > +static inline int get_next_switch(lash_t *p_lash, int sw, int link) > +{ > + return p_lash->switches[sw]->phys_connections[link]; > +} > + > static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, > int dest_switch, int lane) > { > @@ -161,7 +162,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * > p_lash, int sw, > int found; > > output_link = switches[sw]->routing_table[dest_switch].out_link; > - i_next_switch = switches[sw]->phys_connections[output_link]; > + i_next_switch = get_next_switch(p_lash, sw, output_link); > > while (sw != dest_switch) { > v = cdg_vertex_matrix[lane][sw][i_next_switch]; > @@ -177,8 +178,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * > p_lash, int sw, > if 
(i_next_switch != dest_switch) { > next_link = > > switches[i_next_switch]->routing_table[dest_switch].out_link; > - i_next_next_switch = > - > switches[i_next_switch]->phys_connections[next_link]; > + i_next_next_switch = get_next_switch(p_lash, > i_next_switch, next_link); > found = 0; > > for (i = 0; i < v->num_dependencies; i++) > @@ -211,8 +211,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * > p_lash, int sw, > output_link = > switches[sw]->routing_table[dest_switch].out_link; > > if (sw != dest_switch) > - i_next_switch = > - switches[sw]->phys_connections[output_link]; > + i_next_switch = get_next_switch(p_lash, sw, > output_link); > } > } > > @@ -312,7 +311,7 @@ static void generate_cdg_for_sp(lash_t * p_lash, int sw, > int dest_switch, > cdg_vertex_t *v, *prev = NULL; > > output_link = switches[sw]->routing_table[dest_switch].out_link; > - next_switch = switches[sw]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, sw, output_link); > > while (sw != dest_switch) { > > @@ -368,7 +367,7 @@ static void generate_cdg_for_sp(lash_t * p_lash, int sw, > int dest_switch, > > if (sw != dest_switch) { > CL_ASSERT(output_link != NONE); > - next_switch = > switches[sw]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, sw, > output_link); > } > > prev = v; > @@ -384,7 +383,7 @@ static void set_temp_depend_to_permanent_for_sp(lash_t * > p_lash, int sw, > cdg_vertex_t *v; > > output_link = switches[sw]->routing_table[dest_switch].out_link; > - next_switch = switches[sw]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, sw, output_link); > > while (sw != dest_switch) { > v = cdg_vertex_matrix[lane][sw][next_switch]; > @@ -399,8 +398,7 @@ static void set_temp_depend_to_permanent_for_sp(lash_t * > p_lash, int sw, > output_link = > switches[sw]->routing_table[dest_switch].out_link; > > if (sw != dest_switch) > - next_switch = > - switches[sw]->phys_connections[output_link]; > + next_switch = 
get_next_switch(p_lash, sw, > output_link); > } > > } > @@ -414,7 +412,7 @@ static void remove_temp_depend_for_sp(lash_t * p_lash, > int sw, int dest_switch, > cdg_vertex_t *v; > > output_link = switches[sw]->routing_table[dest_switch].out_link; > - next_switch = switches[sw]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, sw, output_link); > > while (sw != dest_switch) { > v = cdg_vertex_matrix[lane][sw][next_switch]; > @@ -439,8 +437,7 @@ static void remove_temp_depend_for_sp(lash_t * p_lash, > int sw, int dest_switch, > output_link = > switches[sw]->routing_table[dest_switch].out_link; > > if (sw != dest_switch) > - next_switch = > - switches[sw]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, sw, > output_link); > > } > } > @@ -502,10 +499,10 @@ static void balance_virtual_lanes(lash_t * p_lash, > unsigned lanes_needed) > generate_cdg_for_sp(p_lash, dest, src, min_filled_lane); > > output_link = > p_lash->switches[src]->routing_table[dest].out_link; > - next_switch = > p_lash->switches[src]->phys_connections[output_link]; > + next_switch = get_next_switch(p_lash, src, output_link); > > output_link2 = > p_lash->switches[dest]->routing_table[src].out_link; > - next_switch2 = > p_lash->switches[dest]->phys_connections[output_link2]; > + next_switch2 = get_next_switch(p_lash, dest, output_link2); > > > CL_ASSERT(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL); > > CL_ASSERT(cdg_vertex_matrix[min_filled_lane][dest][next_switch2] != NULL); > @@ -652,7 +649,7 @@ static switch_t *switch_create(lash_t * p_lash, unsigned > id, osm_switch_t * p_sw > } > > if (osm_mesh_node_create(p_lash, sw)) > - return -1; > + return NULL; > > sw->p_sw = p_sw; > if (p_sw) > @@ -673,6 +670,8 @@ static void switch_delete(switch_t * sw) > free(sw->phys_connections); > if (sw->routing_table) > free(sw->routing_table); > + if (sw->p_sw) > + sw->p_sw->priv = NULL; > free(sw); > } > > @@ -875,9 +874,8 @@ static int 
lash_core(lash_t * p_lash) > output_link2 = > > switches[dest_switch]->routing_table[i].out_link; > > - i_next_switch = > switches[i]->phys_connections[output_link]; > - i_next_switch2 = > - > switches[dest_switch]->phys_connections[output_link2]; > + i_next_switch = > get_next_switch(p_lash, i, output_link); > + i_next_switch2 = > get_next_switch(p_lash, dest_switch, output_link2); > > CL_ASSERT(p_lash-> > > cdg_vertex_matrix[v_lane][i][i_next_switch] != > @@ -1205,7 +1203,7 @@ static void process_switches(lash_t * p_lash) > osm_switch_t *p_sw, *p_next_sw; > osm_subn_t *p_subn = &p_lash->p_osm->subn; > > - /* Go through each swithc and process it. i.e build the connection > + /* Go through each switch and process it. i.e build the connection > structure required by LASH */ > p_next_sw = (osm_switch_t *) cl_qmap_head(&p_subn->sw_guid_tbl); > while (p_next_sw != (osm_switch_t *) > cl_qmap_end(&p_subn->sw_guid_tbl)) { > @@ -1229,7 +1227,9 @@ static int lash_process(void *context) > // everything starts here > lash_cleanup(p_lash); > > - discover_network_properties(p_lash); > + return_status = discover_network_properties(p_lash); > + if (return_status != IB_SUCCESS) > + goto Exit; > > return_status = init_lash_structures(p_lash); > if (return_status != IB_SUCCESS) > > From alekseys at voltaire.com Sun Nov 30 13:13:06 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Sun, 30 Nov 2008 23:13:06 +0200 Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr References: <1227721899.3121.18.camel@alst60.voltaire.com><1228033480.3621.5.camel@alst60.voltaire.com> Message-ID: <39C75744D164D948A170E9792AF8E7CA428B8F@exil.voltaire.com> This is my first patch for kernel, and I thought that a smallest pieces is better, but now I'd like to say that the fat one, where all parts must to be applied together in order to work, more suitable for such change. 
-----Original Message-----
From: Roland Dreier [mailto:rdreier at cisco.com]
Sent: Sun 11/30/2008 7:28 PM
To: Aleksey Senin
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr

 > But.. All other patches depends one on another, and in my opinion better
 > to apply it all together, otherwise, when separated, all those 'if'
 > statements have no sense.

Umm, OK. So why did you send 6 separate patches?

From sashak at voltaire.com Sun Nov 30 13:15:19 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 23:15:19 +0200
Subject: [ofa-general] Re: [PATCH][3] opensm: per mesh node information
In-Reply-To: <010b01c95321$4e962a20$ebc27e60$@com>
References: <000501c943d4$57b3f8f0$071bead0$@com> <20081130134857.GF9338@sashak.voltaire.com> <010b01c95321$4e962a20$ebc27e60$@com>
Message-ID: <20081130211519.GQ9338@sashak.voltaire.com>

On 13:24 Sun 30 Nov , Robert Pearson wrote:
> Hi Sasha
>
> You wrote:
>
> > +	if (!(node->links = calloc(num_ports, sizeof(link_t *))))
> > +		goto err;
> > +
> > +	for (i = 0; i < num_ports; i++) {
> > +		if (!(node->links[i] = calloc(1, sizeof(link_t))) ||
> > +		    !(node->links[i]->ports = calloc(num_ports,
> > sizeof(int))))
> > +			goto err;
> > +	}
>
> Assuming that ports array is preallocated, wouldn't it be simpler to
> define link as:
>
> typedef struct _link {
> 	int switch_id;
> 	int link_id;
> 	int num_ports;
> 	int next_port;
> 	int ports[0];
> } link_t;
>
> , and then:
>
> 	node->links[i] = calloc(1, sizeof(link_t *) + num_ports *
> sizeof(int))))
>
> ?
>
> (Similar optimizations are probably relevant in other places).
>
> I agree they accomplish the same goal. It is a tradeoff between code that is
> a little shorter and faster and ease of understanding. I don't have strong
> feelings. (For the same reason I tend to use 'x = calloc(1, foo)' instead of
> 'x = malloc(foo); memset(x, 0, foo);' which is a very common usage pattern.)
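
For illustration, the single-allocation layout suggested above could look roughly like the sketch below. It is not code from the patch: link_alloc() and link_add_port() are hypothetical helpers, the max_ports parameter is an assumption, and ports[] uses the C99 flexible array member spelling rather than the GNU ports[0] form quoted above.

```c
#include <stdlib.h>

/*
 * Illustrative link record with a C99 flexible array member.
 * Field names follow the struct quoted in the thread; the helpers
 * below are hypothetical and are not OpenSM functions.
 */
typedef struct _link {
	int switch_id;
	int link_id;
	int num_ports;
	int next_port;
	int ports[];	/* storage follows the header in the same block */
} link_t;

/*
 * Allocate a zeroed link with room for max_ports port numbers.
 * The header size is sizeof(link_t), the struct itself, plus one
 * int per port. Returns NULL if the allocation fails.
 */
static link_t *link_alloc(size_t max_ports)
{
	return calloc(1, sizeof(link_t) + max_ports * sizeof(int));
}

/*
 * Append a physical port number to the link.
 * Returns 0 on success, -1 if the port array is already full.
 */
static int link_add_port(link_t *l, size_t max_ports, int port)
{
	if ((size_t)l->num_ports >= max_ports)
		return -1;
	l->ports[l->num_ports++] = port;
	return 0;
}
```

With this layout a single free() of the link releases both the header and the port array, at the cost of fixing the array size when the link is allocated.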
>
> The same applies to your later note. We can represent a two dimensional
> array as
>
> 	int **array;
>
> followed by array = calloc(1, n*sizeof(int *));
> array[i] = calloc(1, m*sizeof(int)); ...
>
> and then you get to type
>
> 	array[i][j] = xxx;
>
> vs
>
> 	int *array;
>
> 	array = calloc(1, m*n*sizeof(int));
>
> and then
>
> 	array[i*m+j] = xxx;
>
> You can't use array[i][j] here because the compiler doesn't know the size of
> the array until run time.
>
>
> If the code is at all complex I prefer the [][] notation because it is
> easier to read and understand. The optimizer in the compiler will take the
> pointer dereference or the multiply out of inner loops so there is not
> normally a big performance difference.
>
> I guess that this code is complex enough that at least for now it is
> preferable to err on the side of keeping everything as straight forward as
> possible until we are sure that it is correct. Then if performance is an
> issue we can optimize it.
>
> I am happy either way. Let me know what you want me to do.

Ok. Let's leave it for now and will look later.

Sasha

From rdreier at cisco.com Sun Nov 30 13:20:37 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 30 Nov 2008 13:20:37 -0800
Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv4 1/6] AF_INET6 support for rdma_bind_addr
In-Reply-To: <39C75744D164D948A170E9792AF8E7CA428B8F@exil.voltaire.com> (Aleksey Senin's message of "Sun, 30 Nov 2008 23:13:06 +0200")
References: <1227721899.3121.18.camel@alst60.voltaire.com> <1228033480.3621.5.camel@alst60.voltaire.com> <39C75744D164D948A170E9792AF8E7CA428B8F@exil.voltaire.com>
Message-ID:

 Aleksey> This is my first patch for kernel, and I thought that a
 Aleksey> smallest pieces is better, but now I'd like to say that the
 Aleksey> fat one, where all parts must to be applied together in
 Aleksey> order to work, more suitable for such change.

Yes, it's a tricky balance.
You don't want to combine multiple ideas in
one patch, because such patches are hard to review and hard to debug
later. But splitting one idea into multiple patches also causes
similar problems (and you have to get the pieces in the right order
too). And of course the whole question of what constitutes an "idea" is
rather subjective. So we just do the best we can.

 - R.

From sashak at voltaire.com Sun Nov 30 13:34:02 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 30 Nov 2008 23:34:02 +0200
Subject: [ofa-general] Re: [PATCH][10] opensm: hook mesh code into lash (updated)
In-Reply-To: <00ad01c9444e$96e5f300$c4b1d900$@com>
References: <00ad01c9444e$96e5f300$c4b1d900$@com>
Message-ID: <20081130213401.GR9338@sashak.voltaire.com>

On 16:41 Tue 11 Nov , Robert Pearson wrote:
> Sasha,
>
> Here is the tenth patch implementing the mesh analysis algorithm.
> I am resending it because I inadvertently left a bug in the last version.
>
> This patch
>   - hooks mesh code into lash
>   - replaces sw->phys_connections by the equivalent switch->node->links
>   - replaces sw->num_connections by the equivalent
>     switch->node->num_links
>   - replaces sw->virtual_physical_port_table by
>     switch->node->links[]->ports
>
> When the do_mesh_analysis flag is not set there is no change to the function
> except to replace the variables with variables in node that have the same
> size. In this case the port table in link_t will always have just one port.
>
> When the do_mesh_analysis flag is set multiple physical links will collapse
> to a single logical link with a port list with more than one element.
>
>   - fixed bug, mesh not set in osm_do_mesh_analysis

I think it should be fixed in the related patch.

>   - rewrote connect switches to use variables in node
>   - in log Lane requirements (%d) exceed available lanes (%d)
>     Arguments were reversed, fixed

Nice finding.
> - compute physical egress port in routine get_next_port > Which will use round robin if there are more than one > Physical links between switches > - changed printf's to OSM_LOG's in mesh.c > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/include/opensm/osm_ucast_lash.h > b/opensm/include/opensm/osm_ucast_lash.h > index c037571..f3bde5d 100644 > --- a/opensm/include/opensm/osm_ucast_lash.h > +++ b/opensm/include/opensm/osm_ucast_lash.h > @@ -82,9 +82,6 @@ typedef struct _switch { > unsigned lane; > } *routing_table; > mesh_node_t *node; > - unsigned int num_connections; > - int *virtual_physical_port_table; > - int *phys_connections; > } switch_t; > > typedef struct _lash { > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index a248522..dbe3eeb 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -750,7 +750,7 @@ static void make_geometry(lash_t *p_lash, int sw) > continue; > > if (l2 == -1) { > - printf("ERROR no reverse link\n"); > + OSM_LOG(p_log, OSM_LOG_DEBUG, "ERROR > no reverse link\n"); > continue; > } > > @@ -919,6 +919,7 @@ static int reorder_links(lash_t *p_lash, int sw) > */ > static int measure_geometry(lash_t *p_lash, int seed) > { > + osm_log_t *p_log = &p_lash->p_osm->log; > int i, j, k; > int sw; > switch_t *s, *s1; > @@ -942,7 +943,7 @@ static int measure_geometry(lash_t *p_lash, int seed) > assigned_axes++; > } > > - printf("lash: %d/%d unassigned/assigned axes\n", unassigned_axes, > assigned_axes); > + OSM_LOG(p_log, OSM_LOG_DEBUG, "%d/%d unassigned/assigned axes\n", > unassigned_axes, assigned_axes); > > do { > change = 0; > @@ -1069,8 +1070,7 @@ int osm_do_mesh_analysis(lash_t *p_lash) > int i; > mesh_t *mesh; > switch_t *s; > - > - OSM_LOG_ENTER(p_log); > + char buf[256], *p; > > /* > * allocate per mesh data structures > @@ -1080,6 +1080,8 @@ int osm_do_mesh_analysis(lash_t *p_lash) > return -1; > } > > + mesh = p_lash->mesh; > + > /* > * get local metric 
and invariant for each switch > * also classify each switch > @@ -1099,36 +1101,41 @@ int osm_do_mesh_analysis(lash_t *p_lash) > > s = p_lash->switches[max_class_type]; > > - printf("lash: found %d node type%s\n", mesh->num_class, > (mesh->num_class == 1)? "" : "s"); > - printf("lash: %snode type is ", (mesh->num_class == 1)? "" : "most > common "); > + OSM_LOG(p_log, OSM_LOG_INFO, "found %d node type%s\n", > mesh->num_class, (mesh->num_class == 1)? "" : "s"); > + > + p = buf; > + p += sprintf( p, "%snode type is ", (mesh->num_class == 1)? "" : > "most common "); > > if (s->node->type) { > struct _mesh_info *t = &mesh_info[s->node->type]; > > for (i = 0; i < t->dimension; i++) { > - printf("%s%d%s", i? "X" : "", t->size[i], > + p += sprintf(p, "%s%d%s", i? " x " : "", t->size[i], > (t->size[i] == 6)? "+" : ""); Would snprintf() be more suitable here in order to prevent potential overflow? (This is a nit - dimension value is limited now in mesh_info structure). > } > - printf(" mesh\n"); > + p += sprintf(p, " mesh\n"); > > p_lash->mesh->dimension = t->dimension; > } else { > - printf("unknown geometry\n"); > + p += sprintf(p, "unknown geometry\n"); > } > > + OSM_LOG(p_log, OSM_LOG_INFO, "%s", buf); > + > if (s->node->type) { > make_geometry(p_lash, max_class_type); > > if (measure_geometry(p_lash, max_class_type)) > return -1; > > - printf("lash: found "); > + p = buf; > + p += sprintf(p, "found "); > for (i = 0; i < mesh->dimension; i++) > - printf("%s%d", i? "X" : "", mesh->size[i]); > - printf(" mesh\n"); > - } > + p += sprintf(p, "%s%d", i? 
" x " : "", > mesh->size[i]); > + p += sprintf(p, " mesh\n"); > > - OSM_LOG_EXIT(p_log); > + OSM_LOG(p_log, OSM_LOG_INFO, "%s", buf); > + } > > return 0; > } > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index 95dbcc2..660ad56 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -67,16 +67,53 @@ static cdg_vertex_t *create_cdg_vertex(unsigned > num_switches) > static void connect_switches(lash_t * p_lash, int sw1, int sw2, int > phy_port_1) > { > osm_log_t *p_log = &p_lash->p_osm->log; > - unsigned num = p_lash->switches[sw1]->num_connections; > + unsigned num = p_lash->switches[sw1]->node->num_links; > + switch_t *s1 = p_lash->switches[sw1]; > + mesh_node_t *node = s1->node; > + switch_t *s2; > + link_t *l; > + int i; > + > + /* > + * if doing mesh analysis: > + * - do not consider connections to self > + * - collapse multiple connections between > + * pair of switches to a single locical link > + */ > + if (p_lash->p_osm->subn.opt.do_mesh_analysis) { > + if (sw1 == sw2) > + return; This 'if (sw1 == sw2)' is related for non mesh case too, right? 
Sasha > + > + /* see if we are alredy linked to sw2 */ > + for (i = 0; i < num; i++) { > + l = node->links[i]; > + > + if (node->links[i]->switch_id == sw2) { > + l->ports[l->num_ports++] = phy_port_1; > + return; > + } > + } > + } > + > + l = node->links[num]; > + l->switch_id = sw2; > + l->link_id = -1; > + l->ports[l->num_ports++] = phy_port_1; > + > + s2 = p_lash->switches[sw2]; > + for (i = 0; i < s2->node->num_links; i++) { > + if (s2->node->links[i]->switch_id == sw1) { > + s2->node->links[i]->link_id = num; > + l->link_id = i; > + break; > + } > + } > > - p_lash->switches[sw1]->phys_connections[num] = sw2; > - p_lash->switches[sw1]->virtual_physical_port_table[num] = > phy_port_1; > - p_lash->switches[sw1]->num_connections++; > + node->num_links++; > > OSM_LOG(p_log, OSM_LOG_VERBOSE, > "LASH connect: %d, %d, %d\n", sw1, sw2, > phy_port_1); > - > } > > static osm_switch_t *get_osm_switch_from_port(osm_port_t * port) > @@ -148,7 +185,7 @@ static int cycle_exists(cdg_vertex_t * start, > cdg_vertex_t * current, > > static inline int get_next_switch(lash_t *p_lash, int sw, int link) > { > - return p_lash->switches[sw]->phys_connections[link]; > + return p_lash->switches[sw]->node->links[link]->switch_id; > } > > static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, > @@ -233,8 +270,8 @@ static int get_phys_connection(switch_t *sw, int > switch_to) > { > unsigned int i = 0; > > - for (i = 0; i < sw->num_connections; i++) > - if (sw->phys_connections[i] == switch_to) > + for (i = 0; i < sw->node->num_links; i++) > + if (sw->node->links[i]->switch_id == switch_to) > return i; > return i; > } > @@ -252,8 +289,8 @@ static void shortest_path(lash_t * p_lash, int ir) > > while (!cl_is_list_empty(&bfsq)) { > dequeue(&bfsq, &sw); > - for (i = 0; i < sw->num_connections; i++) { > - swi = switches[sw->phys_connections[i]]; > + for (i = 0; i < sw->node->num_links; i++) { > + swi = switches[sw->node->links[i]->switch_id]; > if (swi->q_state == UNQUEUED) { > 
enqueue(&bfsq, swi); > sw->dij_channels[sw->used_channels++] = > swi->id; > @@ -614,25 +651,8 @@ static switch_t *switch_create(lash_t * p_lash, > unsigned id, osm_switch_t * p_sw > return NULL; > } > > - sw->virtual_physical_port_table = malloc(num_ports * sizeof(int)); > - if (!sw->virtual_physical_port_table) { > - free(sw->dij_channels); > - free(sw); > - return NULL; > - } > - > - sw->phys_connections = malloc(num_ports * sizeof(int)); > - if (!sw->phys_connections) { > - free(sw->virtual_physical_port_table); > - free(sw->dij_channels); > - free(sw); > - return NULL; > - } > - > sw->routing_table = malloc(num_switches * > sizeof(sw->routing_table[0])); > if (!sw->routing_table) { > - free(sw->phys_connections); > - free(sw->virtual_physical_port_table); > free(sw->dij_channels); > free(sw); > return NULL; > @@ -643,18 +663,13 @@ static switch_t *switch_create(lash_t * p_lash, > unsigned id, osm_switch_t * p_sw > sw->routing_table[i].lane = NONE; > } > > - for (i = 0; i < num_ports; i++) { > - sw->virtual_physical_port_table[i] = -1; > - sw->phys_connections[i] = NONE; > - } > - > - if (osm_mesh_node_create(p_lash, sw)) > - return NULL; > - > sw->p_sw = p_sw; > if (p_sw) > p_sw->priv = sw; > > + if (osm_mesh_node_create(p_lash, sw)) > + return NULL; > + > return sw; > } > > @@ -664,10 +679,6 @@ static void switch_delete(switch_t * sw) > > if (sw->dij_channels) > free(sw->dij_channels); > - if (sw->virtual_physical_port_table) > - free(sw->virtual_physical_port_table); > - if (sw->phys_connections) > - free(sw->phys_connections); > if (sw->routing_table) > free(sw->routing_table); > if (sw->p_sw) > @@ -972,7 +983,7 @@ Error_Not_Enough_Lanes: > status = -1; > OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 4D02: " > "Lane requirements (%d) exceed available lanes (%d)\n", > - p_lash->vl_min, lanes_needed); > + lanes_needed, p_lash->vl_min); > Exit: > if (switch_bitmap) > free(switch_bitmap); > @@ -985,6 +996,21 @@ static unsigned get_lash_id(osm_switch_t * p_sw) > return 
((switch_t *) p_sw->priv)->id; > } > > +int get_next_port(switch_t *sw, int link) > +{ > + link_t *l = sw->node->links[link]; > + int port = l->next_port++; > + > + /* > + * note if not doing mesh analysis > + * then num_ports is always 1 > + */ > + if (l->next_port >= l->num_ports) > + l->next_port = 0; > + > + return l->ports[port]; > +} > + > static void populate_fwd_tbls(lash_t * p_lash) > { > osm_log_t *p_log = &p_lash->p_osm->log; > @@ -1036,9 +1062,7 @@ static void populate_fwd_tbls(lash_t * p_lash) > (uint8_t) sw-> > > routing_table[dst_lash_switch_id].out_link; > uint8_t physical_egress_port = > - (uint8_t) sw-> > - virtual_physical_port_table > - [lash_egress_port]; > + get_next_port(sw, lash_egress_port); > > p_sw->lft_buf[lid] = physical_egress_port; > OSM_LOG(p_log, OSM_LOG_VERBOSE, > > From sashak at voltaire.com Sun Nov 30 15:54:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 1 Dec 2008 01:54:14 +0200 Subject: [ofa-general] Re: {PATCH] [2] opensm: per mesh data In-Reply-To: <000101c943ce$d2707880$77516980$@com> References: <000101c943ce$d2707880$77516980$@com> Message-ID: <20081130235414.GS9338@sashak.voltaire.com> On 01:26 Tue 11 Nov , Robert Pearson wrote: > Sasha, > > Here is the second patch implementing the mesh analysis algorithm. 
> > This patch: > - creates a data structure, mesh_t, that holds per mesh information > - adds a pointer to this structure in lash_t > - creates methods to allocate and free memory for mesh_t > - adds osm_ prefix to global routine names (oops) > - calls create and cleanup methods > > Regards, > > Bob Pearson > > Signed-off-by: Bob Pearson > ---- > diff --git a/opensm/include/opensm/osm_mesh.h > b/opensm/include/opensm/osm_mesh.h > index 1467440..8313614 100644 > --- a/opensm/include/opensm/osm_mesh.h > +++ b/opensm/include/opensm/osm_mesh.h > @@ -41,6 +41,18 @@ > > struct _lash; > > -int do_mesh_analysis(struct _lash *p_lash); > +/* > + * per fabric mesh info > + */ > +typedef struct _mesh { > + int num_class; /* number of switch classes */ > + int *class_type; /* index of first switch found for > each class */ > + int *class_count; /* population of each class */ > + int dimension; /* mesh dimension */ > + int *size; /* an array to hold size of mesh */ > +} mesh_t; > + > +void osm_mesh_cleanup(struct _lash *p_lash); > +int osm_do_mesh_analysis(struct _lash *p_lash); > > #endif > diff --git a/opensm/include/opensm/osm_ucast_lash.h > b/opensm/include/opensm/osm_ucast_lash.h > index 646e9a3..1ae3bb6 100644 > --- a/opensm/include/opensm/osm_ucast_lash.h > +++ b/opensm/include/opensm/osm_ucast_lash.h > @@ -95,6 +95,7 @@ typedef struct _lash { > cdg_vertex_t ****cdg_vertex_matrix; > int *num_mst_in_lane; > int ***virtual_location; > + mesh_t *mesh; > } lash_t; > > #endif > diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c > index 7943274..c97925b 100644 > --- a/opensm/opensm/osm_mesh.c > +++ b/opensm/opensm/osm_mesh.c > @@ -41,6 +41,7 @@ > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > #include > @@ -48,15 +49,72 @@ > #include > > /* > + * osm_mesh_cleanup - free per mesh resources > + */ > +void osm_mesh_cleanup(lash_t *p_lash) > +{ > + mesh_t *mesh = p_lash->mesh; > + > + if (mesh) { > + if (mesh->class_type) > + 
free(mesh->class_type); > + > + if (mesh->class_count) > + free(mesh->class_count); > + > + free(mesh); > + > + p_lash->mesh = NULL; > + } > +} > + > +/* > + * mesh_create - allocate per mesh resources > + */ > +static int mesh_create(lash_t *p_lash) > +{ > + osm_log_t *p_log = &p_lash->p_osm->log; > + mesh_t *mesh; > + > + if(!(mesh = p_lash->mesh = calloc(1, sizeof(mesh_t)))) { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh - out > of memory\n"); > + return -1; > + } > + > + if (!(mesh->class_type = calloc(p_lash->num_switches, sizeof(int)))) > { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating > mesh->class_type - out of memory\n"); > + free(mesh); > + return -1; > + } > + > + if (!(mesh->class_count = calloc(p_lash->num_switches, > sizeof(int)))) { > + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating > mesh->class_count - out of memory\n"); > + free(mesh->class_type); > + free(mesh); > + return -1; > + } > + > + return 0; > +} > + > +/* > * do_mesh_analysis > */ > -int do_mesh_analysis(lash_t *p_lash) > +int osm_do_mesh_analysis(lash_t *p_lash) > { > int ret = 0; > osm_log_t *p_log = &p_lash->p_osm->log; > > OSM_LOG_ENTER(p_log); > > + /* > + * allocate per mesh data structures > + */ > + if (mesh_create(p_lash)) { > + OSM_LOG_EXIT(p_log); > + return -1; > + } > + > printf("lash: do_mesh_analysis stub called\n"); > > OSM_LOG_EXIT(p_log); > diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c > index e10371c..3577cca 100644 > --- a/opensm/opensm/osm_ucast_lash.c > +++ b/opensm/opensm/osm_ucast_lash.c > @@ -825,7 +825,7 @@ static int lash_core(lash_t * p_lash) > > OSM_LOG_ENTER(p_log); > > - if (p_lash->p_osm->subn.opt.do_mesh_analysis && > do_mesh_analysis(p_lash)) { > + if (p_lash->p_osm->subn.opt.do_mesh_analysis && > osm_do_mesh_analysis(p_lash)) { > OSM_LOG(p_log, OSM_LOG_ERROR, "Mesh analysis failed\n"); > goto Exit; > } > @@ -1124,6 +1124,8 @@ static void lash_cleanup(lash_t * p_lash) > free(p_lash->switches); > } > 
p_lash->switches = NULL;
> +
> +	osm_mesh_cleanup(p_lash);
> }

lash_cleanup() is called at start of LASH processor, so mesh will keep
allocated data between routing calculation cycles. But as far as I can
see it is not used there.

Also osm_mesh_cleanup() is not called on lash deletion and we have a
memory leak.

Maybe osm_mesh_cleanup() should be static function (mesh_cleanup()) and
be called somewhere at end of osm_do_mesh_analysis()?

Sasha

From aostvold at platform.com Sun Nov 30 23:01:09 2008
From: aostvold at platform.com (Asmund Ostvold)
Date: Mon, 01 Dec 2008 08:01:09 +0100
Subject: [ofa-general] receiving wrong data after trying to allocation a too large memory chunk
In-Reply-To: <493104BA.9090607@platform.com>
References: <493104BA.9090607@platform.com>
Message-ID: <49338BB5.6010709@platform.com>

I apologize for my dyslectic subject. This should be better.

We would very much like to know if anybody else can reproduce the
results? If you need more info please contact us.

Regards,
Asmund (dyslectic programmer)

Asmund Ostvold wrote:
> We discovered a strange problem running OFED; We're not sure if it is a
> OFED problem but we post it here anyway.
>
>
> Short description:
> We have a program that allocates a set of buffers with valloc, sends
> them with ibv_post_send and free them.
> This is run in loop;
> We have a "caching"-algorithm so that we register memory only the first
> time we come across a buffer address.
> We start getting wrong data for parts of sends after a couple of
> iterations
>
> There are a few things worth mentioning:
> - We must use valloc; the test works with malloc
> - We must have a malloc allocating a too large chunk before starting the
> loop (the malloc fails)
>
> We have modified the "rdma_lat.c" program to show the error (attached)
>
> Regards
> Asmund
>
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From nicolas.morey-chaisemartin at ext.bull.net Sun Nov 30 23:11:31 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Mon, 01 Dec 2008 08:11:31 +0100
Subject: [ofa-general] GitWeb really slow
Message-ID: <49338E23.9000600@ext.bull.net>

Hi,

This is not necessarily the best place to post it but I was wondering why
is ofed's gitweb so slow on the main page?
It takes only a few seconds to display all the repositories on kernel.org
(and there's a lot more) but it takes nearly a minute to display the OFED
git main page...

I know it's probably not the most critical issue you have to work on but
I connect quite often on this page and it starts to be really bugging me.
And I'm probably not the only one ;)

Thanks in advance

Nicolas Morey-Chaisemartin

From sean.hefty at intel.com Sun Nov 30 23:41:37 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 30 Nov 2008 23:41:37 -0800
Subject: [ofa-general] [RMDA CM IPv6 support.
 PATCHv4 1/6] AF_INET6 support for rdma_bind_addr
In-Reply-To:
References: <1227721899.3121.18.camel@alst60.voltaire.com> <1228033480.3621.5.camel@alst60.voltaire.com> <39C75744D164D948A170E9792AF8E7CA428B8F@exil.voltaire.com>
Message-ID: <000101c95388$3e141150$dce0180a@amr.corp.intel.com>

>Yes, it's a tricky balance. You don't want to combine multiple ideas in
>one patch, because such patches are hard to review and hard to debug
>later. But splitting one idea into multiple patches also causes
>similar problems (and you have to get the pieces in the right order
>too). And of course the whole question of what constitutes an "idea" is
>rather subjective. So we just do the best we can.

The patch set taken collectively looks good to me. I think it makes sense
to view the series as 2 patches, one for ib_addr (patches 4-6), and one
for rdma_cm (patches 1-3). The ib_addr patch should come first.

- Sean